    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using a cross-modal sensory selection task in head-fixed mice, the authors attempted to characterize how different rules reconfigured representations of sensory stimuli and behavioral reports in sensory (S1, S2) and premotor cortical areas (medial motor cortex or MM, and ALM). They used silicon probe recordings during behavior, a combination of single-cell and population-level analyses of neural data, and optogenetic inhibition during the task.

      Strengths:

      A major strength of the manuscript was the clarity of the writing and motivation for experiments and analyses. The behavioral paradigm is somewhat simple but well-designed and well-controlled. The neural analyses were sophisticated, clearly presented, and generally supported the authors' interpretations. The statistics are clearly reported and easy to interpret. In general, my view is that the authors achieved their aims. They found that different rules affected preparatory activity in premotor areas, but not sensory areas, consistent with dynamical systems perspectives in the field that hold that initial conditions are important for determining trial-based dynamics.

      Weaknesses:

      The manuscript was generally strong. The main weakness in my view was in interpreting the optogenetic results. While the simplicity of the task was helpful for analyzing the neural data, I think it limited the informativeness of the perturbation experiments. The behavioral read-out was low dimensional (a change in hit rate or false alarm rate), but it was unclear what perceptual or cognitive process was disrupted that led to changes in these read-outs. This is a challenge for the field, and not just this paper, but was the main weakness in my view. I have some minor technical comments in the recommendations for authors that might address other minor weaknesses.

      I think this is a well-performed, well-written, and interesting study that shows differences in rule representations in sensory and premotor areas and finds that rules reconfigure preparatory activity in the motor cortex to support flexible behavior.

      Reviewer #2 (Public Review):

      Summary:

      Chang et al. investigate neuronal activity firing patterns across various cortical regions in an interesting context-dependent tactile vs visual detection task, developed previously by the authors (Chevee et al., 2021; doi: 10.1016/j.neuron.2021.11.013). The authors report the important involvement of a medial frontal cortical region (MM, probably a similar location to wM2 as described in Esmaeili et al., 2021 & 2022; doi: 10.1016/j.neuron.2021.05.005; doi: 10.1371/journal.pbio.3001667) in mice for determining task rules.

      Strengths:

      The experiments appear to have been well carried out and the data well analysed. The manuscript clearly describes the motivation for the analyses and reaches clear and well-justified conclusions. I find the manuscript interesting and exciting!

      Weaknesses:

      I did not find any major weaknesses.

      Reviewer #3 (Public Review):

      This study examines context-dependent stimulus selection by recording neural activity from several sensory and motor cortical areas along a sensorimotor pathway, including S1, S2, MM, and ALM. Mice are trained to either withhold licking or perform directional licking in response to visual or tactile stimulus. Depending on the task rule, the mice have to respond to one stimulus modality while ignoring the other. Neural activity to the same tactile stimulus is modulated by task in all the areas recorded, with significant activity changes in a subset of neurons and population activity occupying distinct activity subspaces. Recordings further reveal a contextual signal in the pre-stimulus baseline activity that differentiates task context. This signal is correlated with subsequent task modulation of stimulus activity. Comparison across brain areas shows that this contextual signal is stronger in frontal cortical regions than in sensory regions. Analyses link this signal to behavior by showing that it tracks the behavioral performance switch during task rule transitions. Silencing activity in frontal cortical regions during the baseline period impairs behavioral performance.

      Overall, this is a superb study with solid results and thorough controls. The results are relevant for context-specific neural computation and provide a neural substrate that will surely inspire follow-up mechanistic investigations. We only have a couple of suggestions to help the authors further improve the paper.

      (1) We have a comment regarding the calculation of the choice CD in Fig S3. The text on page 7 concludes that "Choice coding dimensions change with task rule". However, the motor choice response is different across blocks, i.e. lick right vs. no lick for one task and lick left vs. no lick for the other task. Therefore, the differences in the choice CD may be simply due to the motor response being different across the tasks and not due to the task rule per se. The authors may consider adding this caveat in their interpretation. This should not affect their main conclusion.

      We thank the Reviewer for the suggestion. We now discuss this caveat and, on page 8, report a new analysis that calculates the choice coding dimensions using right-lick and left-lick trials (Fig. S4h).

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).”

      We have also included the caveats of using right-lick and left-lick trials to calculate choice coding dimensions on page 13.

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks_{unrewarded port} / # of licks_{total}, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”
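      As an illustration of the alignment measure quoted above (the magnitude of the dot product between the choice CDs of the two block types), here is a minimal sketch. It assumes that a coding dimension is the unit-normalized difference between the trial-averaged population activity of two trial types; the response does not spell out this definition, so treat it as an assumption. The function names and toy numbers are hypothetical, and the shuffle control reported by the authors is not reproduced here.

```python
import math


def mean_vector(trials):
    """Average per-trial population vectors (one value per neuron)."""
    n_trials = len(trials)
    return [sum(column) / n_trials for column in zip(*trials)]


def coding_dimension(trials_a, trials_b):
    """Unit vector along the difference in trial-averaged activity.

    Assumed definition of a coding dimension (CD); not taken from the source.
    """
    diff = [a - b for a, b in zip(mean_vector(trials_a), mean_vector(trials_b))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("the two trial types have identical mean activity")
    return [d / norm for d in diff]


def cd_alignment(cd_1, cd_2):
    """Magnitude of the dot product between two unit-length CDs (0 to 1)."""
    return abs(sum(x * y for x, y in zip(cd_1, cd_2)))


# Toy data: three neurons, hypothetical firing rates per trial.
touch_cd = coding_dimension([[2.0, 0.0, 0.0], [4.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]])
light_cd = coding_dimension([[0.0, 3.0, 0.0]], [[0.0, 0.0, 0.0]])
print(cd_alignment(touch_cd, light_cd))
```

      A value near 1 indicates that the two coding dimensions point along the same axis in neural state space (regardless of sign), while a value near 0 indicates near-orthogonal dimensions.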

      (2) We have a couple of questions about the effect size on single neurons vs. population dynamics. From Fig 1, about 20% of neurons in frontal cortical regions show task rule modulation in their stimulus activity. This seems like a small effect in terms of population dynamics. There is somewhat of a disconnect from Figs 4 and S3 (for stimulus CD), which show remarkably low subspace overlap in population activity across tasks. Can the authors help bridge this disconnect? Is this because the neurons showing a difference in Fig 1 are disproportionally stimulus selective neurons?

      We thank the Reviewer for the insightful comment and agree that it is important to link the single-unit and population results. We have addressed these questions by (1) improving our analysis of task modulation of single neurons  (tHit-tCR selectivity) and (2) examining the relationship between tHit-tCR selective neurons and tHit-tCR subspace overlaps.  

      Previously, we averaged the AUC values of time bins within the stimulus window (0-150 ms, 10 ms bins). If the 95% CI on this averaged AUC value did not include 0.5, this unit was considered to show significant selectivity. This approach was highly conservative and may have underestimated the percentage of units showing significant selectivity, particularly any units showing transient selectivity. In the revised manuscript, we now define a unit as showing significant tHit-tCR selectivity when three consecutive time bins (>30 ms, 10 ms bins) of AUC values were significant. Using this new criterion, the percentage of tHit-tCR selective neurons increased compared with the previous analysis. We have updated Figure 1h and the results on page 4:

      “We found that 18-33% of neurons in these cortical areas had area under the receiver-operating curve (AUC) values significantly different from 0.5, and therefore discriminated between tHit and tCR trials (Fig. 1h; S1: 28.8%, 177 neurons; S2: 17.9%, 162 neurons; MM: 32.9%, 140 neurons; ALM: 23.4%, 256 neurons; criterion to be considered significant: Bonferroni corrected 95% CI on AUC did not include 0.5 for at least 3 consecutive 10-ms time bins).”
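      The revised significance criterion (a corrected CI on AUC that excludes 0.5 for at least three consecutive 10-ms bins) can be sketched as follows. This is only an illustration: it assumes the per-bin confidence bounds have already been computed and corrected, the function name and example values are hypothetical, and it counts a run of bins regardless of whether they fall above or below 0.5, which the quoted text does not specify.

```python
def has_consecutive_significant_bins(ci_bounds, chance=0.5, min_run=3):
    """Return True if at least `min_run` consecutive bins have a CI excluding `chance`.

    ci_bounds: list of (lower, upper) CI bounds on AUC, one pair per time bin.
    """
    run = 0
    for lower, upper in ci_bounds:
        if lower > chance or upper < chance:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # a non-significant bin breaks the run
    return False


# A unit with a transient effect: three adjacent significant bins.
transient_unit = [(0.40, 0.60), (0.55, 0.70), (0.56, 0.72), (0.58, 0.80), (0.45, 0.60)]
# A unit whose significant bins are interrupted before reaching three in a row.
interrupted_unit = [(0.55, 0.70), (0.40, 0.60), (0.55, 0.70), (0.56, 0.70)]

print(has_consecutive_significant_bins(transient_unit))
print(has_consecutive_significant_bins(interrupted_unit))
```

      Compared with testing a single AUC averaged over the whole stimulus window, a run-based criterion of this kind can detect short-lived selectivity that averaging would dilute.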

      Next, we checked how tHit-tCR selective neurons were distributed across sessions. We found that the percentage of tHit-tCR selective neurons in each session varied (S1: 9-46%, S2: 0-36%, MM: 25-55%, ALM: 0-50%). We examined the relationship between the numbers of tHit-tCR selective neurons and tHit-tCR subspace overlaps. Sessions with more neurons showing task rule modulation tended to show lower subspace overlap, but this correlation was modest and only marginally significant (r = -0.32, p = 0.08, Pearson correlation, n = 31 sessions). While we report the percentage of neurons showing significant selectivity as a simple way to summarize single-neuron effects, this does neglect the magnitude of task rule modulation of individual neurons, which may also be relevant.

      In summary, the apparent disconnect between the effect sizes of task modulation of single neurons and of population dynamics could be explained by three factors: (1) the percentages of tHit-tCR selective neurons were underestimated in our old analysis, (2) tHit-tCR selective neurons were not uniformly distributed among sessions, and (3) the percentages of tHit-tCR selective neurons were only weakly correlated with tHit-tCR subspace overlaps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      For the analysis of choice coding dimensions, it seems that the authors are somewhat data limited in that they cannot compare lick-right/lick-left within a block. So instead, they compare lick/no lick trials. But given that the mice are unable to initiate trials, the interpretation of the no lick trials is a bit complicated. It is not clear that the no lick trials reflect a perceptual judgment about the stimulus (i.e., a choice), or that the mice are just zoning out and not paying attention. If it's the latter case, what the authors are calling choice coding is more of an attentional or task engagement signal, which may still be interesting, but has a somewhat different interpretation than a choice coding dimension. It might be worth clarifying this point somewhere, or if I'm totally off-base, then being more clear about why lick/no lick is more consistent with choice than task engagement.

      We thank the Reviewer for raising this point. We have added a new paragraph on page 13 to clarify why we used lick/no-lick trials to calculate choice coding dimensions, and we now discuss the caveat regarding task engagement.  

      “No-lick trials included misses, which could be caused by mice not being engaged in the task. While the majority of no-lick trials were correct rejections (respond-to-touch: 75%; respond-to-light: 76%), we treated no-licks as one of the available choices in our task and included them to calculate choice coding dimensions (Fig. S4c,d,f). To ensure stable and balanced task engagement across task rules, we removed the last 20 trials of each session and used stimulus parameters that achieved similar behavioral performance for both task rules (Fig. 1d; ~75% correct for both rules).”

      In addition, to address a point made by Reviewer 3 as well as this point, we performed a new analysis to calculate choice coding dimensions using right-lick vs left-lick trials. We report this new analysis on page 8:

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).” 

      We added discussion of the limitations of this new analysis on page 13:

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks_{unrewarded port} / # of licks_{total}, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      The authors find that the stimulus coding direction in most areas (S1, S2, and MM) was significantly aligned between the block types. How do the authors interpret that finding? That there is no major change in stimulus coding dimension, despite the change in subspace? I think I'm missing the big picture interpretation of this result.

      That there is no significant change in stimulus coding dimensions but a change in subspace suggests that the subspace change largely reflects a change in the choice coding dimensions.

      As I mentioned in the public review, I thought there was a weakness with interpretation of the optogenetic experiments, which the authors generally interpret as reflecting rule sensitivity. However, given that they are inhibiting premotor areas including ALM, one might imagine that there might also be an effect on lick production or kinematics. To rule this out, the authors compare the change in lick rate relative to licks during the ITI. What is the ITI lick rate? I assume pretty low, once the animal is well-trained, in which case there may be a floor effect that could obscure meaningful effects on lick production. In addition, based on the reported CI on delta p(lick), it looks like MM and AM did suppress lick rate. I think in the future, a task with richer behavioral read-outs (or including other measurements of behavior like video), or perhaps something like a psychological process model with parameters that reflect different perceptual or cognitive processes could help resolve the effects of perturbations more precisely.

      Eighteen and ten percent of trials had at least one lick in the ITI in respond-to-touch and respond-to-light blocks, respectively. These relatively low rates of ITI licking could indeed make an effect of optogenetics on lick production harder to observe. We agree that future work would benefit from more complex tasks and measurements, and have added the following to make this point (page 14):

      “To more precisely dissect the effects of perturbations on different cognitive processes in rule-dependent sensory detection, more complex behavioral tasks and richer behavioral measurements are needed in the future.”

      Reviewer #2 (Recommendations For The Authors):

      I have the following minor suggestions that the authors might consider in revising this already excellent manuscript:

      (1) In addition to showing normalised z-score firing rates (e.g. Fig 1g), I think it is important to show the grand-average mean firing rates in Hz.

      We thank the Reviewer for the suggestion and have added the grand-average mean firing rates as a new supplementary figure (Fig. S2a). To provide more details about the firing rates of individual neurons, we have also added to this new figure the distribution of peak responses during the tactile stimulus period (Fig. S2b).

      (2) I think the authors could report more quantitative data in the main text. As a very basic example, I could not easily find how many neurons, sessions, and mice were used in various analyses.

      We have added relevant numbers at various points throughout the Results, including within the following examples:

      Page 3: “To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM and ALM (Fig. 1e-g, Fig. S1a-h, and Fig. S2a; S1: 6 mice, 10 sessions, 177 neurons, S2: 5 mice, 8 sessions, 162 neurons, MM: 7 mice, 9 sessions, 140 neurons, ALM: 8 mice, 13 sessions, 256 neurons).”

      Page 5: “As expected, single-unit activity before stimulus onset did not discriminate between tactile and visual trials (Fig. 2d; S1: 0%, 177 neurons; S2: 0%, 162 neurons; MM: 0%, 140 neurons; ALM: 0.8%, 256 neurons). After stimulus onset, more than 35% of neurons in the sensory cortical areas and approximately 15% of neurons in the motor cortical areas showed significant stimulus discriminability (Fig. 2e; S1: 37.3%, 177 neurons; S2: 35.2%, 162 neurons; MM: 15%, 140 neurons; ALM: 14.1%, 256 neurons).”

      Page 6: “Support vector machine (SVM) and Random Forest classifiers showed similar decoding abilities (Fig. S3a,b; medians of classification accuracy [true vs shuffled]; SVM: S1 [0.6 vs 0.53], 10 sessions, S2 [0.61 vs 0.51], 8 sessions, MM [0.71 vs 0.51], 9 sessions, ALM [0.65 vs 0.52], 13 sessions; Random Forests: S1 [0.59 vs 0.52], 10 sessions, S2 [0.6 vs 0.52], 8 sessions, MM [0.65 vs 0.49], 9 sessions, ALM [0.7 vs 0.5], 13 sessions).”

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).” 

      Page 8: “In contrast, we found that S1, S2 and MM had stimulus CDs that were significantly aligned between the two block types (Fig. S4e; magnitude of dot product between the respond-to-touch stimulus CDs and the respond-to-light stimulus CDs, mean ± 95% CI for true vs shuffled data: S1: 0.5 ± [0.34, 0.66] vs 0.21 ± [0.12, 0.34], 10 sessions; S2: 0.62 ± [0.43, 0.78] vs 0.22 ± [0.13, 0.31], 8 sessions; MM: 0.48 ± [0.38, 0.59] vs 0.24 ± [0.16, 0.33], 9 sessions; ALM: 0.33 ± [0.2, 0.47] vs 0.21 ± [0.13, 0.31], 13 sessions).”

      Page 9: “For respond-to-touch to respond-to-light block transitions, the fractions of trials classified as respond-to-touch for MM and ALM decreased progressively over the course of the transition (Fig. 5d; rank correlation of the fractions calculated for each of the separate periods spanning the transition, Kendall’s tau, mean ± 95% CI: MM: -0.39 ± [-0.67, -0.11], 9 sessions, ALM: -0.29 ± [-0.54, -0.04], 13 sessions; criterion to be considered significant: 95% CI on Kendall’s tau did not include 0).”

      Page 11: “Lick probability was unaffected during S1, S2, MM and ALM experiments for both tasks, indicating that the behavioral effects were not due to an inability to lick (Fig. 6i, j; 95% CI on Δ lick probability for cross-modal selection task: S1/S2 [-0.18, 0.24], 4 mice, 10 sessions; MM [-0.31, 0.03], 4 mice, 11 sessions; ALM [-0.24, 0.16], 4 mice, 10 sessions; Δ lick probability for simple tactile detection task: S1/S2 [-0.13, 0.31], 3 mice, 3 sessions; MM [-0.06, 0.45], 3 mice, 5 sessions; ALM [-0.18, 0.34], 3 mice, 4 sessions).”
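      The rank correlation used in the Page 9 passage above (Kendall's tau between the order of the transition periods and the fraction of trials classified as respond-to-touch) can be illustrated with a short sketch. It implements the simple tau-a statistic, which ignores ties; the response does not state which variant the authors used, and the period indices and fractions below are hypothetical values for a single session.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical session: five consecutive periods spanning a rule transition,
# and the fraction of trials classified as respond-to-touch in each period.
periods = [1, 2, 3, 4, 5]
fraction_touch = [0.9, 0.8, 0.6, 0.7, 0.3]
print(kendall_tau_a(periods, fraction_touch))
```

      A negative tau indicates that the fraction tends to decrease as the analysis window moves later in the transition. In the quoted text, one such value is obtained per session, and the effect is considered significant when the 95% CI across sessions excludes 0.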

      (3) Please include a clearer description of trial timing. Perhaps a schematic timeline of when stimuli are delivered and when licking would be rewarded. I may have missed it, but I did not find explicit mention of the timing of the reward window or if there was any delay period.

      We have added the following (page 3): 

      “For each trial, the stimulus duration was 0.15 s and an answer period extended from 0.1 to 2 s from stimulus onset.”

      (4) Please include a clear description of statistical tests in each figure legend as needed (for example please check Fig 4e legend).

      We have added details about statistical tests in the figure legends:

      Fig. 2f: “Relationship between block-type discriminability before stimulus onset and tHit-tCR discriminability after stimulus onset for units showing significant block-type discriminability prior to the stimulus. Pearson correlation: S1: r = 0.69, p = 0.056, 8 neurons; S2: r = 0.91, p = 0.093, 4 neurons; MM: r = 0.93, p < 0.001, 30 neurons; ALM: r = 0.83, p < 0.001, 26 neurons.” 

      Fig. 4e: “Subspace overlap for control tHit (gray) and tCR (purple) trials in the somatosensory and motor cortical areas. Each circle is a subspace overlap of a session. Paired t-test, tCR – control tHit: S1: -0.23, 8 sessions, p = 0.0016; S2: -0.23, 7 sessions, p = 0.0086; MM: -0.36, 5 sessions, p < 0.001; ALM: -0.35, 11 sessions, p < 0.001; significance: ** for p<0.01, *** for p<0.001.”

      Fig. 5d,e: “Fraction of trials classified as coming from a respond-to-touch block based on the pre-stimulus population state, for trials occurring in different periods (see c) relative to respond-to-touch → respond-to-light transitions. For MM (top row) and ALM (bottom row), progressively fewer trials were classified as coming from the respond-to-touch block as analysis windows shifted later relative to the rule transition. Kendall’s tau (rank correlation): MM: -0.39, 9 sessions; ALM: -0.29, 13 sessions. Left panels: individual sessions, right panels: mean ± 95% CI. Dashed lines are chance levels (0.5). e, Same as d but for respond-to-light → respond-to-touch transitions. Kendall’s tau: MM: 0.37, 9 sessions; ALM: 0.27, 13 sessions.”

      Fig. 6: “Error bars show bootstrap 95% CI. Criterion to be considered significant: 95% CI did not include 0.”
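      The Fig. 6 criterion (a bootstrap 95% CI that does not include 0) can be sketched as below. This is a generic percentile bootstrap of the mean, offered as an assumption: the legend does not state which statistic was resampled or at what level (trials, sessions, or mice). The function names, the number of resamples, and the example values are hypothetical.

```python
import random


def bootstrap_ci(values, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `values`.

    Resamples with replacement n_boot times; the seed makes the result reproducible.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def excludes_zero(ci):
    """Significance criterion from the legend: the CI does not include 0."""
    lower, upper = ci
    return lower > 0 or upper < 0


# Hypothetical per-session changes in a behavioral measure during inhibition.
delta = [0.20, 0.30, 0.25, 0.40, 0.35]
ci = bootstrap_ci(delta)
print(ci, excludes_zero(ci))
```

      With only a handful of sessions per condition, percentile intervals of this kind can be wide, which is one reason a CI that spans 0 should be read as inconclusive rather than as evidence of no effect.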

      (5) P. 3 - "To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM, and ALM using 64-channel silicon probes (Fig. 1e-g and Fig. S1a-h)." Please specify if these areas were recorded simultaneously or not.

      We have added “We recorded from one of these cortical areas per session, using 64-channel silicon probes.”  on page 3.  

      (6) Figure 4b - Please describe what gray and black lines show.

      The gray traces are the distance between tHit and tCR trajectories in individual sessions and the black traces are the averages across sessions in different cortical areas. We have added this information on page 6 and in the Figure 4b legend. 

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).”

      Fig. 4b: “Distance between tHit and tCR trajectories in S1, S2, MM and ALM. Gray traces show the time varying tHit-tCR distance in individual sessions and black traces are session-averaged tHit-tCR distance (S1:10 sessions; S2: 8 sessions; MM: 9 sessions; ALM: 13 sessions).”
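      The distance measure described above (Euclidean distance between matching time points, computed for all tHit/tCR trajectory pairs in a session and then averaged) can be sketched as follows. This is a minimal illustration that assumes each trajectory is already expressed as a sequence of points in the top-3-PC space; how the individual trajectories were constructed is described in the authors' Methods and is not reproduced here. The function name and toy coordinates are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(thit_trajs, tcr_trajs):
    """Time-resolved tHit-tCR distance for one session.

    Each trajectory is a list of time points; each time point is a tuple of
    PC coordinates. Returns one distance per time bin, averaged over all
    (tHit, tCR) trajectory pairs.
    """
    n_time = len(thit_trajs[0])
    totals = [0.0] * n_time
    n_pairs = 0
    for hit, cr in product(thit_trajs, tcr_trajs):
        for t in range(n_time):
            totals[t] += math.dist(hit[t], cr[t])  # distance at matching time point
        n_pairs += 1
    return [total / n_pairs for total in totals]


# Toy session: one tHit trajectory, two tCR trajectories, two time bins, 3 PCs.
thit = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
tcr = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
]
print(mean_pairwise_distance(thit, tcr))
```

      Each gray trace in Fig. 4b would correspond to one such per-session result, and the black trace to the average of these results across sessions.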

      (7) In addition to the analyses shown in Figure 5a, when investigating the timing of the rule switch, I think the authors should plot the left and right lick probabilities aligned to the timing of the rule switch time on a trial-by-trial basis averaged across mice.

      We thank the Reviewer for suggesting this addition. We have added a new figure panel to show the probabilities of right- and left-licks during rule transitions (Fig. 5a).

      Page 8: “The probabilities of right-licks and left-licks showed that the mice switched their motor responses during block transitions depending on task rules (Fig. 5a, mean ± 95% CI across 12 mice).” 

      (8) P. 12 - "Moreover, in a separate study using the same task (Finkel et al., unpublished), high-speed video analysis demonstrated no significant differences in whisker motion between respond-to-touch and respond-to-light blocks in most (12 of 14) behavioral sessions.". Such behavioral data is important and ideally would be included in the current analysis. Was high-speed videography carried out during electrophysiology in the current study?

      Finkel et al. has been accepted in principle for publication and will be available online shortly. Unfortunately we have not yet carried out simultaneous high-speed whisker video and electrophysiology in our cross-modal sensory selection task.

      Reviewer #3 (Recommendations For The Authors):

      (1) Minor point. For subspace overlap calculation of pre-stimulus activity in Fig 4e (light purple datapoints), please clarify whether the PCs for that condition were constructed in matched time windows. If the PCs are calculated from the stimulus period 0-150ms, the poor alignment could be due to mismatched time windows.

      We thank the Reviewer for the comment and clarify our analysis here. We previously used time-matched windows to calculate subspace overlaps. However, the pre-stimulus activity was much weaker than the activity during the stimulus period, so the subspaces of reference tHit were subject to noise and we were not able to obtain reliable PCs. This caused the subspace overlap values between the reference tHit and control tHit to be low and variable (mean ± SD, S1: 0.46 ± 0.26, n = 8 sessions, S2: 0.46 ± 0.18, n = 7 sessions, MM: 0.44 ± 0.16, n = 5 sessions, ALM: 0.38 ± 0.22, n = 11 sessions). Therefore, we used the tHit activity during the stimulus window to obtain PCs and projected pre-stimulus and stimulus activity in tCR trials onto these PCs. We have now added a more detailed description of this analysis in the Methods (page 32).

      “To calculate the separation of subspaces prior to stimulus delivery, pre-stimulus activity in tCR trials (-100 to 0 ms from stimulus onset) was projected to the PC space of the tHit reference group and the subspace overlap was calculated. In this analysis, we used tHit activity during stimulus delivery (0 to 150 ms from stimulus onset) to obtain reliable PCs.”

      We acknowledge this time alignment issue and have now removed the reported subspace overlap between tHit and tCR during the pre-stimulus period from Figure 4e (light purple). However, we think the correlation between pre- and post-stimulus-onset subspace overlaps should remain similar regardless of the time windows that we used for calculating the PCs. For the PCs calculated from the pre-stimulus period (-100 to 0 ms), the correlation coefficient was 0.55 (Pearson correlation, p < 0.01, n = 31 sessions). For the PCs calculated from the stimulus period (0-150 ms), the correlation coefficient was 0.68 (Figure 4f, Pearson correlation, p < 0.001, n = 31 sessions). Therefore, we keep Figure 4f.

      (2) Minor point. To help the readers follow the logic of the experiments, please explain why PPC and AMM were added in the later optogenetic experiment since these are not part of the electrophysiology experiment.

      We have added the following rationale on page 9.

      “We recorded from AMM in our cross-modal sensory selection task and observed visually-evoked activity (Fig. S1i-k), suggesting that AMM may play an important role in rule-dependent visual processing. PPC contributes to multisensory processing (refs. 51–53) and sensory-motor integration (refs. 50, 54–58). Therefore, we wanted to test the roles of these areas in our cross-modal sensory selection task.”

      (3) Minor point. We are somewhat confused about the timing of some of the example neurons shown in figure S1. For example, many neurons show visually evoked signals only after stimulus offset, unlike tactile evoked signals (e.g. Fig S1b and f). In addition, the reaction time for visual stimulus is systematically slower than tactile stimuli for many example neurons (e.g. Fig S1b) but somehow not other neurons (e.g. Fig S1g). Are these observations correct?

      These observations are all correct. We have a manuscript from a separate study using this same behavioral task (Finkel et al., accepted in principle) that examines and compares (1) the onsets of tactile- and visually-evoked activity and (2) the reaction times to tactile and visual stimuli. The reaction times to tactile stimuli were slightly but significantly shorter than the reaction times to visual stimuli (tactile vs visual, 397 ± 145 vs 521 ± 163 ms, median ± interquartile range [IQR], Tukey HSD test, p = 0.001, n = 155 sessions). We examined how well activity of individual neurons in S1 could be used to discriminate the presence of the stimulus or the response of the mouse. For discriminability for the presence of the stimulus, S1 neurons could signal the presence of the tactile stimulus but not the visual stimulus. For discriminability for the response of the mouse, the onsets for significant discriminability occurred earlier for tactile compared with visual trials (two-sided Kolmogorov-Smirnov test, p = 1 × 10^-16, n = 865 neurons with DP onset in tactile trials, n = 719 neurons with DP onset in visual trials).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We thank the referees for their careful reading of the manuscript and their valuable suggestions for improvements.

      General Statements:

      Existing SMC-based loop extrusion models successfully predict and characterize mesoscale genome spatial organization in vertebrate organisms, providing a valuable computational tool to the genome organization and chromatin biology fields. However, to date this approach is highly limited in its application beyond vertebrate organisms. This limitation arises because existing models require knowledge of CTCF binding sites, which act as effective boundary elements, blocking loop-extruding SMC complexes and thus defining TAD boundaries. However, CTCF is the predominant boundary element only in vertebrates. On the other hand, vertebrates only contain a small proportion of species in the tree of life, while TADs are nearly universal and SMC complexes are largely conserved. Thus, there is a pressing need for loop extrusion models capable of predicting Hi-C maps in organisms beyond vertebrates.

      The conserved-current loop extrusion (CCLE) model, introduced in this manuscript, extends the quantitative application of loop extrusion models in principle to any organism by liberating the model from the lack of knowledge regarding the identities and functions of specific boundary elements. By converting the genomic distribution of loop extruding cohesin into an ensemble of dynamic loop configurations via a physics-based approach, CCLE outputs three-dimensional (3D) chromatin spatial configurations that can be manifested in simulated Hi-C maps. We demonstrate that CCLE-generated maps well describe experimental Hi-C data at the TAD-scale. Importantly, CCLE achieves high accuracy by considering cohesin-dependent loop extrusion alone, consequently both validating the loop extrusion model in general (as opposed to diffusion-capture-like models proposed as alternatives to loop extrusion) and providing evidence that cohesin-dependent loop extrusion plays a dominant role in shaping chromatin organization beyond vertebrates.

      The success of CCLE unambiguously demonstrates that knowledge of the cohesin distribution is sufficient to reconstruct TAD-scale 3D chromatin organization. Further, CCLE signifies a shifted paradigm from the concept of localized, well-defined boundary elements, manifested in the existing CTCF-based loop extrusion models, to a concept also encompassing a continuous distribution of position-dependent loop extrusion rates. This new paradigm offers greater flexibility in recapitulating diverse features in Hi-C data than strictly localized loop extrusion barriers.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript presents a mathematical model for loop extrusion called the conserved-current loop extrusion model (CCLE). The model uses cohesin ChIP-Seq data to predict the Hi-C map and shows broad agreement between experimental Hi-C maps and simulated Hi-C maps. They test the model on Hi-C data from interphase fission yeast and meiotic budding yeast. The conclusion drawn by the authors is that peaks of cohesin represent loop boundaries in these situations, which they also propose extends to other organism/situations where Ctcf is absent.

      Response:

      We would like to point out that the referee's interpretation of our results, namely that "The conclusion drawn by the authors is that peaks of cohesin represent loop boundaries in these situations, ...", is an oversimplification that we do not subscribe to. The referee's interpretation of our model is correct when there are strong, localized barriers to loop extrusion; however, the CCLE model allows for loop extrusion rates that are position-dependent and take on a range of values. The CCLE model also allows the loop extrusion model to be applied to organisms without known boundary elements. Thus, strictly interpreting the positions of cohesin peaks as loop boundaries overlooks a key idea to emerge from the CCLE model.

      Major comments:

      1. More recent micro-C/Hi-C maps, particularly for budding yeast mitotic cells and meiotic cells, show clear puncta, representative of anchored loops, which are not well recapitulated in the simulated data from this study. However, such puncta are cohesin-dependent, as they disappear in the absence of cohesin and are enhanced in the absence of the cohesin release factor, Wapl. For example, see the two studies below. The model is therefore missing some key elements of the loop organisation. How do the authors explain this discrepancy? It would also be very useful to test whether the model can predict the increased strength of loop anchors when Wapl1 is removed and cohesin levels increase.

      Costantino L, Hsieh TS, Lamothe R, Darzacq X, Koshland D. Cohesin residency determines chromatin loop patterns. Elife. 2020 Nov 10;9:e59889. doi: 10.7554/eLife.59889. PMID: 33170773; PMCID: PMC7655110.

      Barton RE, Massari LF, Robertson D, Marston AL. Eco1-dependent cohesin acetylation anchors chromatin loops and cohesion to define functional meiotic chromosome domains. Elife. 2022 Feb 1;11:e74447. doi: 10.7554/eLife.74447. Epub ahead of print. PMID: 35103590; PMCID: PMC8856730.

      Response:

      We are perplexed by this referee comment. While we agree that puncta representing loop anchors are a feature of Hi-C maps, as noted by the referee, we would reinforce that our CCLE simulations of meiotic budding yeast (Figs. 5A and 5B of the original manuscript) demonstrate an overall excellent description of the experimental meiotic budding yeast Hi-C map, including puncta arising from loop anchors. This CCLE model-experiment agreement for meiotic budding yeast is described and discussed in detail in the original manuscript and the revised manuscript (lines 336-401).

      To further emphasize and extend this point, we now also address the Hi-C of mitotic budding yeast, which was not included in the original manuscript. We have added an entire new section to the revised manuscript entitled "CCLE Describes TADs and Loop Configurations in Mitotic S. cerevisiae", including the new Figure 6, which presents a comparison between a portion of the mitotic budding yeast Hi-C map from Costantino et al. and the corresponding CCLE simulation at 500 bp resolution. In this case too, the CCLE model describes the data well, including the puncta, further addressing the referee's concern that the CCLE model is missing some key elements of loop organization.

      Concerning the referee's specific comment about the role of Wapl, we note that in order to apply CCLE when Wapl is removed, the corresponding cohesin ChIP-seq in the absence of Wapl should be available. To our knowledge, such data is not currently available and therefore we have not pursued this explicitly. However, we would reinforce that as Wapl is a factor that promotes cohesin unloading, its role is already effectively represented in the optimized value for LEF processivity, which encompasses LEF lifetime. In other words, if Wapl has a substantial effect it will be captured already in this model parameter.

      2. Related to the point above, the simulated data has much higher resolution than the experimental data (1kb vs 10kb in the fission yeast dataset). Given that loop size is in the 20-30kb range, a good resolution is important to see the structural features of the chromosomes. Can the model observe these details that are averaged out when the resolution is increased?

      Response:

      We agree with the referee that higher resolution is preferable to low resolution. In practice, however, there is a trade-off between resolution and noise. The first experimental interphase fission yeast Hi-C data of Mizuguchi et al. 2014 corresponds to 10 kb resolution. To compare our CCLE simulations to these published experimental data, as described in the original manuscript, we bin our 1-kb-resolution simulations to match the 10 kb experimental measurements. Nevertheless, CCLE can readily predict the interphase fission yeast Hi-C map at higher resolution by reducing the bin size (or, if necessary, reducing the lattice site size of the simulations themselves). In the revised manuscript, we have added comparisons between CCLE's predicted Hi-C maps and newer Micro-C data for S. pombe from Hsieh et al. (Ref. [50]) in the new Supplementary Figures 5-9. We have chosen to present these comparisons at 2 kb resolution, which is the same resolution as for our meiotic budding yeast comparisons. Also included in Supplementary Figures 5-9 are comparisons between the original Hi-C maps of Mizuguchi et al. and the newer maps of Hsieh et al., binned to 10 kb resolution. Inspection of these figures shows that CCLE provides a good description of Hsieh et al.'s experimental Hi-C maps and does not reveal any major new features in the interphase fission yeast Hi-C map on the 10-100 kb scale that were not already apparent from the Hi-C maps of Mizuguchi et al. 2014. Thus, the CCLE model performs well across this range of effective resolutions.
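
      The binning step mentioned above can be illustrated with a minimal sketch. It assumes that coarsening is done by summing contacts over square blocks of the fine-resolution map; whether the actual pipeline sums or averages is not stated here, and the function name `coarsen_contact_map` and the toy matrix are hypothetical.

```python
def coarsen_contact_map(contacts, factor):
    """Sum a square contact matrix over factor x factor blocks.

    Assumption: coarse-graining by summation; averaging would differ
    only by a constant factor of factor**2.
    """
    n = len(contacts)
    if n % factor != 0:
        raise ValueError("matrix size must be a multiple of the bin factor")
    m = n // factor
    coarse = [[0.0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            # Each fine bin (i, j) contributes to exactly one coarse bin.
            coarse[i // factor][j // factor] += contacts[i][j]
    return coarse


# Toy 4 x 4 symmetric map (made-up counts), coarsened by a factor of 2.
fine = [
    [4, 2, 1, 0],
    [2, 4, 2, 1],
    [1, 2, 4, 2],
    [0, 1, 2, 4],
]
coarse = coarsen_contact_map(fine, 2)
```

      Going from 1 kb to 10 kb bins would correspond to `factor=10` in this sketch; the coarse map stays symmetric because the fine map is.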

      3. Transcription, particularly convergent has been proposed to confer boundaries to loop extrusion. Can the authors recapitulate this in their model?

      Response:

      In response to the suggestion of the reviewer we have now calculated the correlation between cohesin ChIP-seq and the locations of convergent gene pairs, which is now presented in Supplementary Figures 17 and 18. Accordingly, in the revised manuscript, we have added the following text to the Discussion (lines 482-498):

      "In vertebrates, CTCF defines the locations of most TAD boundaries. It is interesting to ask what might play that role in interphase S. pombe as well as in meiotic and mitotic S. cerevisiae. A number of papers have suggested that convergent gene pairs are correlated with cohesin ChIP-seq in both S. pombe [65, 66] and S. cerevisiae [66-71]. Because CCLE ties TADs to cohesin ChIP-seq, a strong correlation between cohesin ChIP-seq and convergent gene pairs would be an important clue to the mechanism of TAD formation in yeasts. To investigate this correlation, we introduce a convergent-gene variable that has a nonzero value between convergent genes and an integrated weight of unity for each convergent gene pair. Supplementary Figure 17A shows the convergent gene variable, so-defined, alongside the corresponding cohesin ChIP-seq for meiotic and mitotic S. cerevisiae. It is apparent from this figure that a peak in the ChIP-seq data is accompanied by a non-zero value of the convergent-gene variable in about 80% of cases, suggesting that chromatin looping in meiotic and mitotic S. cerevisiae may indeed be tied to convergent genes. Conversely, about 50% of convergent genes match peaks in cohesin ChIP-seq. The cross-correlation between the convergent-gene variable and the ChIP-seq of meiotic and mitotic S. cerevisiae is quantified in Supplementary Figures 17B and C. By contrast, in interphase S. pombe, cross-correlation between convergent genes and cohesin ChIP-seq in each of five considered regions is unobservably small (Supplementary Figure 18A), suggesting that convergent genes per se do not have a role in defining TAD boundaries in interphase S. pombe."
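
      As an illustration of the convergent-gene variable described in the quoted text, the following sketch builds a track that is nonzero only between convergent genes and integrates to one per pair, then computes a Pearson cross-correlation at a given lag. The bin coordinates, the toy ChIP-seq values, and the function names are all hypothetical, and the exact normalization used for Supplementary Figures 17 and 18 may differ.

```python
import math


def convergent_gene_track(n_bins, gaps):
    """Build a per-bin track from half-open (start, end) bin ranges,
    each range being the gap between one convergent gene pair."""
    track = [0.0] * n_bins
    for start, end in gaps:
        width = end - start
        for k in range(start, end):
            track[k] += 1.0 / width  # each pair integrates to unity
    return track


def cross_correlation(x, y, lag):
    """Pearson correlation between x[i] and y[i + lag] over the overlap."""
    if lag >= 0:
        xs, ys = x[:len(x) - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:len(y) + lag]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant track
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: two convergent pairs on a 10-bin region.
track = convergent_gene_track(10, [(2, 4), (6, 9)])
toy_chip = [0.1, 0.2, 0.9, 0.8, 0.1, 0.2, 0.6, 0.7, 0.5, 0.1]  # made-up signal
correlations = {lag: cross_correlation(track, toy_chip, lag) for lag in (-2, -1, 0, 1, 2)}
```

      A peak of the cross-correlation near zero lag would indicate co-localization of convergent gene pairs and cohesin signal; a flat curve would indicate none.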

      Minor comments:

      1. In the discussion, the authors cite the fact that Mis4 binding sites do not give good prediction of the HI-C maps as evidence that Mis4 is not important for loop extrusion. This can only be true if the position of Mis4 measured by ChIP is a true reflection of Mis4 position. However, Mis4 binding to cohesin/chromatin is very dynamic and it is likely that this is too short a time scale to be efficiently cross-linked for ChIP. Conversely, extensive experimental data in vivo and in vitro suggest that stimulation of cohesin's ATPase by Mis4-Ssl3 is important for loop extrusion activity.

      Response:

      We apologize for the confusion on this point. We actually intended to convey that the absence of Mis4-Psc3 correlations in S. pombe suggests, from the point of view of CCLE, that Mis4 is not an integral component of loop-extruding cohesin, during the loop extrusion process itself. We agree completely that Mis4/Ssl3 is surely important for cohesin loading, and (given that cohesin is required for loop extrusion) Mis4/Ssl3 is therefore important for loop extrusion. Evidently, this part of our Discussion was lacking sufficient clarity. In response to both referees' comments, we have re-written the discussion of Mis4 and Pds5 to more carefully explain our reasoning and be more circumspect in our inferences. The re-written discussion is described below in response to Referee #2's comments.

      Nevertheless, on the topic of whether Nipbl-cohesin binding is too transient to be detected in ChIP-seq, the FRAP analysis presented by Rhodes et al. eLife 6:e30000 "Scc2/Nipbl hops between chromosomal cohesin rings after loading" indicates that, in HeLa cells, Nipbl has a residence time bound to cohesin of about 50 seconds. As shown in the bottom panel of Supplementary Fig. 7 in the original manuscript (and the bottom panel of Supplementary Fig. 20 in the revised manuscript), there is a significant cross-correlation (~0.2) between the Nipbl ChIP-seq and Smc1 ChIP-seq in humans, indicating that a transient association between Nipbl and cohesin can be (and in fact is) detected by ChIP-seq.

      2. Inclusion of a comparison of this model compared to previous models (for example bottom up models) would be extremely useful. What is the improvement of this model over existing models?

      Response:

      As stated in the original manuscript, as far as we are aware, "bottom up" models, that quantitatively describe the Hi-C maps of interphase fission yeast or meiotic budding yeast or, indeed, of eukaryotes other than vertebrates, do not exist. Bottom-up models would require knowledge of the relevant boundary elements (e.g. CTCF sites), which, as stated in the submitted manuscript, are generally unknown for fission yeast, budding yeast, and other non-vertebrate eukaryotes. The absence of such models is the reason that CCLE fills an important need. Since bottom-up models for cohesin loop extrusion in yeast do not exist, we cannot compare CCLE to the results of such models.

      In the revised manuscript we now explicitly compare the CCLE model to the only bottom-up type of model describing the Hi-C maps of non-vertebrate eukaryotes by Schalbetter et al. Nat. Commun. 10:4795 2019, which we did cite extensively in our original manuscript. Schalbetter et al. use cohesin ChIP-seq peaks to define the positions of loop extrusion barriers in meiotic S. cerevisiae, for which the relevant boundary elements are unknown. In their model, specifically, when a loop-extruding cohesin anchor encounters such a boundary element, it either passes through with a certain probability, as if no boundary element is present, or stops extruding completely until the cohesin unbinds and rebinds.

      In the revised manuscript we refer to this model as the "explicit barrier" model and have applied it to interphase S. pombe, using cohesin ChIP-seq peaks to define the positions of loop extrusion barriers. The corresponding simulated Hi-C map is presented in Supplementary Fig. 19 in comparison with the experimental Hi-C. It is evident that the explicit barrier model provides a poorer description of the Hi-C data of interphase S. pombe compared to the CCLE model, as indicated by the MPR and Pearson correlation scores. While the explicit barrier model appears capable of accurately reproducing Hi-C data with punctate patterns, typically accompanied by strong peaks in the corresponding cohesin ChIP-seq, it seems less effective in several conditions including interphase S. pombe, where the Hi-C data lacks punctate patterns and sharp TAD boundaries, and the corresponding cohesin ChIP-seq shows low-contrast peaks. The success of the CCLE model in describing the Hi-C data of both S. pombe and S. cerevisiae, which exhibit very different features, suggests that the current paradigm of localized, well-defined boundary elements may not be the only approach to understanding loop extrusion. By contrast, CCLE allows for a concept of continuous distribution of position-dependent loop extrusion rates, arising from the aggregate effect of multiple interactions between loop extrusion complexes and chromatin. This paradigm offers greater flexibility in recapitulating diverse features in Hi-C data than strictly localized loop extrusion barriers.

      We have also added the following paragraph in the Discussion section of the manuscript to elaborate this point (lines 499-521):

      "Although 'bottom-up' models which incorporate explicit boundary elements do not exist for non-vertebrate eukaryotes, one may wonder how well such LEF models, if properly modified and applied, would perform in describing Hi-C maps with diverse features. To this end, we examined the performance of the model described in Ref. [49] in describing the Hi-C map of interphase S. cerevisiae. Reference [49] uses cohesin ChIP-seq peaks in meiotic S. cerevisiae to define the positions of loop extrusion barriers which either completely stall an encountering LEF anchor with a certain probability or let it pass. We apply this 'explicit barrier' model to interphase S. pombe, using its cohesin ChIP-seq peaks to define the positions of loop extrusion barriers, and using Ref. [49]'s best-fit value of 0.05 for the pass-through probability. Supplementary Figure 19A presents the corresponding simulated Hi-C map for the 0.3-1.3 kb region of Chr 2 of interphase S. pombe in comparison with the corresponding Hi-C data. It is evident that the explicit barrier model provides a poorer description of the Hi-C data of interphase S. pombe compared to the CCLE model, as indicated by the MPR and Pearson correlation scores of 1.6489 and 0.2267, respectively. While the explicit barrier model appears capable of accurately reproducing Hi-C data with punctate patterns, typically accompanied by strong peaks in the corresponding cohesin ChIP-seq, it seems less effective in cases such as in interphase S. pombe, where the Hi-C data lacks punctate patterns and sharp TAD boundaries, and the corresponding cohesin ChIP-seq shows low-contrast peaks. The success of the CCLE model in describing the Hi-C data of both S. pombe and S. cerevisiae, which exhibit very different features, suggests that the current paradigm of localized, well-defined boundary elements may not be the only approach to understanding loop extrusion. By contrast, CCLE allows for a concept of continuous distribution of position-dependent loop extrusion rates, arising from the aggregate effect of multiple interactions between loop extrusion complexes and chromatin. This paradigm offers greater flexibility in recapitulating diverse features in Hi-C data than strictly localized loop extrusion barriers."
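
      The stall-or-pass rule of the explicit barrier model can be sketched as follows. This is a simplified illustration, not the implementation of Ref. [49]: it assumes an anchor is stopped one lattice site before the barrier, the function name `step_anchor` is hypothetical, and only the pass-through probability of 0.05 is taken from the text above.

```python
import random

PASS_THROUGH_PROBABILITY = 0.05  # best-fit value quoted above from Ref. [49]


def step_anchor(position, direction, barriers, stalled, rng):
    """Try to move one LEF anchor by one lattice site.

    Returns (new_position, stalled). A stalled anchor stays put until the
    LEF unbinds and rebinds, which is handled outside this function.
    Assumption: the anchor halts on the site just before the barrier.
    """
    if stalled:
        return position, True
    target = position + direction
    if target in barriers and rng.random() >= PASS_THROUGH_PROBABILITY:
        return position, True  # barrier blocks the anchor
    return target, False  # free site, or the anchor passed through


# Hypothetical usage: a right-moving anchor approaching a barrier at site 4.
rng = random.Random(0)
position, stalled = 0, False
for _ in range(10):
    position, stalled = step_anchor(position, +1, {4}, stalled, rng)
```

      In CCLE, by contrast, no such discrete rule is needed, because the slowing of an anchor is encoded in a position-dependent extrusion rate.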

      Reviewer #1 (Significance (Required)):

      This simple model is useful to confirm that cohesin positions dictate the position of loops, which was predicted already and proposed in many studies. However, it should be considered a starting point as it does not faithfully predict all the features of chromatin organisation, particularly at better resolution.

      Response:

      As described in more detail above, we do not agree with the assertion of the referee that the CCLE model "does not faithfully predict all the features of chromatin organization, particularly at better resolution" and provide additional new data to support the conclusion that the CCLE model provides a much needed approach to model non-vertebrate contact maps and outperforms the single prior attempt to predict budding yeast Hi-C data using information from cohesin ChIP-seq.

      It will mostly be of interest to those in the chromosome organisation field, working in organisms or systems that do not have ctcf.

      Response:

      We agree that this work will be of special interest to researchers working on chromatin organization of non-vertebrate organisms. We would reinforce that yeast are frequently used models for the study of cohesin, condensin, and chromatin folding more generally. Indeed, in the last two months alone there are two Molecular Cell papers, one Nature Genetics paper, and one Cell Reports paper where loop extrusion in yeast models is directly relevant. We also believe, however, that the model will be of interest for the field in general as it simultaneously encompasses various scenarios that may lead to slowing down or stalling of LEFs.

      This reviewer is a cell biologist working in the chromosome organisation field, but does not have modelling experience and therefore does not have the expertise to determine if the modelling part is mathematically sound and has assumed that it is.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: Yuan et al. report on their development of an analytical model ("CCLE") for loop extrusion with genomic-position-dependent speed, with the idea of accounting for barriers to loop extrusion. They write down master equations for the probabilities of cohesin occupancy at each genomic site and obtain approximate steady-state solutions. Probabilities are governed by cohesin translocation, loading, and unloading. Using ChIP-seq data as an experimental measurement of these probabilities, they numerically fit the model parameters, among which are extruder density and processivity. Gillespie simulations with these parameters combined with a 3D Gaussian polymer model were integrated to generate simulated Hi-C maps and cohesin ChIP-seq tracks, which show generally good agreement with the experimental data. The authors argue that their modeling provides evidence that loop extrusion is the primary mechanism of chromatin organization on ~10-100 kb scales in S. pombe and S. cerevisiae.
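
      For readers unfamiliar with the Gillespie simulations mentioned in this summary, a single step of the standard algorithm can be sketched as below. The event names and rate values are hypothetical placeholders; in CCLE the stepping rates would be position-dependent, which this sketch does not attempt to reproduce.

```python
import random


def gillespie_step(rates, rng):
    """Choose the next event and the waiting time until it occurs.

    rates: dict mapping an event label to its rate (per time-unit).
    The event is drawn with probability proportional to its rate, and
    the waiting time is exponential with the total rate.
    """
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("no event can occur")
    events = list(rates)
    weights = [rates[e] for e in events]
    event = rng.choices(events, weights=weights, k=1)[0]
    wait = rng.expovariate(total)
    return event, wait


# Hypothetical rates for one loop-extruding factor (arbitrary time units).
rng = random.Random(1)
rates = {"step_left_anchor": 0.02, "step_right_anchor": 0.01, "unbind": 5e-4}
event, wait = gillespie_step(rates, rng)
```

      Repeating this step, and updating the rates after each event, generates a stochastic trajectory of loop configurations.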

      Major comments:

      1. I am unconvinced that this analysis specifically is sufficient to demonstrate that extrusion is the primary organizer of chromatin on these scales; moreover, the need to demonstrate this is questionable, as extrusion is widely accepted, even if not universally so. How is the agreement of CCLE with experiments more demonstrative of loop extrusion than previous modeling?

      Response:

      We agree with the referee's statement that "extrusion is widely accepted, even if not universally so". We disagree, however, that this state of affairs means that "the need to demonstrate this [i.e. loop extrusion] is questionable". On the contrary, studies such as ours that provide further compelling evidence that cohesin-based loop extrusion is the primary organizer of chromatin must surely be welcomed, for two reasons. First, they help to persuade those who remain unconvinced by the loop extrusion mechanism in general. Second, until the present work, models of loop extrusion capable of reproducing Hi-C maps quantitatively in yeasts and other non-vertebrate eukaryotes have been lacking, leaving open the question of whether loop extrusion can describe Hi-C maps beyond vertebrates. CCLE has now answered that question in the affirmative. Moreover, the existence of a robust model to predict contact maps in non-vertebrate models, which are extensively used in the pursuit of research questions in chromatin biology, will be broadly enabling to the field.

      It is fundamental that if a simple, physically-plausible model/hypothesis is able to describe experimental data quantitatively, it is indeed appropriate to ascribe considerable weight to that model/hypothesis (until additional data become available to refute the model).

      How is the agreement of CCLE with experiments more demonstrative of loop extrusion than previous modeling?

      Response:

      As noted above and in the original manuscript, we are unaware of previous quantitative modeling of cohesin-based loop extrusion and the resultant Hi-C maps in organisms that lack CTCF, namely non-vertebrate eukaryotic models such as fission yeast or budding yeast, as we apply here. As noted in the original manuscript, previous quantitative modeling of Hi-C maps based on cohesin loop extrusion and CTCF boundary elements has been convincing that loop extrusion is indeed relevant in vertebrates, but the restriction to vertebrates excludes most of the tree of life.

      Below, the referee cites two examples of loop extrusion outside of vertebrates. The one that is suggested to correspond to yeast cells (Dequeker et al. Nature 606:197 2022) actually corresponds to mouse cells, which are vertebrate cells. The other one models the Hi-C map of the prokaryote, Bacillus subtilis, based on loop extrusion of the bacterial SMC complex thought to most resemble condensin (not cohesin), subject to barriers to loop extrusion that are related to genes or involving prokaryote-specific Par proteins (Brandao et al. PNAS 116:20489 2019). We have referenced this work in the revised manuscript but would reinforce that it lacks utility in predicting the contact maps for non-vertebrate eukaryotes.

      Relatedly, similar best fit values for S. pombe and S. cerevisiae might not point to a mechanistic conclusion (same "underlying mechanism" of loop extrusion), but rather to similar properties for loop-extruding cohesins in the two species.

      Response:

      In the revised manuscript, we have replaced "suggesting that the underlying mechanism that governs loop extrusion by cohesin is identical in both species" with "suggesting loop-extruding cohesins possess similar properties in both species" (lines 367-368).

      As an alternative, could a model with variable binding probability given by ChIP-seq and an exponential loop-size distribution work equally well? The stated lack of a dependence on extrusion timescale suggests that a static looping model might succeed. If not, why not?

      Response:

      A hypothetical mechanism that generates the same instantaneous loop distributions and correlations as loop extrusion would lead to the same Hi-C map as does loop extrusion. This circumstance is not confined to CCLE, but is equally applicable to previous CTCF-based loop extrusion models. It holds because Hi-C and ChIP-seq, and therefore models that seek to describe these measurements, provide a snapshot of the chromatin configuration at one instant of time.

      We would reinforce that there is no physical basis for a diffusion capture model with an approximately exponential loop size distribution. Nevertheless, one can reasonably ask whether a physically sensible diffusion capture model can simultaneously match cohesin ChIP-seq and Hi-C. Motivated by the referee's comment we have addressed this question and, accordingly, in the revised manuscript, we have added (1) an entire subsection entitled "Diffusion capture does not reproduce experimental interphase S. pombe Hi-C maps" (lines 303-335) and (2) Supplementary Figure 15. As we now demonstrate, the CCLE model vastly outperforms an equilibrium binding model in reproducing the experimental Hi-C maps and measured P(s).

      2. I do not understand how the loop extrusion residence time drops out. As I understand it, Eq 9 converts ChIP-seq to lattice site probability (involving N_{LEF}, which is related to \rho, and \rho_c). Then, Eqs. 3-4 derive site velocities V_n and U_n if we choose rho, L, and \tau, with the latter being the residence time. This parameter is not specified anywhere and is claimed to be unimportant. It may be true that the choice of timescale is arbitrary in this procedure, but can the authors please clarify?

      Response:

      As noted above, Hi-C and ChIP-seq both capture chromatin configuration at one instant in time. Therefore, such measurements cannot and do not provide any time-scale information, such as the loop extrusion residence time (LEF lifetime) or the mean loop extrusion rate. For this reason, neither our CCLE simulations, nor other researchers' previous simulations of loop extrusion in vertebrates with CTCF boundary elements, provide any time-scale information, because the experiments they seek to describe do not contain time-scale information. The Hi-C map simulations can and do provide information concerning the loop size, which is the product of the loop lifetime and the loop extrusion rate. Lines 304-305 of the revised manuscript include the text: "Because Hi-C and ChIP-seq both characterize chromatin configuration at a single instant of time, and do not provide any direct time-scale information, ..."

      In practice, we set the LEF lifetime to be some explicit value with an arbitrary time unit. We have added a sentence in the Methods that reads, "In practice, however, we set the LEF dissociation rate to 5e-4 time-unit^-1 (equivalent to a lifetime of 2000 time-units), and the nominal LEF extrusion rate (aka \rho*L/\tau, see Supplementary Methods) can be determined from the given processivity" (lines 599-602), to clarify this point. We have also changed the terminology from "timesteps" to "LEF events" in the manuscript as the latter is more accurate for our purpose.
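
      The arithmetic behind this statement can be sketched as follows. The sketch assumes only the relation stated above, that loop size (processivity) is the product of lifetime and extrusion rate; the processivity value of 20 kb is a hypothetical placeholder, not a fitted parameter, and the function name is illustrative.

```python
def nominal_extrusion_rate(processivity, dissociation_rate):
    """Extrusion rate implied by a processivity and a dissociation rate,
    assuming processivity = extrusion rate * lifetime."""
    lifetime = 1.0 / dissociation_rate
    return processivity / lifetime


dissociation_rate = 5e-4            # per time-unit, as quoted above
lifetime = 1.0 / dissociation_rate  # 2000 time-units
processivity_kb = 20.0              # hypothetical value for illustration
rate = nominal_extrusion_rate(processivity_kb, dissociation_rate)

# Rescaling the time unit by any factor changes the rate and the lifetime
# together but leaves their product, the processivity, unchanged.
scale = 10.0
rescaled_rate = nominal_extrusion_rate(processivity_kb, dissociation_rate * scale)
rescaled_lifetime = 1.0 / (dissociation_rate * scale)
```

      This is the sense in which the time scale drops out: only the product of rate and lifetime enters the steady-state loop configurations that Hi-C and ChIP-seq probe.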

      3. The assumptions in the solution and application of the CCLE model are potentially constraining to a limited number of scenarios. In particular the authors specify that current due to binding/unbinding, A_n - D_n, is small. This assumption could be problematic near loading sites (centromeres, enhancers in higher eukaryotes, etc.) (where current might be dominated by A_n and V_n), unloading sites (D_n and V_{n-1}), or strong boundaries (D_n and V_{n-1}). The latter scenario is particularly concerning because the manuscript seems to be concerned with the presence of unidentified boundaries. This is partially mitigated by the fact that the model seems to work well in the chosen examples, but the authors should discuss the limitations due to their assumptions and/or possible methods to get around these limitations.

      4. Related to the above concern, low cohesin occupancy is interpreted as a fast extrusion region and high cohesin occupancy is interpreted as a slow region. But this might not be true near cohesin loading and unloading sites.

      Response:

      Our response to Referee 2's Comments 3. and 4. is that both in the original manuscript and in the revised manuscript we clearly delineate the assumptions underlying CCLE and we carefully assess the extent to which these assumptions are violated (lines 123-126 and 263-279 in the revised manuscript). For example, Supplementary Figure 12 shows that across the S. pombe genome as a whole, violations of the CCLE assumptions are small. Supplementary Figure 13 shows that violations are similarly small for meiotic S. cerevisiae. However, to explicitly address the concern of the referee, we have added the following sentences to the revised manuscript:

      Lines 277-279:

      "While loop extrusion in interphase S. pombe seems to well satisfy the assumptions underlying CCLE, this may not always be the case in other organisms."

      Lines 359-361:

      "In addition, the three quantities, given by Eqs. 6, 7, and 8, are distributed around zero with relatively small fluctuations (Supplementary Fig. 13), indicating that CCLE model is self-consistent in this case also."

      In the case of mitotic S. cerevisiae, Supplementary Figure 14 shows that these quantities are small for most genomic locations, except near the cohesin ChIP-seq peaks. We ascribe these greater violations of CCLE's assumptions at the locations of cohesin peaks in part to the low processivity of mitotic cohesin in S. cerevisiae, compared to that of meiotic S. cerevisiae and interphase S. pombe, and in part to the low CCLE loop extrusion rate at the cohesin peaks. We have added a paragraph at the end of the Section "CCLE Describes TADs and Loop Configurations in Mitotic S. cerevisiae" to reflect these observations (lines 447-461).

      5. The mechanistic insight attempted in the discussion, specifically with regard to Mis4/Scc2/NIPBL and Pds5, is problematic. First, it is not clear how the discussion of Nipbl and Pds5 is connected to the CCLE method; the justification is that CCLE shows cohesin distribution is linked to cohesin looping, which is already a questionable statement (point 1) and doesn't really explain how the model offers new insight into existing Nipbl and Pds5 data.

      Furthermore, I believe that the conclusions drawn on this point are flawed, or at least, stated with too much confidence. The authors raise the curious point that Nipbl ChIP-seq does not correlate well with cohesin ChIP-seq, and use this as evidence that Nipbl is not a part of the loop-extruding complex in S. pombe, and it is not essential in humans. Aside from the molecular evidence in human Nipbl/cohesin (acknowledged by authors), there are other reasons to doubt this conclusion. First, depletion of Nipbl (rather than binding partner Mau2 as in ref 55) in mouse cells strongly inhibits TAD formation (Schwarzer et al. Nature 551:51 2017). Second, at least two studies have raised concerns about Nipbl ChIP-seq results: 1) Hu et al. Nucleic Acids Res 43:e132 2015, which shows that uncalibrated ChIP-seq can obscure the signal of protein localization throughout the genome due to the inability to distinguish from background and 2) Rhodes et al. eLife 6:e30000, which uses FRAP to show that Nipbl binds and unbinds to cohesin rapidly in human cells, which could go undetected in ChIP-seq, especially when uncalibrated. It has not been shown that these dynamics are present in yeast, but there is no reason to rule it out yet.

      Similar types of critiques could be applied to the discussion of Pds5. There is cross-correlation between Psc3 and Pds5 in S. pombe, but the authors are unable to account for whether Pds5 binding is transient and/or necessary to loop extrusion itself or, more importantly, whether Pds5 ChIP is associated with extrusive or cohesive cohesins; cross-correlation peaks at about 0.6, but note that by the authors own estimates, cohesive cohesins are approximately half of all cohesins in S. pombe (Table 3).

      *Due to the above issues, I suggest that the authors heavily revise this discussion to better reflect the current experimental understanding and the limited ability to draw such conclusions based on the current CCLE model. *

      __Response: __

      As stated above, our study demonstrates that the CCLE approach is able to take as input cohesin (Psc3) ChIP-seq data and produce as output simulated Hi-C maps that closely reproduce the experimental Hi-C maps of interphase S. pombe and meiotic S. cerevisiae. This result is evident from the multiple Hi-C comparison figures in both the original and the revised manuscripts. Given this result, the referee's assertion that it is "questionable" whether CCLE shows a link between cohesin distribution (as quantified by cohesin ChIP-seq) and cohesin looping (as quantified by Hi-C) is demonstrably incorrect.

      However, we did not intend to suggest that Nipbl and Pds5 are not crucial for cohesin loading, as the reviewer states. Rather, our inquiries relate to the more nuanced question of whether these factors reside only at loading sites or, instead, remain as a more long-lived constituent component of the loop extrusion complex. We regret any confusion and have endeavored to clarify this point in the revised manuscript in response to Referee 2's Comment 5 as well as Referee 1's Minor Comment 1. We have now better explained how the CCLE model may offer new insight from existing ChIP-seq data in general and from Mis4/Nipbl and Pds5 ChIP-seq in particular. Accordingly, we have followed Referee 2's advice to heavily revise the relevant section of the Discussion.

      To this end, we have removed the following text from the original manuscript:

      "The fact that the cohesin distribution along the chromatin is strongly linked to chromatin looping, as evident by the success of the CCLE model, allows for new insights into in vivo LEF composition and function. For example, recently, two single-molecule studies [37, 38] independently found that Nipbl, which is the mammalian analogue of Mis4, is an obligate component of the loop-extruding human cohesin complex. Ref. [37] also found that cohesin complexes containing Pds5, instead of Nipbl, are unable to extrude loops. On this basis, Ref. [32] proposed that, while Nipbl-containing cohesin is responsible for loop extrusion, Pds5-containing cohesin is responsible for sister chromatid cohesion, neatly separating cohesin's two functions according to composition. However, the success of CCLE in interphase S. pombe, together with the observation that the Mis4 ChIP-seq signal is uncorrelated with the Psc3 ChIP-seq signal (Supplementary Fig. 7) allows us to infer that Mis4 cannot be a component of loop-extruding cohesin in S. pombe. On the other hand, Pds5 is correlated with Psc3 in S. pombe (Supplementary Fig. 7) suggesting that both proteins are involved in loop-extruding cohesin, contradicting a hypothesis that Pds5 is a marker for cohesive cohesin in S. pombe. In contrast to the absence of Mis4-Psc3 correlation in S. pombe, in humans, Nipbl ChIP-seq and Smc1 ChIP-seq are correlated (Supplementary Fig. 7), consistent with Ref. [32]'s hypothesis that Nipbl can be involved in loop-extruding cohesin in humans. However, Ref. [55] showed that human Hi-C contact maps in the absence of Nipbl's binding partner, Mau2 (Ssl3 in S. pombe [56]) show clear TADs, consistent with loop extrusion, albeit with reduced long-range contacts in comparison to wild-type maps, indicating that significant loop extrusion continues in live human cells in the absence of Nipbl-Mau2 complexes. 
These collected observations suggest the existence of two populations of loop-extruding cohesin complexes in vivo, one that involves Nipbl-Mau2 and one that does not. Both types are present in mammals, but only Mis4-Ssl3-independent loop-extruding cohesin is present in S. pombe."

      We have replaced it with the following text in the revised manuscript (lines 533-568):

      "As noted above, the input for our CCLE simulations of chromatin organization in S. pombe, was the ChIP-seq of Psc3, which is a component of the cohesin core complex [75]. Accordingly, Psc3 ChIP-seq represents how the cohesin core complex is distributed along the genome. In S. pombe, the other components of the cohesin core complex are Psm1, Psm3, and Rad21. Because these proteins are components of the cohesin core complex, we expect that the ChIP-seq of any of these proteins would closely match the ChIP-seq of Psc3, and would equally well serve as input for CCLE simulations of S. pombe genome organization. Supplementary Figure 20C confirms significant correlations between Psc3 and Rad21. In light of this observation, we then reason that the CCLE approach offers the opportunity to investigate whether other proteins beyond the cohesin core are constitutive components of the loop extrusion complex during the extrusion process (as opposed to cohesin loading or unloading). To elaborate, if the ChIP-seq of a non-cohesin-core protein is highly correlated with the ChIP-seq of a cohesin core protein, we can infer that the protein in question is associated with the cohesin core and therefore is a likely participant in loop-extruding cohesin, alongside the cohesin core. Conversely, if the ChIP-seq of a putative component of the loop-extruding cohesin complex is uncorrelated with the ChIP-seq of a cohesin core protein, then we can infer that the protein in question is unlikely to be a component of loop-extruding cohesin, or at most is transiently associated with it.

      For example, in S. pombe, the ChIP-seq of the cohesin regulatory protein, Pds5 [74], is correlated with the ChIP-seq of Psc3 (Supplementary Fig. 20B) and with that of Rad21 (Supplementary Fig. 20D), suggesting that Pds5 can be involved in loop-extruding cohesin in S. pombe, alongside the cohesin core proteins. Interestingly, this inference concerning fission yeast cohesin subunit, Pds5, stands in contrast to the conclusion from a recent single-molecule study [38] concerning cohesin in vertebrates. Specifically, Reference [38] found that cohesin complexes containing Pds5, instead of Nipbl, are unable to extrude loops.

      Additionally, as noted above, in S. pombe the ChIP-seq signal of the cohesin loader, Mis4, is uncorrelated with the Psc3 ChIP-seq signal (Supplementary Fig. 20A), suggesting that Mis4 is, at most, a very transient component of loop-extruding cohesin in S. pombe, consistent with its designation as a "cohesin loader". However, both References [38] and [39] found that Nipbl (counterpart of S. pombe's Mis4) is an obligate component of the loop-extruding human cohesin complex, more than just a mere cohesin loader. Although CCLE has not yet been applied to vertebrates, from a CCLE perspective, the possibility that Nipbl may be required for the loop extrusion process in humans is bolstered by the observation that in humans Nipbl ChIP-seq and Smc1 ChIP-seq show significant correlations (Supplementary Fig. 20G), consistent with Ref. [32]'s hypothesis that Nipbl is involved in loop-extruding cohesin in vertebrates. A recent theoretical model of the molecular mechanism of loop extrusion by cohesin hypothesizes that transient binding by Mis4/Nipbl is essential for permitting directional reversals and therefore for two-sided loop extrusion [41]. Surprisingly, there are significant correlations between Mis4 and Pds5 in S. pombe (Supplementary Fig. 20E), indicating Pds5-Mis4 association, outside of the cohesin core complex."

      In response to Referee 2's specific comment that "at least two studies have raised concerns about Nipbl ChIP-seq results", we note (1) that, while Hu et al. Nucleic Acids Res 43:e132 2015 present a general method for calibrating ChIP-seq results, they do not measure Mis4/Nipbl ChIP-seq, nor do they raise any specific concerns about Mis4/Nipbl ChIP-seq, and (2) that (as noted above, in response to Referee 1's comment) while the FRAP analysis presented by Rhodes et al. eLife 6:e30000 indicates that, in HeLa cells, Nipbl has a residence time bound to cohesin of about 50 seconds, nevertheless, as shown in Supplementary Fig. 20G in the revised manuscript, there is a significant cross-correlation between the Nipbl ChIP-seq and Smc1 ChIP-seq in humans, indicating that a transient association between Nipbl and cohesin is detected by ChIP-seq, the referees' concerns notwithstanding.

      We thank the referee for pointing out Schwarzer et al. Nature 551:51 2017. However, our interpretation of these data differs from the referee's. As noted in our original manuscript, Nipbl has traditionally been considered to be a cohesin loading factor. If the role of Nipbl were solely to load cohesin, then we would expect that depleting Nipbl would have a major effect on the Hi-C map, because fewer cohesins are loaded onto the chromatin. Figure 2 of Schwarzer et al. Nature 551:51 2017 shows the effect of depleting Nipbl on a vertebrate Hi-C map. Even in this case, when Nipbl is absent, this figure shows that TADs persist, albeit considerably attenuated. According to the authors' own analysis associated with Fig. 2 of their paper, these attenuated TADs correspond to a smaller number of loop-extruding cohesin complexes than in the presence of Nipbl. Since Nipbl is depleted, these loop-extruding cohesins necessarily cannot contain Nipbl. Thus, the data and analysis of Schwarzer et al. Nature 551:51 2017 actually seem consistent with the existence of a population of loop-extruding cohesin complexes that do not contain Nipbl.

      Concerning the referee's comment that we cannot be sure whether Pds5 ChIP is associated with extrusive or cohesive cohesin, we note that, as explained in the manuscript, we assume that the cohesive cohesins are uniformly distributed across the genome, and therefore that peaks in the cohesin ChIP-seq are associated with loop-extruding cohesins. The success of CCLE in describing Hi-C maps justifies this assumption a posteriori. Supplementary Figure 20B shows that the ChIP-seq of Pds5 is correlated with the ChIP-seq of Psc3 in S. pombe, that is, that peaks in the ChIP-seq of Psc3, assumed to derive from loop-extruding cohesin, are accompanied by peaks in the ChIP-seq of Pds5. This is the reasoning allowing us to associate Pds5 with loop-extruding cohesin in S. pombe.
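
      The correlation reasoning above can be illustrated with a minimal sketch. It assumes that each ChIP-seq track is a list of binned signal values on a common genomic grid; the function names and the toy tracks are hypothetical and are not taken from the manuscript's analysis code. The sketch only shows how a Pearson cross-correlation, evaluated as a function of lag, separates a track whose peaks coincide with those of a reference track from an unrelated track.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def cross_correlation(x, y, max_lag):
    """Pearson correlation of x[i] with y[i + lag] for each lag in [-max_lag, max_lag]."""
    n = len(x)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        out[lag] = pearson(xs, ys)
    return out

# Toy tracks (hypothetical data, not real ChIP-seq):
# a reference track with four peaks, a partner track sharing those peaks,
# and an unrelated track that is pure background.
rng = random.Random(0)
n_bins = 400
core = [1.0 + 0.2 * rng.random() for _ in range(n_bins)]
for peak in (50, 120, 210, 330):
    for d in range(-2, 3):
        core[peak + d] += 3.0 - abs(d)
partner = [0.8 * c + 0.2 * rng.random() for c in core]
unrelated = [1.0 + 0.2 * rng.random() for _ in range(n_bins)]

cc_partner = cross_correlation(core, partner, max_lag=10)
cc_unrelated = cross_correlation(core, unrelated, max_lag=10)
best_lag = max(cc_partner, key=cc_partner.get)
print("best lag for partner track:", best_lag)
print("zero-lag correlation, partner vs unrelated:",
      round(cc_partner[0], 2), round(cc_unrelated[0], 2))
```

      In this toy setting the partner track correlates strongly with the reference at zero lag, while the unrelated track does not. Whether a given correlation reflects extrusive or cohesive cohesin still rests on the separate assumption, stated above, that cohesive cohesin is uniformly distributed.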

      6. I suggest that the authors recalculate correlations for Hi-C maps using maps that are rescaled by the P(s) curves. As currently computed, most of the correlation between maps could arise from the characteristic decay of P(s) rather than smaller scale features of the contact maps. This could reduce the surprising observed correlation between distinct genomic regions in pombe (which, problematically, is higher than the observed correlation between simulation and experiment in cerevisiae).

      Response:

      We thank the referee for this advice. Following this advice, throughout the revised manuscript, we have replaced our original calculation of the Pearson correlation coefficient of unscaled Hi-C maps with a calculation of the Pearson correlation coefficient of rescaled Hi-C maps. Since the MPR is formed from ratios of simulated to experimental Hi-C maps, this metric is unchanged by the proposed rescaling.
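
      The effect of the rescaling can be sketched as follows. This is a minimal illustration under stated assumptions, not the manuscript's code: the toy maps, the helper names, and the choice to divide both maps by one common P(s) curve (taken from the first map) are all hypothetical. The two toy maps share nothing except the distance decay, so their unscaled Pearson correlation is high while the rescaled correlation is close to zero. The last lines check that a per-pixel ratio between two maps is unchanged when both are divided by the same P(s).

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def p_of_s(hic):
    """Mean contact value at each genomic separation s = |i - j|."""
    n = len(hic)
    return [sum(hic[i][i + s] for i in range(n - s)) / (n - s) for s in range(n)]

def rescale(hic, ps):
    """Divide every pixel by the P(s) value at its separation."""
    n = len(hic)
    return [[hic[i][j] / ps[abs(i - j)] for j in range(n)] for i in range(n)]

def upper(hic, min_sep=1):
    """Flatten the upper triangle with separation >= min_sep."""
    n = len(hic)
    return [hic[i][j] for i in range(n) for j in range(i + min_sep, n)]

def toy_map(n, rng):
    """Hypothetical symmetric map: 1/(1+s) decay times independent noise."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = (0.8 + 0.4 * rng.random()) / (1 + j - i)
    return m

rng = random.Random(1)
a, b = toy_map(60, rng), toy_map(60, rng)
ps = p_of_s(a)  # common curve used for both maps (an assumption of this sketch)

r_raw = pearson(upper(a), upper(b))
r_scaled = pearson(upper(rescale(a, ps)), upper(rescale(b, ps)))
print("Pearson, unscaled:", round(r_raw, 3))
print("Pearson, rescaled by P(s):", round(r_scaled, 3))

# Per-pixel ratios are unaffected by dividing both maps by the same P(s).
ratio_raw = [x / y for x, y in zip(upper(a), upper(b))]
ratio_scaled = [x / y for x, y in zip(upper(rescale(a, ps)), upper(rescale(b, ps)))]
ratios_unchanged = all(math.isclose(p, q) for p, q in zip(ratio_raw, ratio_scaled))
print("ratios unchanged:", ratios_unchanged)
```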

      As explained in the original manuscript, we attribute the lower experiment-simulation correlation in the meiotic budding yeast Hi-C maps to the larger statistical errors of the meiotic budding yeast dataset, which arise because of its higher genomic resolution: all else being equal, we can expect 25 times the counts in a 10 kb x 10 kb bin as in a 2 kb x 2 kb bin. For the same reason, we expect larger statistical errors in the mitotic budding yeast dataset as well. Lower correlations for noisier data are to be expected in general.
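
      The bin-size arithmetic can be written out explicitly. The sketch below assumes a uniform contact density over the bin area and Poisson counting noise, so that the relative error of a bin scales as one over the square root of its counts; both assumptions and the function names are ours, introduced only for illustration.

```python
import math

def count_ratio(coarse_bin_kb, fine_bin_kb):
    """Ratio of expected counts per 2D bin, assuming uniform contact density."""
    return (coarse_bin_kb / fine_bin_kb) ** 2

def relative_error_ratio(coarse_bin_kb, fine_bin_kb):
    """Relative error of a fine bin over that of a coarse bin, assuming Poisson noise."""
    return math.sqrt(count_ratio(coarse_bin_kb, fine_bin_kb))

print(count_ratio(10, 2))           # 25.0
print(relative_error_ratio(10, 2))  # 5.0
```

      Under these assumptions, a 2 kb x 2 kb pixel carries roughly five times the relative statistical error of a 10 kb x 10 kb pixel at the same sequencing depth.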

      *7. Please explain why the difference between right and left currents at any particular site, (R_n - L_n) / (R_n + L_n), should be small. It seems easy to imagine scenarios where this might not be true, such as directional barriers like CTCF or transcribed genes.*

      __Response: __

      For simplicity, the present version of CCLE sets the site-dependent loop extrusion rates by assuming that the cohesin ChIP-seq signal has equal contributions from left and right anchors. Then, we carry out our simulations which subsequently allow us to examine the simulated left and right currents and their difference at every site. The distributions of normalized left-right difference currents are shown in Supplementary Figures 12B, 13B, and 14D, for interphase S. pombe, meiotic S. cerevisiae, and mitotic S. cerevisiae, respectively. They are all centered at zero with standard deviations of 0.12, 0.16, and 0.33. Thus, it emerges from our simulations that the difference current is indeed generally small.
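
      As a minimal sketch of the quantity being discussed, the normalized difference current (R_n - L_n) / (R_n + L_n) can be computed per site from the right and left currents. The currents below are hypothetical toy values drawn at random, not output of the CCLE simulations, and the function name is ours.

```python
import random
import statistics

def normalized_difference(right, left):
    """(R_n - L_n) / (R_n + L_n) per site; sites with zero total current are skipped."""
    return [(r - l) / (r + l) for r, l in zip(right, left) if r + l > 0]

# Hypothetical currents: counts of right- and left-moving anchor steps per site.
rng = random.Random(2)
right = [rng.randint(80, 120) for _ in range(1000)]
left = [rng.randint(80, 120) for _ in range(1000)]

diff = normalized_difference(right, left)
print("mean:", round(statistics.mean(diff), 3))
print("standard deviation:", round(statistics.pstdev(diff), 3))
```

      A distribution centered at zero with a small standard deviation, as reported above for the three datasets, indicates that left and right currents are nearly balanced at most sites. A strongly directional barrier would instead show up as sites with values approaching +1 or -1.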

      8. Optional, but I think it would greatly improve the manuscript: can the authors a) analyze regions of high cohesin occupancy (assumed to be slow extrusion regions) to determine if there's anything special in these regions, such as more transcriptional activity

      __Response: __

      In response to Referee 1's similar comment, we have calculated the correlation between the locations of convergent genes and cohesin ChIP-seq. Supplementary Figure 18A in the revised manuscript shows that for interphase S. pombe no correlations are evident, whereas for both meiotic and mitotic S. cerevisiae, there are significant correlations between these two quantities (Supplementary Fig. 17).

      *b) apply this methodology to vertebrate cell data *

      __Response: __

      The application of CCLE to vertebrate data is outside the scope of this paper which, as we have emphasized, has the goal of developing a model that can be robustly applied to non-vertebrate eukaryotic genomes. Nevertheless, CCLE is, in principle, applicable to all organisms in which loop extrusion by SMC complexes is the primary mechanism for chromatin spatial organization.

      9. *A GitHub link is provided but the code is not currently available.*

      __Response: __

      The code is now available.

      Minor Comments:

      1. Please state the simulated LEF lifetime, since the statement in the methods that 15000 timesteps are needed for equilibration of the LEF model is otherwise not meaningful. Additionally, please note that backbone length is not necessarily a good measure of steady state, since the backbone can be compacted to its steady-state value while the loop distribution continues to evolve toward its steady state.

      __Response: __

      The term "timesteps" used in the original manuscript should in fact have read "the number of LEF events performed" in the simulation. We have therefore changed the terminology from "timesteps" to "LEF events".

      The choice of 15000 LEF events was determined empirically to ensure that the loop extrusion steady state is achieved for the range of parameters considered. To address the referee's concern regarding the uncertainty of achieving steady state after 15000 LEF events, we compared two loop size distributions, each comprising 1000 data points equally separated in time: one between LEF events 15000 and 35000, and the other between LEF events 80000 and 100000. The two distributions are identical within errors, suggesting that the loop extrusion steady state is well achieved within 15000 LEF events.
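
      One way such a comparison could be carried out is sketched below. The response does not specify the statistic used, so the chi-square-style measure, the exponential toy distributions, and all names here are assumptions made for illustration. Two samples of 1000 loop sizes are histogrammed and compared bin by bin, using Poisson errors on the counts; a value near 1 is what one expects when both windows sample the same distribution.

```python
import random

def histogram(samples, bin_width, n_bins):
    """Counts per bin; values beyond the last bin edge go into the last bin."""
    counts = [0] * n_bins
    for s in samples:
        counts[min(int(s // bin_width), n_bins - 1)] += 1
    return counts

def reduced_chi2(c1, c2):
    """Mean of (c1 - c2)^2 / (c1 + c2) over non-empty bins, assuming Poisson errors."""
    terms = [(a - b) ** 2 / (a + b) for a, b in zip(c1, c2) if a + b > 0]
    return sum(terms) / len(terms)

# Hypothetical loop sizes (kb): two windows drawn from the same distribution,
# plus a third window with a different mean to mimic a non-equilibrated state.
rng = random.Random(3)
mean_loop_kb = 20.0
early = [rng.expovariate(1 / mean_loop_kb) for _ in range(1000)]
late = [rng.expovariate(1 / mean_loop_kb) for _ in range(1000)]
not_equilibrated = [rng.expovariate(1 / (2 * mean_loop_kb)) for _ in range(1000)]

h_early = histogram(early, 10.0, 10)
h_late = histogram(late, 10.0, 10)
h_other = histogram(not_equilibrated, 10.0, 10)

same = reduced_chi2(h_early, h_late)
different = reduced_chi2(h_early, h_other)
print("same distribution:", round(same, 2))
print("different distribution:", round(different, 2))
```

      Comparing the full loop size distribution, rather than a single summary such as backbone length, addresses the referee's point that the backbone can reach its steady-state value before the loop distribution does.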

      2. How important is the cohesive cohesin parameter in the model, e.g., how good are fits with \rho_c = 0?

      __Response: __

      As stated in the original manuscript, the errors on \rho_c are on the order of 10%-20% (for S. pombe). Thus, fits with \rho_c=0 are significantly poorer than with the best-fit values of \rho_c.

      *3. A nice (but non-essential) supplemental visualization might be to show a scatter of sim cohesin occupancy vs. experiment ChIP. *

      __Response: __

      We have chosen not to do this, because we judge that the manuscript is already long enough. Figures 3A, 5D, and 6C already compare the experimental and simulated ChIP-seq, and these figures already contain more information than the figures proposed by the referee.

      4. *A similar calculation of Hi-C contacts based on simulated loop extruder positions using the Gaussian chain model was previously presented in Banigan et al. eLife 9:e53558 2020, which should be cited.*

      __Response: __

      We thank the referee for pointing out this citation. We have added it to the revised manuscript.

      5. It is stated that simulation agreement with experiments for cerevisiae is worse in part due to variability in the experiments, with MPR and Pearson numbers for cerevisiae replicates computed for reference. But these numbers are difficult to interpret without, for example, similar numbers for duplicate pombe experiments. Again, these numbers should be generated using Hi-C maps scaled by P(s), especially in case there are systematic errors in one replicate vs. another.

      __Response: __

      As noted above, throughout the revised manuscript, we now give the Pearson correlation coefficients of scaled-by-P(s) Hi-C maps.

      6. *In the model section, it is stated that LEF binding probabilities are uniformly distributed. Did the authors mean the probability is uniform across the genome or that the probability at each site is a uniformly distributed random number? Please clarify, and if the latter, explain why this unconventional assumption was made.*

      __Response: __

      It is the former. We have modified the manuscript to clarify that LEFs "initially bind to empty, adjacent chromatin lattice sites with a binding probability, that is uniformly distributed across the genome." (lines 587-588).

      *7. Supplement p4 line 86 - what is meant by "processivity of loops extruded by isolated LEFs"? "size of loops extruded by..." or "processivity of isolated LEFs"? *

      __Response: __

      Here "processivity of isolated LEFs" is defined as the processivity of one LEF without the interference (blocking) from other LEFs. We have changed "processivity of loops extruded by isolated LEFs" to "processivity of isolated LEFs" for clarity.

      8. The use of parentheticals in the caption to Table 2 is a little confusing; adding a few extra words would help.

      __Response: __

      In the revised manuscript, we have added an additional sentence, and have removed the offending parentheses.

      9. *Page 12 sentence line 315-318 is difficult to understand. The barrier parameter is apparently something from ref 47 not previously described in the manuscript.*

      __Response: __

      In the revised manuscript, we have removed mention of the "barrier parameter" from the discussion.

      10. *Statement on p14 line 393-4 is false: prior LEF models have not been limited to vertebrates, and the authors have cited some of them here. There are also non-vertebrate examples with extrusion barriers: genes as boundaries to condensin in bacteria (Brandao et al. PNAS 116:20489 2019) and MCM complexes as boundaries to cohesin in yeast (Dequeker et al. Nature 606:197 2022).*

      __Response: __

      In fact, Dequeker et al. Nature 606:197 2022 concerns the role of MCM complexes in blocking cohesin loop extrusion in mouse zygotes. Mouse is a vertebrate. The sole aspect of this paper that is associated with yeast is the observation of cohesin blocking by the yeast MCM bound to the ARS1 replication origin site, which is inserted on a piece of lambda phage DNA. No yeast genome is used in the experiment. Therefore, the referee is mistaken to suggest that this paper models yeast genome organization.

      We thank the referee for pointing out Brandao et al. PNAS 116:20489 2019, which includes the development of a tour-de-force model of condensin-based loop extrusion in the prokaryote, Bacillus subtilis, in the presence of gene barriers to loop extrusion. To acknowledge this paper, we have changed the objectionable sentence to now read (lines 571-575):

      "... prior LEF models have been overwhelmingly limited to vertebrates, which express CTCF and where CTCF is the principal boundary element. Two exceptions, in which the LEF model was applied to non-vertebrates, are Ref. [49], discussed above, and Ref. [76] (Brandao et al.), which models the Hi-C map of the prokaryote, Bacillus subtilis, on the basis of condensin loop extrusion with gene-dependent barriers."

      *Referees cross-commenting *

      I agree with the comments of Reviewer 1, which are interesting and important points that should be addressed.

      *Reviewer #2 (Significance (Required)):

      Analytically approaching extrusion by treating cohesin translocation as a conserved current is an interesting approach to modeling and analysis of extrusion-based chromatin organization. It appears to work well as a descriptive model. But I think there are major questions concerning the mechanistic value of this model, possible applications of the model, the provided interpretations of the model and experiments, and the limitations of the model under the current assumptions. I am unconvinced that this analysis specifically is sufficient to demonstrate that extrusion is the primary organizer of chromatin on these scales; moreover, the need to demonstrate this is questionable, as extrusion is widely accepted, even if not universally so. It is also unclear that the minimal approach of the CCLE necessarily offers an improved physical basis for modeling extrusion, as compared to previous efforts such as ref 47, as claimed by the authors. There are also questions about significance due to possible limitations of the model (detailed above). Applying the CCLE model to identify barriers would be interesting, but is not attempted. Overall, the work presents a reasonable analytical model and numerical method, but until the major comments above are addressed and some reasonable application or mechanistic value or interpretation is presented, the overall significance is somewhat limited.*

      __Response: __

      We agree with the referee that analytically approaching extrusion by treating cohesin translocation as a conserved current is an interesting approach to modeling and analysis of extrusion-based chromatin organization. We also agree with the referee that it works well as a descriptive model (of Hi-C maps in S. pombe and S. cerevisiae). Obviously, we disagree with the referee's other comments. For us, being able to describe the different-appearing Hi-C maps of interphase S. pombe (Fig. 1 and Supplementary Figures 1-9), meiotic S. cerevisiae (Fig. 5) and mitotic S. cerevisiae (Fig. 6), all with a common model with just a few fitting parameters that differ between these examples, is significant and novel. The reviewer prematurely dismisses the fact that there is still debate about whether a "diffusion-capture"-like model is the dominant mechanism that shapes chromatin spatial organization at the TAD scale. Many works have argued that such models could describe TAD-scale chromatin organization, as cited in the revised manuscript (Refs. [11, 14, 15, 17, 20, 22-24, 55]). However, in contrast to the poor description of the Hi-C map by a diffusion-capture model (as demonstrated in the revised manuscript and Supplementary Fig. 15), the excellent experiment-simulation agreement achieved by CCLE provides compelling evidence that cohesin-based loop extrusion is indeed the primary organizer of TAD-scale chromatin.

      Importantly, CCLE provides a theoretical basis for how loop extrusion models can be generalized and applied to organisms without known loop extrusion barriers. Our model also highlights that distributed barriers that impede but do not strictly block LEFs could also impact chromatin configurations, and it provides a means to account for them. This case might be of importance to organisms with CTCF motifs that infrequently coincide with TAD boundaries, for instance, Drosophila melanogaster. Moreover, CCLE promises theoretical descriptions of the Hi-C maps of other non-vertebrates in the future, extending the quantitative application of the LEF model across the tree of life. This too would be highly significant if successful.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Yuan et al. report on their development of an analytical model ("CCLE") for loop extrusion with genomic-position-dependent speed, with the idea of accounting for barriers to loop extrusion. They write down master equations for the probabilities of cohesin occupancy at each genomic site and obtain approximate steady-state solutions. Probabilities are governed by cohesin translocation, loading, and unloading. Using ChIP-seq data as an experimental measurement of these probabilities, they numerically fit the model parameters, among which are extruder density and processivity. Gillespie simulations with these parameters combined with a 3D Gaussian polymer model were integrated to generate simulated Hi-C maps and cohesin ChIP-seq tracks, which show generally good agreement with the experimental data. The authors argue that their modeling provides evidence that loop extrusion is the primary mechanism of chromatin organization on ~10-100 kb scales in S. pombe and S. cerevisiae.

      Major comments:

      1. I am unconvinced that this analysis specifically is sufficient to demonstrate that extrusion is the primary organizer of chromatin on these scales; moreover, the need to demonstrate this is questionable, as extrusion is widely accepted, even if not universally so. How is the agreement of CCLE with experiments more demonstrative of loop extrusion than previous modeling? Relatedly, similar best fit values for S. pombe and S. cerevisiae might not point to a mechanistic conclusion (same "underlying mechanism" of loop extrusion), but rather to similar properties for loop-extruding cohesins in the two species. As an alternative, could a model with variable binding probability given by ChIP-seq and an exponential loop-size distribution work equally well? The stated lack of a dependence on extrusion timescale suggests that a static looping model might succeed. If not, why not?
      2. I do not understand how the loop extrusion residence time drops out. As I understand it, Eq 9 converts ChIP-seq to lattice site probability (involving N_{LEF}, which is related to \rho, and \rho_c). Then, Eqs. 3-4 derive site velocities V_n and U_n if we choose rho, L, and \tau, with the latter being the residence time. This parameter is not specified anywhere and is claimed to be unimportant. It may be true that the choice of timescale is arbitrary in this procedure, but can the authors please clarify?
      3. The assumptions in the solution and application of the CCLE model are potentially constraining to a limited number of scenarios. In particular the authors specify that current due to binding/unbinding, A_n - D_n, is small. This assumption could be problematic near loading sites (centromeres, enhancers in higher eukaryotes, etc.) (where current might be dominated by A_n and V_n), unloading sites (D_n and V_{n-1}), or strong boundaries (D_n and V_{n-1}). The latter scenario is particularly concerning because the manuscript seems to be concerned with the presence of unidentified boundaries. This is partially mitigated by the fact that the model seems to work well in the chosen examples, but the authors should discuss the limitations due to their assumptions and/or possible methods to get around these limitations.
      4. Related to the above concern, low cohesin occupancy is interpreted as a fast extrusion region and high cohesin occupancy is interpreted as a slow region. But this might not be true near cohesin loading and unloading sites.
      5. The mechanistic insight attempted in the discussion, specifically with regard to Mis4/Scc2/NIPBL and Pds5, is problematic. First, it is not clear how the discussion of Nipbl and Pds5 is connected to the CCLE method; the justification is that CCLE shows cohesin distribution is linked to cohesin looping, which is already a questionable statement (point 1) and doesn't really explain how the model offers new insight into existing Nipbl and Pds5 data.

      Furthermore, I believe that the conclusions drawn on this point are flawed, or at least, stated with too much confidence. The authors raise the curious point that Nipbl ChIP-seq does not correlate well with cohesin ChIP-seq, and use this as evidence that Nipbl is not a part of the loop-extruding complex in S. pombe, and it is not essential in humans. Aside from the molecular evidence in human Nipbl/cohesin (acknowledged by authors), there are other reasons to doubt this conclusion. First, depletion of Nipbl (rather than binding partner Mau2 as in ref 55) in mouse cells strongly inhibits TAD formation (Schwarzer et al. Nature 551:51 2017). Second, at least two studies have raised concerns about Nipbl ChIP-seq results: 1) Hu et al. Nucleic Acids Res 43:e132 2015, which shows that uncalibrated ChIP-seq can obscure the signal of protein localization throughout the genome due to the inability to distinguish from background and 2) Rhodes et al. eLife 6:e30000, which uses FRAP to show that Nipbl binds and unbinds to cohesin rapidly in human cells, which could go undetected in ChIP-seq, especially when uncalibrated. It has not been shown that these dynamics are present in yeast, but there is no reason to rule it out yet.

      Similar types of critiques could be applied to the discussion of Pds5. There is cross-correlation between Psc3 and Pds5 in S. pombe, but the authors are unable to account for whether Pds5 binding is transient and/or necessary to loop extrusion itself or, more importantly, whether Pds5 ChIP is associated with extrusive or cohesive cohesins; cross-correlation peaks at about 0.6, but note that by the authors' own estimates, cohesive cohesins are approximately half of all cohesins in S. pombe (Table 3).

      Due to the above issues, I suggest that the authors heavily revise this discussion to better reflect the current experimental understanding and the limited ability to draw such conclusions based on the current CCLE model.

      6. I suggest that the authors recalculate correlations for Hi-C maps using maps that are rescaled by the P(s) curves. As currently computed, most of the correlation between maps could arise from the characteristic decay of P(s) rather than smaller scale features of the contact maps. This could reduce the surprising observed correlation between distinct genomic regions in pombe (which, problematically, is higher than the observed correlation between simulation and experiment in cerevisiae).
      7. Please explain why the difference between right and left currents at any particular site, (R_n - L_n) / (R_n + L_n), should be small. It seems easy to imagine scenarios where this might not be true, such as directional barriers like CTCF or transcribed genes.
      8. Optional, but I think it would greatly improve the manuscript: can the authors a) analyze regions of high cohesin occupancy (assumed to be slow extrusion regions) to determine if there's anything special in these regions, such as more transcriptional activity

      b) apply this methodology to vertebrate cell data 9. A Github link is provided but the code is not currently available.
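      The rescaling suggested in major comment 6 can be illustrated with a minimal sketch. It assumes symmetric, square contact maps held as NumPy arrays and uses the mean of each diagonal as an empirical P(s); the function names and the synthetic maps are hypothetical and are not taken from the manuscript or its code. Two maps that share only the P(s) decay, with independent noise, correlate strongly before rescaling and only weakly afterwards.

```python
import numpy as np

def distance_normalize(contact_map):
    """Divide each diagonal by its mean (an empirical P(s)).

    Only the upper triangle is read; the result is mirrored to stay symmetric.
    """
    m = np.asarray(contact_map, dtype=float)
    n = m.shape[0]
    out = np.zeros_like(m)
    for s in range(n):
        idx = np.arange(n - s)
        diag = m[idx, idx + s]
        mean = diag.mean()
        if mean > 0:
            out[idx, idx + s] = diag / mean
            out[idx + s, idx] = out[idx, idx + s]
    return out

def pearson_upper(a, b, min_sep=1):
    """Pearson correlation over upper-triangle entries with separation >= min_sep."""
    iu = np.triu_indices(a.shape[0], k=min_sep)
    return np.corrcoef(a[iu], b[iu])[0, 1]

# Synthetic example: both maps are pure distance decay plus independent noise.
rng = np.random.default_rng(0)
n = 60
sep = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
decay = 1.0 / (1.0 + sep)

def noisy(base):
    noise = rng.standard_normal(base.shape)
    noise = (noise + noise.T) / 2  # keep the map symmetric
    return base * (1.0 + 0.1 * noise)

map_a, map_b = noisy(decay), noisy(decay)
raw_r = pearson_upper(map_a, map_b)
rescaled_r = pearson_upper(distance_normalize(map_a), distance_normalize(map_b))
print(f"raw Pearson: {raw_r:.2f}, after P(s) rescaling: {rescaled_r:.2f}")
```

      In this toy case the raw correlation is dominated by the shared decay, which is the concern raised in the comment; real maps would additionally need consistent binning and handling of empty bins.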

      Minor Comments:

      1. Please state the simulated LEF lifetime, since the statement in the methods that 15000 timesteps are needed for equilibration of the LEF model is otherwise not meaningful. Additionally, please note that backbone length is not necessarily a good measure of steady state, since the backbone can be compacted to its steady-state value while the loop distribution continues to evolve toward its steady state.
      2. How important is the cohesive cohesin parameter in the model, e.g., how good are fits with \rho_c = 0?
      3. A nice (but non-essential) supplemental visualization might be to show a scatter of sim cohesin occupancy vs. experiment ChIP.
      4. A similar calculation of Hi-C contacts based on simulated loop extruder positions using the Gaussian chain model was previously presented in Banigan et al. eLife 9:e53558 2020, which should be cited.
      5. It is stated that simulation agreement with experiments for cerevisiae is worse in part due to variability in the experiments, with MPR and Pearson numbers for cerevisiae replicates computed for reference. But these numbers are difficult to interpret without, for example, similar numbers for duplicate pombe experiments. Again, these numbers should be generated using Hi-C maps scaled by P(s), especially in case there are systematic errors in one replicate vs. another.
      6. In the model section, it is stated that LEF binding probabilities are uniformly distributed. Did the authors mean the probability is uniform across the genome or that the probability at each site is a uniformly distributed random number? Please clarify, and if the latter, explain why this unconventional assumption was made.
      7. Supplement p4 line 86 - what is meant by "processivity of loops extruded by isolated LEFs"? "size of loops extruded by..." or "processivity of isolated LEFs"?
      8. The use of parentheticals in the caption to Table 2 is a little confusing; adding a few extra words would help.
      9. Page 12 sentence line 315-318 is difficult to understand. The barrier parameter is apparently something from ref 47 not previously described in the manuscript.
      10. Statement on p14 line 393-4 is false: prior LEF models have not been limited to vertebrates, and the authors have cited some of them here. There are also non-vertebrate examples with extrusion barriers: genes as boundaries to condensin in bacteria (Brandao et al. PNAS 116:20489 2019) and MCM complexes as boundaries to cohesin in yeast (Dequeker et al. Nature 606:197 2022).

      Referees cross-commenting

      I agree with the comments of Reviewer 1, which are interesting and important points that should be addressed.

      Significance

      Analytically approaching extrusion by treating cohesin translocation as a conserved current is an interesting approach to modeling and analysis of extrusion-based chromatin organization. It appears to work well as a descriptive model. But I think there are major questions concerning the mechanistic value of this model, possible applications of the model, the provided interpretations of the model and experiments, and the limitations of the model under the current assumptions. I am unconvinced that this analysis specifically is sufficient to demonstrate that extrusion is the primary organizer of chromatin on these scales; moreover, the need to demonstrate this is questionable, as extrusion is widely accepted, even if not universally so. It is also unclear that the minimal approach of the CCLE necessarily offers an improved physical basis for modeling extrusion, as compared to previous efforts such as ref 47, as claimed by the authors. There are also questions about significance due to possible limitations of the model (detailed above). Applying the CCLE model to identify barriers would be interesting, but is not attempted. Overall, the work presents a reasonable analytical model and numerical method, but until the major comments above are addressed and some reasonable application or mechanistic value or interpretation is presented, the overall significance is somewhat limited.

    1. Author response:

      [The following is the authors’ response to the current reviews.]

      In response to Reviewer #2, we agree with the reviewer that it needs to be noted that not all forms of recognition are the same and have added the following: "However, we note that not all forms of recognition are the same; researchers may prefer to have their work featured instead of personal stories or critiques of the scientific environment."


      [The following is the authors’ response to the previous reviews.]

      We thank both reviewers for their detailed comments and insightful suggestions. Below we summarize our responses to each concern in addition to the edits within the manuscript.

      We would also like to add a clarification to the eLife assessment, which states “This important bibliometric analysis shows that authors of scientific papers whose names suggest they are female or East Asian get quoted less often in news stories about their work.” We show that individuals with names predicted to be from women or East Asian name origins are less likely to be quoted or mentioned in Nature’s scientific news stories than expected by publication demographics. In this study, we did not compare the level of coverage of a scientific article by the demographics of the authors of the article.

      Reviewer #1

      The article is not so clearly structured, which makes it hard to follow. A better framing, contextualization, and conceptualization of their analysis would help the readers to better understand the results. There are some unclear definitions and wrong wording of key concepts.

      We have adapted our wording in the text and added a more detailed discussion, which we hope makes the paper easier to comprehend. These changes are described in the context of the reviewer's suggestions and addressed in the next section.

      Language use: Male/Female refers to sex, not to gender.

      We have now updated the language throughout the text. Thank you for pointing this out.

      Regional disparities are not the same as names' origin. While the first might relate to the academic origin of authors, inferred from their institutional belonging, the latter reflects the authors' inferred identity. Ethnic identities and the construction of prejudice against specific populations need proper contextualization.

      We have added better contextualization in the manuscript and reworded the section in our results and discussion to clarify that we are analyzing disparities related to perceived ethnicity and not regions. We also added the following text to the results section “In our analysis, we use name origin as an estimate for the perceived ethnicity of a primary source by a journalist. Our prediction is not intended to assign ethnicity to an individual, but to be used broadly as a tool to quantify representational differences in a journalist's sociologically constructed perception of a primary source's ethnicity.” We also added the following text to our Discussion: “Our use of name origins is a proxy for a journalist's or referring scholarly peer’s potential perceptions of the ethnicity of a primary source as signaled by an individual's name. We do not intend to assign an identity to an individual, but to generate a broad metric to measure possible bias for particular ethnicities during journalists' primary source gathering.”

      It would be helpful to have a clear definition of what are quotes, mentions, and citations. For me, it was not so clear and made understanding the results more difficult.

      We added the following text to the results section Extracted Data Used for Analysis: “Quoted names are any names that were attached to a quote within the article. Mentioned names are any names that were stated within the article. Cited names are all author names of a scientific paper that was cited in the news article.”

      The comparison against Nature published research articles is not perfect because journalists will also cover articles not published in Nature. If for example, the gender representation in the quoted articles is not the same between Nature journals and other journals, then this source of inequality would be missing (e.g. if the journalists are biased against women, but not as much when they published in Nature, because they are also biased towards Nature articles). Also, the gender representation among Nature authors could not be the same as in general. Nevertheless, this seems to be a fair benchmark, especially if the authors did not have access to other more comprehensive databases. But a statement of limitations including these potential issues would be good to have.

      To add better context to the generalizability of our work, we added the following text to our discussion: “Furthermore, the news articles present on "www.nature.com" are intended for a very specific readership that may not be reflective of more broad scientific news outlets. In a separate analysis, we took a cursory look into a comparison with The Guardian and found similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to take a broad comparison across multiple science news outlets to compare against one another. Our existing pipeline could be easily applied to other science news outlets and identify if there exists a consistent pattern of disparity regardless of the intended readership.”

      "we select the highest probability origin for each name as the resultant assignment". Threshold based approaches for race/ethnicity name-based inference have been criticized by the literature as they might reproduce biases (see Kozlowski, D., Murray, D. S., Bell, A., Hulsey, W., Larivière, V., Monroe-White, T., & Sugimoto, C. R. (2022). Avoiding bias when inferring race using name-based approaches. Plos one, 17(3), e0264270.). The authors could use the full distribution of probabilities over names instead of selecting one. The formulae proposed (3-5) could be easily adapted to this change.

      We thank the reviewer for pointing this out. We have updated our analysis to use the probabilities instead of hard assignments. Figure 3 and formulae 3-5 have been updated. While we observe a slight shift in the calculated values, the overall trends are unchanged.
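      The difference between the two approaches can be sketched as follows. This is an illustration only: the probability vectors are made-up values, the three origin labels are a reduced set chosen for the example, and the functions are hypothetical stand-ins rather than the manuscript's formulae 3-5.

```python
import numpy as np

# Hypothetical origin labels and per-name probability vectors (rows sum to 1).
origins = ["CelticEnglish", "EastAsian", "European"]
probs = np.array([
    [0.70, 0.10, 0.20],
    [0.40, 0.35, 0.25],
    [0.20, 0.60, 0.20],
    [0.45, 0.15, 0.40],
])

def hard_proportions(p):
    """Assign each name to its highest-probability origin, then count."""
    winners = p.argmax(axis=1)
    counts = np.bincount(winners, minlength=p.shape[1])
    return counts / counts.sum()

def soft_proportions(p):
    """Average the full probability vectors instead of thresholding."""
    return p.mean(axis=0)

hard = hard_proportions(probs)
soft = soft_proportions(probs)
for name, h, s in zip(origins, hard, soft):
    print(f"{name}: hard={h:.3f} soft={s:.3f}")
```

      With hard assignment, an origin that is never the top prediction receives a proportion of zero even when it carries substantial probability mass across names; averaging the probabilities retains that mass.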

      Is it possible to make an analysis that intersects both name origin and gender? I am not sure if the sample size would allow for this, but if some other dimensions were collapsed, it would be very important to show what happens at the intersection of these two dimensions of discrimination.

      We agree that identifying any differences in quotation patterns at the intersection of gender and name origin would be very useful to identify. To address this, we added supplemental table 5. This table identifies the number of quotes per predicted name origin and gender over all years and article types. In this table, we don’t see a significant difference in gender distribution across predicted name origins.

      Given a larger sample size, we would be able to better identify more subtle differences, but at this sample size, we cannot make more detailed inferences. Additionally, this also addresses a QC-issue, where predicted gender accuracy varies by name origin, specifically East Asian name origin. From our data, we don’t see a large difference in proportions across any name origin. We added the following text to the results section to incorporate this analysis:

      “However, it should be noted that the error rate varies by name origin with the largest decrease in performance on names with an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. In our analysis, we did not observe a large difference in names predicted to come from a man or woman between predicted East Asian and other name origins (Table 5).”

      The use of vocabulary should be more homogeneous. For example, in page 13 the authors start to use the concepts of over/under enrichment, which appeared before in a title but was not used.

      The text has been updated to replace all mentions of “over/under enrichment” with “over/under representation”.

      In the discussions section, it would be important to see as a statement of limitations the problems that automatic origin and gender inference have.

      We thank the reviewer for this suggestion. We have added the following paragraph to our discussion.

      Computational tools enabled us to automatically analyze thousands of articles to identify existing disparities by gender and name origin, but these tools are not without limitations. Our tools are unable to identify non-binary people and rely on gender predictors that are known to have region-specific biases, with the largest decrease in performance on names of an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. Furthermore, name origin is only a proxy for externally perceived racial or ethnic origins of a source or author and is not as accurate as self-identified race or ethnicity. Self-identification better captures the lived experience of an individual that computational estimates from a name can not capture. This is highlighted in our inability to distinguish between Black and White people from the US by their names. As the collection of demographic data by publication outlets grows, we believe this will enable a more fine-grained and accurate analysis of disparities in scientific journalism.

      Figures 2a and 3a show that the affiliations of authors and their countries were going to be used in this analysis. Yet, this section is not present in the article. I would encourage the authors to add this to the analysis as it would show important patterns, and to intersect the dimensions of gender, name origin and country.

      We were interested in using this analysis in our work, but unfortunately the sample size of cited works in each country was too small to make inferences. If this work was extended to larger scientific outlets to include larger corpora such as The Guardian or New York Times, we think one could be able to make more robust inferences. Since our work only focuses on Nature, we decided not to include this analysis. However, we do include a section in our discussion for future work.

      “As a proxy for measuring possible geographical bias of a journalist, we attempted to identify if there was any geographical bias of cited authors. To do this, we identified the affiliation of each cited author and identified their affiliated country. Unfortunately, we could not robustly extract a large enough number of cited authors from different countries to make any conclusive statements. Expanding our work to other science journalism outlets could help identify possible ways in which geographic region, genders, and perceived ethnicity interact and affect scientific visibility of specific groups. While we are unable to identify that journalists have a specific geographical bias, having reporters explicitly focused on specific regional sources will broaden coverage of international opinions in science.”

      It is not clear at that point what column dependence means.

      The abstract has been updated to state, “Gender disparity in Nature quotes was dependent on the article type.”

      Reviewer #2

      We thank the reviewer for their very detailed and insightful suggestions regarding our analysis and the key caveats that needed better contextualization in our analysis. We went through each major point the reviewer brought up below and included any additional text that was needed.

      In some cases, the manuscript lacks consistency in terminology, and uses word choice that is strange (e.g., "enrichment" and "depletion" when discussion representation).

      We thank the reviewer for pointing this out. We have replaced all instances of depletion/enrichment with over/under-representation.

      Caveats to Claim 1. So while Claim 1 holds, it does not hold for all comparator sets and for all years. I don't think this is critical for the paper (the authors do discuss the trend in Claim 2), but interpretation of this claim should take care of these caveats, and readers should consider the important differences in first and last authorship.

      We thank the reviewer for their detailed feedback on this section. We have added the missing contextualization of our results. In the results section, we changed the figure caption to: “Speakers predicted to be men are sometimes overrepresented in quotes, but this depends on the year and article type.” We also added the following paragraph: “When considering the relative proportion of authors and speakers predicted to be men, we only find a slight over-representation of men. This overrepresentation is dependent on the authorship position and the year. Before 2010, quotes predicted as from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 quotes predicted from men are only overrepresented in comparison for first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, but still severely under-represented when considering the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early career scientists and last authors are more senior scientists. It has also been shown that early career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authorships may reveal more how gender bias currently exists whereas comparison with early career scientists may reveal bias in comparison to a future, more possibly diverse academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      Generalizability to other contexts of science journalism:

      We thank the reviewer for their feedback on the generalizability of our work. We have now added the following text to our discussion to provide the reader with a better context of our results: “The articles presented on "www.nature.com" are intended for a very specific readership that may not be reflective of more broad scientific news outlets. In a separate analysis, we took a cursory look into a comparison with The Guardian and found very similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to take a broad comparison across multiple science news outlets to compare against one another. Our existing pipeline could be easily applied to other science news outlets and identify if there exists a consistent pattern of disparity regardless of the intended readership.”

      Shallow discussion:

      The authors highlight gender parity in career features, but why exactly is there gender parity in this format?

      We thank the reviewer for encouraging us to better contextualize our findings in the broader discourse. We have now added several sections to our Discussion. To address gender parity, we have added the following text: “This finding, coupled with the near equal number of articles written by journalists predicted to be men or women, argues for more diversity in topical coverage. "Career Feature" articles highlight current topics relevant to working scientists and frequently highlight systemic issues with the scientific environment. This column allows space for marginalized people to critique the current state of affairs in science or share their personal stories. This type of content encourages the journalist to seek out a diverse set of primary sources. Including more content that is not primarily focused on recent publications, but all topics surrounding the practice of science, can serve as an additional tool to rapidly achieve gender parity in journalistic recognition.”

      Representation in quotations varies by first and last author, most certainly as a result of the academic division of labor in the life sciences. However, what does it say about the scientific quotation that it appears first authors are more often to be quoted? Does this mean that the division of labor is changing such that the first authors are the lead scientists? Or does it imply that senior authors are being skipped over, or giving away their chance to comment on a study to the first author?

      We thank the reviewer for bringing up these important questions. We have added better context to our first author analysis in our discussion. We have included the following two sections to address this. Also, we want to state that we find last authors to be slightly more quoted than first authors, as depicted in Fig. 2d, with first author quotation percentage largely appearing below the red line. We included this text in a response above and include it again here for convenience.

      “Before 2010, quotes predicted as from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 quotes predicted from men are only overrepresented in comparison for first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, but still severely under-represented when considering the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early career scientists and last authors are more senior scientists. It has also been shown that early career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authorships may reveal more how gender bias currently exists whereas comparison with early career scientists may reveal bias in comparison to a future, more possibly diverse academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      “In our analysis, we also find that there are more first authors with predicted East Asian name origin than last authors. This is in contrast to predicted Celtic/English and European name origins. Furthermore, we see that the amount of first author people with predicted East Asian name origins is increasing at a much faster rate than quotes are increasing. If this mismatched rate of representation continues, this could lead to an increasingly large erasure of early career scientists with East Asian name origins. As noted before, focusing on increasing engagement with early career scientists can help to reduce the growing disparity of public visibility of scientists with East Asian name origins.”

      What might be the downstream impacts on the public stemming from the under-representation of scientists with East Asian names? According to Figure 3d, not only are East Asian names under-represented in quotations, but they are becoming more under-represented over time as they appear as authors in a greater number of Nature publications; Those with European names are proportionately represented in quotations given their share of authors in Nature. Why might this be, especially seeing as Anglo names are heavily over-represented?

      To address this point, we have added the following text to our discussion: “In our analysis, we also find that there are more first authors with predicted East Asian name origin than last authors. This is in contrast to predicted Celtic/English and European name origins. Furthermore, the amount of first author people with predicted East Asian name origins is increasing at a much faster rate than quotes are increasing. If this mismatched rate of representation continues, this could lead to an increasingly large erasure of early career scientists with East Asian name origins. As noted before, focusing on increasing engagement with early career scientists can help to reduce the growing disparity of public visibility of scientists with East Asian name origins.”

      I am very confused by Figure 1B. It mixes the counts of News-related items with (non-Springer) research articles in a single stacked bar plot which makes determining the quantity of either difficult. I would advise splitting them out

      Figure 1B has been updated, and the News and Research articles have been separated.

      When querying the first 2000 or so results from the SpringerNature API, are the authors certain that they are getting a random sample of papers?

      These papers were the first 200 English language "Journal" papers returned by the Springer Nature API for each month, resulting in 2400 papers per year from 2005 through 2020. These papers are the first 200 papers published each month by a Springer Nature journal, which may not be completely random, but we believe to be a reasonably representative sample. Furthermore, the Springer Nature comparator set is being used as an additional comparator to the complete set of all Nature research papers used in our analyses.

      In all figures: the authors use capital letters to indicate panels in the caption, but lowercase letters in the figure itself and in the main text. This should be made consistent.

      This has been updated.

      In all figures: the authors should make the caption letter bold in the figure captions, which makes it much easier to find descriptions of specific panels

      This has been updated.

      In the section "coreNLP": the authors mention "co-reference resolution" but without really remarking why it is being used. This is an issue throughout the methods: the authors describe what method they are using, but either they don't mention why they are using that method until later, or else not at all.

      We have added better reasoning behind our coreNLP selected methods: “We used the standard set of annotators: tokenize, ssplit, pos, lemma, ner, parse, coref, and additionally the quote annotator. These perform text tokenization, sentence splitting, part of speech recognition, lemmatization, named entity recognition, division of sentences into constituent phrases, co-reference resolution, and identification of quoted entities, respectively. We used the "statistical" algorithm to perform coreference resolution for speed. Each of these aspects is required to identify the names of quoted or mentioned speakers and identify any of their associated pronouns. All results were output to json format for further downstream processing.”

      We included a better description of scrapy: “Scrapy is a tool that applies user-defined rules to follow hyperlinks on webpages and return the information contained on each webpage. We used Scrapy to extract all web pages containing news articles and extract the text.”

      We also included our motivation for bootstrapping: “We used the bootstrap method to construct confidence intervals for each of our calculated statistics.”
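      As a generic illustration of how a bootstrap confidence interval is constructed, the following is a minimal percentile-bootstrap sketch. The data are made up (70 of 100 quotes labeled as predicted to be from a man), and the function name, number of resamples, and interval type are assumptions for the example, not details taken from the manuscript.

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the statistic."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = len(values)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        sample = values[rng.integers(0, n, size=n)]
        stats[b] = stat(sample)
    low, high = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return low, high

# Hypothetical data: 1 = quote predicted to be from a man, 0 = from a woman.
quotes = np.array([1] * 70 + [0] * 30)
ci_low, ci_high = bootstrap_ci(quotes)
print(f"proportion = {quotes.mean():.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

      The same function can be reused for other statistics by passing a different `stat` callable.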

      In the section "Name Formatting for Gender Prediction in Quotes or Mentions", genderizeR is mentioned before an introduction to the tool

      We added the following text to provide context: “Even though genderizeR, the computational method used to predict the name's gender, only uses the first name to make the gender prediction, identifying the full name gives us greater confidence that we correctly identified the first name.”

      In the section "Name Formatting for Gender Prediction of Authors", you state that you exclude papers with only one author. How many papers is this? I assume few, in Nature, but if not I can imagine gender differences based on who writes first-authored papers.

      We find that the number excluded is roughly 7% of all papers, which is consistent across Nature and Springer Nature (1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, 955/12459 for Nature authors). We have added the following text to the manuscript for better context: “Roughly 7% of all papers were estimated to be by a single author and removed from this analysis: 1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, 955/12459 for Nature research articles.”

      In "Name Origin Analysis", for the in-text reference to Equation 3: include the prefix "Eq." or similar to mark this as referencing the equation and not something else

      This has been updated.

      The use of the word "enrichment" in reference to the representation of East Asian authors is strange and does not fit the colloquial definition of the term. I suggest just using a simpler term like "representation" instead

      Similarly, the authors use the word "depletion" to reflect the lower rate of quotes to scientists with East-Asian names, but I feel a simpler word would be more appropriate.

      We thank the reviewer for this suggestion. All instances of “enrichment/depletion” have been replaced with “over/under representation”.

      The authors claim in Figure 2d that there is a steady increase in the rate of first author citations, however, this graph is not convincing. It appears to show much more noise than anything resembling a steady change.

      We have reworded our figure description to state that there is a consistent bias towards quoting last authors. Our figure description now states: “Panel d shows a consistent but slight bias towards quoting the last author of a cited article rather than the first author over time.”

      Supplemental Figures 1b and 1c do not seem to be mentioned in the main text, and I struggle to see their relevance.

      We thank the reviewer for identifying this error; these subpanels have been removed.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):*

      *I have trialled the package on my lab's data and it works as advertised. It was straightforward to use and did not require any special training. I am confident this is a tool that will be approachable even to users with limited computational experience. The use of artificial data to validate the approach - and to provide clear limits on applicability - is particularly helpful.*

      *The main limitation of the tool is that it requires the user to manually select regions. This somewhat limits the generalisability and is also more subjective - users can easily choose "nice" regions that better match with their hypothesis, rather than quantifying the data in an unbiased manner. However, given the inherent challenges in quantifying biological data, such problems are not easily circumventable.*

      *I have some comments to clarify the manuscript:*

      *1. A "straightforward installation" is mentioned. Given this is a Method paper, the means of installation should be clearly laid out.*

      This sentence is now modified. In the revised manuscript, we describe how to install the toolset and give the link to the toolset website in case further information is needed. On this website, we provide a full video tutorial and a user manual. The user manual is also provided as supplementary material of the manuscript.

      * It would be helpful if there was an option to generate an output with the regions analysed (i.e., a JPG image with the data and the drawn line(s) on top). There are two reasons for this: i) A major problem with user-driven quantification is accidental double counting of regions (e.g., a user quantifies a part of an image and then later quantifies the same region). ii) Allows other users to independently verify measurements at a later time.*

      We agree that it is helpful to save the analyzed regions. To answer this comment and the other two reviewers' comments pointing at a similar feature, we have now included an automatic saving of the regions of interest. The user will be able to reopen saved regions of interest using a new function we included in the new version of PatternJ.

      * 3. Related to the above point, it is highlighted that each time point would need to be analysed separately (line 361-362). It seems like it should be relatively straightforward to allow a function where the analysis line can be mapped onto the next time point. The user could then adjust slightly for changes in position, but still be starting from near the previous timepoint. Given how prevalent timelapse imaging is, this seems like (or something similar) a clear benefit to add to the software.*

      We agree that the analysis of time series images can be a useful addition. We have added the analysis of time-lapse series in the new version of PatternJ. The principles behind the analysis of time-lapse series and an example of such analysis are provided in Figure 1 - figure supplement 3 and Figure 5, with accompanying text lines 140-153 and 360-372. The analysis includes a semi-automated selection of regions of interest, which will make the analysis of such sequences more straightforward than having to draw a selection on each image of the series. The user is required to draw at least two regions of interest in two different frames, and the algorithm will automatically generate regions of interest in frames in which selections were not drawn. The algorithm generates the analysis immediately after selections are drawn by the user, which includes the tracking of the reference channel.
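
The semi-automated selection described above can be illustrated with a minimal sketch. It assumes that a line selection is represented by its two endpoints and that missing frames are filled by linear interpolation between the user-drawn frames; the response does not state which interpolation PatternJ uses, and the names `Line` and `interpolate_rois` are hypothetical.

```python
from typing import Dict, Tuple

# A line selection as (x1, y1, x2, y2); this representation is an assumption.
Line = Tuple[float, float, float, float]


def interpolate_rois(keyframes: Dict[int, Line], n_frames: int) -> Dict[int, Line]:
    """Return one line ROI per frame from at least two user-drawn keyframes.

    Frames between two keyframes are linearly interpolated (assumed scheme);
    frames before the first or after the last keyframe reuse the nearest one.
    """
    if len(keyframes) < 2:
        raise ValueError("at least two keyframes are required")
    frames = sorted(keyframes)
    rois: Dict[int, Line] = {}
    for f in range(n_frames):
        if f <= frames[0]:
            rois[f] = keyframes[frames[0]]
        elif f >= frames[-1]:
            rois[f] = keyframes[frames[-1]]
        else:
            # Locate the pair of keyframes surrounding frame f.
            for a, b in zip(frames, frames[1:]):
                if a <= f <= b:
                    t = (f - a) / (b - a)
                    rois[f] = tuple(
                        p + t * (q - p) for p, q in zip(keyframes[a], keyframes[b])
                    )
                    break
    return rois


# Two selections drawn in frames 0 and 4 of a five-frame series.
drawn = {0: (0.0, 0.0, 10.0, 0.0), 4: (4.0, 8.0, 14.0, 8.0)}
all_rois = interpolate_rois(drawn, n_frames=5)
```

In such a scheme the user could still redraw any generated selection, which would simply become an additional keyframe.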

      * Line 134-135. The level of accuracy of the searching should be clarified here. This is discussed later in the manuscript, but it would be helpful to give readers an idea at this point what level of tolerance the software has to noise and aperiodicity.

      *

      We agree with the reviewer that a clarification of this part of the algorithm will help the user better understand the manuscript. We have modified the sentence to clarify the range of search used and the resulting limits in aperiodicity (now lines 176-181). Regarding the tolerance to noise, it is difficult to estimate it a priori from the choice made at the algorithm stage, so we prefer to leave it to the validation part of the manuscript. We hope this solution satisfies the reviewer and future users.

      *

      **Referees cross-commenting**

      I think the other reviewer comments are very pertinent. The authors have a fair bit to do, but they are reasonable requests. So, they should be encouraged to do the revisions fully so that the final software tool is as useful as possible.

      Reviewer #1 (Significance (Required)):

      Developing software tools for quantifying biological data that are approachable for a wide range of users remains a longstanding challenge. This challenge is due to: (1) the inherent problem of variability in biological systems; (2) the complexity of defining clearly quantifiable measurables; and (3) the broad spread of computational skills amongst likely users of such software.

      In this work, Blin et al., develop a simple plugin for ImageJ designed to quickly and easily quantify regular repeating units within biological systems - e.g., muscle fibre structure. They clearly and fairly discuss existing tools, with their pros and cons. The motivation for PatternJ is properly justified (which is sadly not always the case with such software tools).

      Overall, the paper is well written and accessible. The tool has limitations but it is clearly useful and easy to use. Therefore, this work is publishable with only minor corrections.

      *We thank the reviewer for the positive evaluation of PatternJ and for pointing out its accessibility to the users.

      *

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      # Summary

      The authors present an ImageJ Macro GUI tool set for the quantification of one-dimensional repeated patterns that are commonly occurring in microscopy images of muscles.

      # Major comments

      In our view the article and also software could be improved in terms of defining the scope of its applicability and user-ship. In many parts the article and software suggest that general biological patterns can be analysed, but then in other parts very specific muscle actin wordings are used. We are pointing this out in the "Minor comments" sections below. We feel that the authors could improve their work by making a clear choice here. One option would be to clearly limit the scope of the tool to the analysis of actin structures in muscles. In this case we would recommend to also rename the tool, e.g. MusclePatternJ. The other option would be to make the tool about the generic analysis of one-dimensional patterns, maybe calling the tool LinePatternJ. In the latter case we would recommend to remove all actin specific wordings from the macro tool set and also the article should be in parts slightly re-written.

      *

      We agree with the reviewer that our initial manuscript used a mix of general and muscle-oriented vocabulary, which could make the use of PatternJ confusing especially outside of the muscle field. To make PatternJ useful for the largest community, we corrected the manuscript and the PatternJ toolset to provide the general vocabulary needed to make it understandable for every biologist. We modified the manuscript accordingly.

      * # Minor/detailed comments

      # Software

      We recommend considering the following suggestions for improving the software.

      ## File and folder selection dialogs

      In general, clicking on many of the buttons just opens up a file-browser dialog without any further information. For novel users it is not clear what the tool expects one to select here. It would be very good if the software could be rewritten such that there are always clear instructions displayed about which file or folder one should open for the different buttons.*

      With the current version of macOS, we found that the file-browser dialog does not display any message; we suspect this is the issue raised by the reviewer. This is a known issue of Fiji and of all applications on macOS since 2016. We provide guidelines in the user manual and in the tutorial video to correct this issue by changing a parameter in Fiji. Given the difficulties the reviewer had accessing the material on the PatternJ website, for which we apologize, we understand the concern. We added an extra warning on the PatternJ website pointing to this problem and its solution. Additionally, we have limited the appearance of the file-browser dialog to what we thought was strictly necessary. Thus, the user will experience fewer prompts, speeding up the analysis.

      *

      ## Extract button

      The tool asks one to specify things like whether selections are drawn "M-line-to-M-line"; for users that are not experts in muscle morphology this is not understandable. It would be great to find more generally applicable formulations. *

      We agree that this muscle-oriented vocabulary can make the use of PatternJ confusing. We have now corrected the user interface to provide both general and muscle-specific vocabulary ("center-to-center or edge-to-edge (M-line-to-M-line or Z-disc-to-Z-disc)").*

      ## Manual selection accuracy

      The first step of the analysis is always to start from a user hand-drawn profile across intensity patterns in the image. However, this step can cause inaccuracy that varies with the shape and curve of the line profile drawn. If the line is not strictly perpendicular to, for example, the M-line patterns, the distance between intensity peaks will be different. This will be more problematic when the features in the image are not straight and parallel. If the structure is bent into a curve, the line drawn over it also needs to reproduce this curve to precisely capture the intensity pattern. I found this limits the reproducibility and ease of use of the software.*

      We understand the concern of the reviewer. On curved selections this will be an issue that is difficult to solve, especially on "S" curved or more complex selections. The user will have to be very careful in these situations. On non-curved samples, the issue may be concerning at first sight, but the errors go with the inverse of cosine and are therefore rather low. For example, if the user creates a selection off by 5 degrees, which is visually obvious, lengths will be affected by an increase of only 0.38%. The point raised by the reviewer is important to discuss, and we therefore added a paragraph to comment on the choice of selection (lines 94-98) and a supplementary figure to help make it clear (Figure 1 - figure supplement 1).*
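
The inverse-cosine dependence quoted above follows from simple geometry. As a sketch, assume straight, parallel features with true spacing $d$ and a straight selection tilted by an angle $\theta$ away from the pattern axis; the symbols $d$, $d_{\mathrm{meas}}$ and $\theta$ are introduced here for illustration only.

```latex
d_{\mathrm{meas}} = \frac{d}{\cos\theta},
\qquad
\frac{d_{\mathrm{meas}} - d}{d} = \frac{1}{\cos\theta} - 1 \approx \frac{\theta^{2}}{2}
\quad (\theta \text{ in radians}).

\text{For } \theta = 5^{\circ} \approx 0.0873\ \text{rad}:\quad
\frac{1}{\cos\theta} - 1 \approx 0.0038 = 0.38\,\%.
```

Because the leading term is quadratic in $\theta$, small misalignments have a much smaller effect on the measured length than a linear dependence would suggest.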

      ### Reproducibility

      Since the line profile drawn on the image is the first step and is essential to the entire process, it should be saved together with the analysis result, for example as ImageJ ROI or ROIset files that can be re-imported, correctly positioned, and visualized in the measured images. This would greatly improve the reproducibility of the proposed workflow. In the manuscript, only the extracted features are being saved (because the save button is also just asking for a folder containing images, so I cannot verify its functionality). *

      We agree that this is a very useful and important feature. We have added ROI automatic saving. Additionally, we now provide a simplified import function of all ROIs generated with PatternJ and the automated extraction and analysis of the list of ROIs. This can be done from ROIs generated previously in PatternJ or with ROIs generated from other ImageJ/Fiji algorithms. These new features are described in the manuscript in lines 120-121 and 130-132.

      *

      ## ? button

      It would be great if that button would open up some usage instructions.

      *

      We agree with the reviewer that the "?" button can be used in a better way. We have replaced this button with a Help menu, including a simple tutorial showing a series of images detailing the steps to follow by the user, a link to the user website, and a link to our video tutorial.

      * ## Easy improvement of workflow

      I would suggest a reasonable expansion of the current workflow, by fitting and displaying 2D lines to the band or line structure in the image, that form the "patterns" the author aims to address. Thus, it extracts geometry models from the image, and the inter-line distance, and even the curve formed by these sets of lines can be further analyzed and studied. These fitted 2D lines can be also well integrated into ImageJ as Line ROI, and thus be saved, imported back, and checked or being further modified. I think this can largely increase the usefulness and reproducibility of the software.

      *

      We hope that we understood this comment correctly. We sent a clarification request to the editor, but unfortunately did not receive an answer within the requested 4 weeks of this revision. We understood the following: instead of using our 1D approach, in which we extract positions from a profile, the reviewer suggests extracting the positions of features not as a single point, but as a series of coordinates defining its shape. If this is the case, this is a major modification of the tool that is beyond the scope of PatternJ. We believe that keeping our tool simple makes it robust. This is the major strength of PatternJ. Local fitting would not use line averaging, for instance, which would make the tool less reliable.

      * # Manuscript

      We recommend considering the following suggestions for improving the manuscript. Abstract: The abstract suggests that general patterns can be quantified, however the actual tool quantifies specific subtypes of one-dimensional patterns. We recommend adapting the abstract accordingly.

      *

      We modified the abstract to make this point clearer.

      * Line 58: Gray-level co-occurrence matrix (GLCM) based feature extraction and analysis approach is not mentioned nor compared. At least there's a relatively recent study on Sarcomeres structure based on GLCM feature extraction: https://github.com/steinjm/SotaTool with publication: *https://doi.org/10.1002/cpz1.462

      • *

      We thank the reviewer for making us aware of this publication. We cite it now and have added it to our comparison of available approaches.

      * Line 75: "...these simple geometrical features will address most quantitative needs..." We feel that this may be an overstatement, e.g. we can imagine that there should be many relevant two-dimensional patterns in biology?!*

      We have modified this sentence to avoid potential confusion (lines 76-77).

      • *

      • Line 83: "After a straightforward installation by the user, ...". We think it would be convenient to add the installation steps at this place into the manuscript. *

      __This sentence is now modified. We now mention how to install the toolset and we provide the link to the toolset website, if further information is needed (lines 86-88). __On the website, we provide a full video tutorial and a user manual.

      * Line 87: "Multicolor images will give a graph with one profile per color." The 'Multicolor images' here should be more precisely stated as "multi-channel" images. Multi-color images could be confused with RGB images which will be treated as 8-bit gray value (type conversion first) images by profile plot in ImageJ. *

      We agree with the reviewer that this could create some confusion. We modified "multicolor" to "multi-channel".

      * Line 92: "...such as individual bands, blocks, or sarcomeric actin...". While bands and blocks are generic pattern terms, the biological term "sarcomeric actin" does not seem to fit in this list. Could a more generic wording be found, such as "block with spike"? *

      We agree with the reviewer that "sarcomeric actin" alone will not be clear to all readers. We modified the text to "block with a central band, as often observed in the muscle field for sarcomeric actin" (lines 103-104). The toolset was modified accordingly.

      * Line 95: "the algorithm defines one pattern by having the features of highest intensity in its centre". Could this be rephrased? We did not understand what that exactly means.*

      We agree with the reviewer that this was not clear. We rewrote this paragraph (lines 101-114) and provided a supplementary figure to illustrate these definitions (Figure 1 - figure supplement 2).

      * Line 124 - 147: This part is the only description of the algorithm behind the feature extraction and analysis, but it is not clearly stated. Many details are missing or assumed known by the reader. For example, how it achieves sub-pixel resolution is not clear. One can only assume that, by fitting a Gaussian to the band, the center position (peak) can be calculated from a continuous curve rather than from pixels. *

      Note that the two sentences introducing this description are "Automated feature extraction is the core of the tool. The algorithm takes multiple steps to achieve this (Fig. S2):". We were hoping this statement was clear, but the reviewer may refer to something else. We agree that the description of some of the details of the steps was too quick. We have now expanded the description where needed.
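
To make the reviewer's assumption concrete, the sketch below shows one common way a peak can be located between pixels: a Gaussian is passed through the three samples around the brightest pixel, which is equivalent to fitting a parabola to the logarithm of the intensities. This is an illustration of the general idea only; it is not taken from the manuscript and may differ from what PatternJ actually does. The function name `subpixel_peak` is hypothetical.

```python
import math
from typing import Sequence


def subpixel_peak(profile: Sequence[float], i: int) -> float:
    """Refine the integer peak index i using a three-point Gaussian fit.

    Requires strictly positive intensities at i - 1, i and i + 1, because the
    fit works on the logarithm of the profile.
    """
    if not 0 < i < len(profile) - 1:
        raise ValueError("peak index must have a neighbour on each side")
    left, centre, right = (math.log(profile[k]) for k in (i - 1, i, i + 1))
    curvature = left - 2.0 * centre + right
    if curvature == 0.0:
        return float(i)  # flat top: no refinement possible
    return i + 0.5 * (left - right) / curvature


# A noise-free Gaussian band centred between pixels, at x = 10.3.
band = [math.exp(-((x - 10.3) ** 2) / (2 * 2.0 ** 2)) for x in range(21)]
brightest = max(range(len(band)), key=band.__getitem__)
refined = subpixel_peak(band, brightest)
```

For an ideal Gaussian band the refinement is exact; with noise, a fit over more samples would generally be more stable than this three-point version.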

      * Line 407: We think the availability of both the tool and the code could be improved. For Fiji tools it is common practice to create an Update Site and to make the code available on GitHub. In addition, downloading the example file (https://drive.google.com/file/d/1eMazyQJlisWPwmozvyb8VPVbfAgaH7Hz/view?usp=drive_link) required a Google login and access request, which is not very convenient; in fact, we asked for access but it was denied. It would be important for the download to be easier, e.g. from GitHub or Zenodo.

      *

      We are sorry for the issues encountered when downloading the tool and additional material. We thank the reviewer for pointing out these issues, which limited the accessibility of our tool. We simplified the downloading procedure on the website, which no longer goes through the Google Drive interface or requires a Google account. Additionally, for the coder community, the code, user manual and examples are now available from GitHub at github.com/PierreMangeol/PatternJ, and are provided as supplementary material with the manuscript. To our knowledge, update sites work for plugins but not for macro toolsets. In our experience sharing code with non-specialists, a classical website with a tutorial video is more accessible than more coder-oriented websites, which deter many users.

      * Reviewer #2 (Significance (Required)):

      The strength of this study is that a tool for the analysis of one-dimensional repeated patterns occurring in muscle fibres is made available in the accessible open-source platform ImageJ/Fiji. In the introduction to the article the authors provide an extensive review of comparable existing tools. Their new tool fills a gap in terms of providing an easy-to-use software for users without computational skills that enables the analysis of muscle sarcomere patterns. We feel that if the below mentioned limitations could be addressed the tool could indeed be valuable to life scientists interested in muscle patterning without computational skills.

      In our view there are a few limitations, including the accessibility of example data and tutorials at sites.google.com/view/patternj, which we had trouble accessing. In addition, we think that the workflow in Fiji, which currently requires pressing several buttons in the correct order, could be further simplified and streamlined by adopting a "wizard" approach, where the user is guided through the steps.

      *As answered above, the links on the PatternJ website are now corrected. Regarding the workflow, we now provide a Help menu with:

      1. a basic set of instructions to use the tool,
      2. a direct link to the tutorial video in the PatternJ toolset,
      3. a direct link to the website on which both the tutorial video and a detailed user manual can be found. We hope this addresses the issues raised by this reviewer.

      *Another limitation is the reproducibility of the analysis; here we recommend enabling IJ Macro recording as well as saving of the drawn line ROIs. For more detailed suggestions for improvements please see the above sections of our review. *

      We agree that saving ROIs is very useful. It is now implemented in PatternJ.

      We are not sure what this reviewer means by "enabling IJ Macro recording". The ImageJ Macro Recorder is indeed very useful, but to our knowledge, it is limited to built-in functions. Our code is open and we hope this will be sufficient for advanced users to modify the code and make it fit their needs.*

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, the authors present PatternJ, a new toolset for the analysis of repetitive patterns in biological images. One of the main advantages of this new tool over existing ones is that it is simple to install and run and does not require any coding skills whatsoever, since it runs on the ImageJ GUI. Another advantage is that it provides not only the mean length of the pattern unit but also the subpixel localization of each unit and the distributions of lengths, and that it does not require GPU processing to run, unlike other existing tools. The major disadvantage of PatternJ is that it requires heavy, although very simple, user input in both the selection of the region to be analyzed and in the analysis steps. Another limitation is that, at least in its current version, PatternJ is not suitable for time-lapse imaging. The authors clearly explain the algorithm used by the tool to find the localization of pattern features and they thoroughly test the limits of their tool in conditions of varying SNR, periodicity and band intensity. Finally, they also show the performance of PatternJ across several biological models such as different kinds of muscle cells, neurons and fish embryonic somites, as well as different imaging modalities such as brightfield, fluorescence confocal microscopy, STORM and even electron microscopy.

      This manuscript is clearly written, and both the sections and the figures are well organized and tell a cohesive story. By testing PatternJ, I can attest to its ease of installation and use. Overall, I consider that PatternJ is a useful tool for the analysis of patterned microscopy images and this article is fit for publication. However, I do have some minor suggestions and questions that I would like the authors to address, as I consider they could improve this manuscript and the tool:

      *We are grateful to this reviewer for this very positive assessment of PatternJ and of our manuscript.

      * Minor Suggestions: The methodology section is missing a more detailed description of how the plotted metric was obtained: as normalized intensity or precision in pixels. *

      We agree with the reviewer that a more detailed description of the metric plotted was missing. We added this information in the method part and added information in the Figure captions where more details could help to clarify the value displayed.

      * The validation is based mostly on the SNR and patterns. They should include a dataset of real data to validate the algorithm in three of the standard patterns tested. *

      We validated our tool using computer-generated images, in which we know with certainty the localization of patterns. This allowed us to automatically analyze 30 000 images, and with varying settings, we sometimes analyzed 10 times the same image, leading to about 150 000 selections analyzed. From these analyses, we can provide with confidence an unbiased assessment of the tool precision and the tool capacity to extract patterns. We already provided examples of various biological data images in Figures 4-6, showing all possible features that can be extracted with PatternJ. In these examples, we can claim by eye that PatternJ extracts patterns efficiently, but we cannot know how precise these extractions are because of the nature of biological data: "real" positions of features are unknown in biological data. Such validation will be limited to assessing whether a pattern was found or not, which we believe we already provided with the examples in Figures 4-6.
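
The logic of validating against synthetic data with known ground truth can be sketched in a few lines. The example below is a deliberately simplified, one-dimensional stand-in: it builds a profile of Gaussian bands at known positions, adds noise, localizes each band with an intensity-weighted centroid, and reports the mean absolute error. The band shape, the noise model, the centroid localizer and all function names are assumptions made for illustration; they are not the procedure used in the manuscript.

```python
import math
import random
from typing import List, Sequence


def synthetic_profile(centers: Sequence[float], sigma: float, length: int,
                      noise_sd: float, rng: random.Random) -> List[float]:
    """1D profile: Gaussian bands at known centers plus additive Gaussian noise."""
    return [
        sum(math.exp(-((x - c) ** 2) / (2 * sigma ** 2)) for c in centers)
        + rng.gauss(0.0, noise_sd)
        for x in range(length)
    ]


def centroid_near(profile: Sequence[float], guess: int, half_window: int) -> float:
    """Intensity-weighted centroid in a window around an integer guess."""
    lo = max(0, guess - half_window)
    hi = min(len(profile), guess + half_window + 1)
    weights = [max(profile[x], 0.0) for x in range(lo, hi)]  # clip negative noise
    total = sum(weights)
    if total == 0.0:
        return float(guess)
    return sum(x * w for x, w in zip(range(lo, hi), weights)) / total


def mean_abs_error(true_centers: Sequence[float], sigma: float, length: int,
                   noise_sd: float, seed: int = 0) -> float:
    """Localization error against the known band positions, in pixels."""
    rng = random.Random(seed)
    profile = synthetic_profile(true_centers, sigma, length, noise_sd, rng)
    errors = [abs(centroid_near(profile, round(c), 5) - c) for c in true_centers]
    return sum(errors) / len(errors)


truth = [20.3, 40.3, 60.3]
for noise in (0.0, 0.05, 0.2):
    print(f"noise sd {noise}: mean |error| = "
          f"{mean_abs_error(truth, 2.0, 80, noise):.3f} px")
```

Because the true positions are set by construction, the error can be measured directly, which is exactly what is not possible with biological images.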

      * The video tutorial available in the PatternJ website is very useful, maybe it would be worth it to include it as supplemental material for this manuscript, if the journal allows it. *

      As the video tutorial may have been missed by other reviewers, we agree it is important to make it more prominent to users. We have now added a Help menu in the toolset that opens the tutorial video. Having the video as supplementary material could indeed be a useful addition if the size of the video is compatible with the journal limits.

      * An example image is provided to test the macro. However, it would be useful to provide further example images for each of the three possible standard patterns suggested: Block, actin sarcomere or individual band.*

      We agree this can help users. We now provide another multi-channel example image on the PatternJ website including blocks and a pattern made of a linear intensity gradient that can be extracted with our simpler "single pattern" algorithm, which were missing in the first example. Additionally, we provide an example to be used with our new time-lapse analysis.

      * Access to both the manual and the sample images in the PatternJ website should be made publicly available. Right now they both sit in a private Drive account. *

      As mentioned above, we apologize for access issues that occurred during the review process. These files can now be downloaded directly on the website without any sort of authentication. Additionally, these files are now also available on GitHub.

      * Some common errors are not properly handled by the macro and could be confusing for the user: When there is no selection and one tries to run a Check or Extraction: "Selection required in line 307 (called from line 14). profile=getProfile( ;". A simple "a line selection is required" message would be useful there. When "band" or "block" is selected for a channel in the "Set parameters" window, yet a 0 value is entered into the corresponding "Number of bands or blocks" section, one gets this error when trying to Extract: "Empty array in line 842 (called from line 113). if ( ( subloc . length == 1 ) & ( subloc [ 0 == 0) ) {". This error is not too rare, since the "Number of bands or blocks" section is populated with a 0 after choosing "sarcomeric actin" (after accepting the settings) and stays that way when one changes back to "blocks" or "bands".*

      We thank the reviewer for pointing out these bugs. These bugs are now corrected in the revised version.

      * The fact that every time one clicks on the most used buttons, the getDirectory window appears is not only quite annoying but also, ultimately a waste of time. Isn't it possible to choose the directory in which to store the files only once, from the "Set parameters" window?*

      We have now found a solution to avoid this step. The user is only prompted to provide the image folder when pressing the "Set parameters" button. We kept the directory prompt only for cases in which the user selects the time-lapse analysis or the analysis of multiple ROIs. The main reason is that it is very easy for the analysis to end up in the wrong folder otherwise.

      * The authors state that the outputs of the workflow are "user friendly text files". However, some of them lack descriptive headers (like the localisations and profiles) or even file names (like colors.txt). If there is something lacking in the manuscript, it is a brief description of all the output files generated during the workflow.*

      PatternJ generates multiple files, several of which are internal to the toolset. They are needed to keep track of which analyses were done and which colors were used in the images, amongst others. From the user's perspective, only the files obtained after the analysis, All_localizations.channel_X.txt and sarcomere_lengths.txt, are useful. To improve the user experience, we have now moved all internal files to a folder named "internal", which we think will clarify which outputs are useful for further analysis and which ones are not. We thank the reviewer for raising this point and we now mention it in our Tutorial.

      I don't really see the point in saving the localizations from the "Extraction" step, they are even named "temp".

      We thank the reviewer for this comment; this was indeed not necessary. We modified PatternJ to delete these files after they are used.

      * In the same line, I DO see the point of saving the profiles and localizations from the "Extract & Save" step, but I think they should be deleted during the "Analysis" step, since all their information is then grouped in a single file, with descriptive headers. This deleting could be optional and set in the "Set parameters" window.*

      We understand the point raised by the reviewer. However, the analysis depends on the reference channel picked, which is asked for when starting an analysis, and can be augmented with additional selections. If a user chooses to modify the reference channel or to add a new profile to the analysis, deleting all these files would mean that the user will have to start over again, which we believe will create frustration. An optional deletion at the analysis step is simple to implement, but it could create problems for users who do not understand what it means practically.

      * Moreover, I think it would be useful to also save the linear roi used for the "Extract & Save" step, and eventually combine them during the "Analysis step" into a single roi set file so that future re-analysis could be made on the same regions. This could be an optional feature set from the "Set parameters" window. *

      We agree with the reviewer that saving ROIs is very useful. ROIs are now saved into a single file each time the user extracts and saves positions from a selection. Additionally, the user can re-use previous ROIs and analyze an image or image series in a single step.

      * In the "PatternJ workflow" section of the manuscript, the authors state that after the "Extract & Save" step "(...) steps 1, 2, 4, and 5 can be repeated on other selections (...)". However, technically, only steps 1 and 5 are really necessary (alternatively 1, 4 and 5 if the user is unsure of the quality of the patterning). If a user follows this to the letter, I think it can lead to wasted time.

      *

      We agree with the reviewer and have corrected the manuscript accordingly (lines 119-120).

      • *

      *I believe that the "Version Information" button, although important, has potential to be more useful if used as a "Help" button for the toolset. There could be links to useful sources like the manuscript or the PatternJ website but also some tips like "whenever possible, use a higher linewidth for your line selection" *

      We agree with the reviewer as pointed out in our previous answers to the other reviewers. This button is now replaced by a Help menu, including a simple tutorial in a series of images detailing the steps to follow, a link to the user website, and a link to our video tutorial.

      * It would be interesting to mention to what extent does the orientation of the line selection in relation to the patterned structure (i.e. perfectly parallel vs more diagonal) affect pattern length variability?*

      As answered to reviewer 2, we understand this concern, which needs to be clarified for readers. The issue may be concerning at first sight, but the errors grow only with the inverse of the cosine and are therefore rather low. For example, if the user creates a selection off by 3 degrees, which is visually obvious, lengths will be affected by an increase of only 0.14%. The point raised by the reviewer is important to discuss, and we have therefore added a comment on the choice of selection (lines 94-98) as well as a supplementary figure (Figure 1 - figure supplement 1).
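
The size of this effect can be checked numerically. The short sketch below assumes straight, parallel features and a straight selection, so that the measured length is the true length divided by the cosine of the misalignment angle; the function name `length_overestimate_percent` is hypothetical.

```python
import math


def length_overestimate_percent(angle_deg: float) -> float:
    """Relative increase in measured length (in %) for a selection tilted by angle_deg."""
    return (1.0 / math.cos(math.radians(angle_deg)) - 1.0) * 100.0


for angle in (1, 3, 5, 10):
    print(f"{angle:>2} deg -> +{length_overestimate_percent(angle):.2f} %")
```

The values for 3 and 5 degrees reproduce the 0.14% and 0.38% quoted in the responses; the error grows roughly with the square of the angle, so it stays small for the misalignments a careful user is likely to make.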

      * When "the algorithm uses the peak of highest intensity as a starting point and then searches for peak intensity values one spatial period away on each side of this starting point" (line 133-135), does that search have a range? If so, what is the range? *

      We agree that this information is useful to share with the reader. The range is one pattern size. We have modified the sentence to clarify the range of search used and the resulting limits in aperiodicity (now lines 176-181).
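
The search strategy quoted by the reviewer can be sketched as follows. The example starts from the brightest point of a profile and walks outward one period at a time, taking the brightest sample within a window around each expected position. It assumes that "one pattern size" means a window one period wide, centred on the expected position; the manuscript should be consulted for the exact definition. The function name `find_pattern_peaks` and the integer `period` argument are hypothetical.

```python
from typing import List, Sequence


def find_pattern_peaks(profile: Sequence[float], period: int) -> List[int]:
    """Locate repeated peaks by stepping one period away from the global maximum.

    The search window of +/- period // 2 around each expected position is an
    assumed reading of "a range of one pattern size".
    """
    n = len(profile)
    half = period // 2
    start = max(range(n), key=profile.__getitem__)  # peak of highest intensity
    peaks = [start]
    for direction in (-1, 1):
        current = start
        while True:
            expected = current + direction * period
            lo, hi = expected - half, expected + half
            if lo < 0 or hi >= n:
                break  # window would leave the profile: stop on this side
            current = max(range(lo, hi + 1), key=profile.__getitem__)
            peaks.append(current)
    return sorted(peaks)


# Slightly aperiodic pattern: nominal period 10, one peak shifted by one pixel.
signal = [0.0] * 45
for position, height in ((5, 5.0), (15, 9.0), (26, 6.0), (35, 7.0)):
    signal[position] = height
found = find_pattern_peaks(signal, period=10)
```

Under this assumption, a peak displaced by up to about half a period from its expected position is still found, which sets the tolerance to aperiodicity.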

      * Line 144 states that the parameters of the fit are saved and given to the user, yet I could not find such information in the outputs. *

      The parameters of the fits are saved for blocks. We have now clarified this point by modifying the manuscript (lines 186-198) and modifying Figure 1 - figure supplement 5. We realized we made an error in the description of how edges of "block with middle band" are extracted. This is now corrected.

      * In line 286, authors finish by saying "More complex patterns from electron microscopy images may also be used with PatternJ.". Since this statement is not backed by evidence in the manuscript, I suggest deleting it (or at the very least, providing some examples of what more complex patterns the authors refer to). *

      This sentence is now deleted.

      * In the TEM image of the fly wing muscle in fig. 4 there is a subtle but clearly visible white stripe pattern in the original image. Since that pattern consists of 'dips', rather than 'peaks' in the profile of the inverted image, they do not get analyzed. I think it is worth mentioning that if the image of interest contains both "bright" and "dark" patterns, then the analysis should be performed in both the original and the inverted images because the nature of the algorithm does not allow it to detect "dark" patterns. *

      We agree with the reviewer's comment. We now mention this point in lines 337-339.

      * In line 283, the authors mention using background correction. They should state explicitly what method of background correction they used. If they used ImageJ's "subtract background" tool, they should specify the radius.*

      We now describe this step in the method section.

      *

      Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. Being a software paper, the advance proposed by the authors is technical in nature. The novelty and significance of this tool is that it offers quick and simple pattern analysis at the single unit level to a broad audience, since it runs on the ImageJ GUI and does not require any programming knowledge. Moreover, all the modules and steps are well described in the paper, which allows easy going through the analysis.
      • Place the work in the context of the existing literature (provide references, where appropriate). The authors themselves provide a good and thorough comparison of their tool with other existing ones, both in terms of ease of use and on the type of information extracted by each method. While PatternJ is not necessarily superior in all aspects, it succeeds at providing precise single pattern unit measurements in a user-friendly manner.
      • State what audience might be interested in and influenced by the reported findings. Most researchers working with microscopy images of muscle cells or fibers or any other patterned sample and interested in analyzing changes in that pattern in response to perturbations, time, development, etc. could use this tool to obtain useful, and otherwise laborious, information. *

      We thank the reviewer for these enthusiastic comments about how straightforward for biologists it is to use PatternJ and its broad applicability in the bio community.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The authors present an ImageJ Macro GUI tool set for the quantification of one-dimensional repeated patterns that are commonly occurring in microscopy images of muscles.

      Major comments

      In our view the article and also software could be improved in terms of defining the scope of its applicability and user-ship. In many parts the article and software suggest that general biological patterns can be analysed, but then in other parts very specific muscle actin wordings are used. We are pointing this out in the "Minor comments" sections below. We feel that the authors could improve their work by making a clear choice here. One option would be to clearly limit the scope of the tool to the analysis of actin structures in muscles. In this case we would recommend to also rename the tool, e.g. MusclePatternJ. The other option would be to make the tool about the generic analysis of one-dimensional patterns, maybe calling the tool LinePatternJ. In the latter case we would recommend to remove all actin specific wordings from the macro tool set and also the article should be in parts slightly re-written.

      Minor/detailed comments

      Software

      We recommend considering the following suggestions for improving the software.

      File and folder selection dialogs

      In general, clicking on many of the buttons just opens up a file-browser dialog without any further information. For novel users it is not clear what the tool expects one to select here. It would be very good if the software could be rewritten such that there are always clear instructions displayed about which file or folder one should open for the different buttons.

      Extract button

      The tool asks one to specify things like whether selections are drawn "M-line-to-M-line"; for users that are not experts in muscle morphology this is not understandable. It would be great to find more generally applicable formulations.

      Manual selection accuracy

      The 1st step of the analysis is always to start from a user hand-drawn profile across intensity patterns in the image. However, this step can cause inaccuracy that varies with the shape and curvature of the line profile drawn. If the line is not strictly perpendicular to, for example, the M-line patterns, the distance between intensity peaks will be different. This will be more problematic when dealing with non-straight features arranged in parallel in the image. If the structure is bent into a curve, the line drawn over it also needs to reproduce this curve to precisely capture the intensity pattern. I found this limits the reproducibility and ease of use of the software.

      Reproducibility

      Since the line profile drawn on the image is the first step and essential to the entire process, the authors should consider saving it together with the analysis result, for example as ImageJ ROI or ROIset files that can be re-imported, correctly positioned, and visualized in the measured images. This would greatly improve the reproducibility of the proposed workflow. In the manuscript, only the extracted features are being saved (because the save button is also just asking for a folder containing images, so I cannot verify its functionality).

      ? button

      It would be great if that button would open up some usage instructions.

      Easy improvement of workflow

      I would suggest a reasonable expansion of the current workflow, by fitting and displaying 2D lines to the band or line structure in the image, that form the "patterns" the author aims to address. Thus, it extracts geometry models from the image, and the inter-line distance, and even the curve formed by these sets of lines can be further analyzed and studied. These fitted 2D lines can be also well integrated into ImageJ as Line ROI, and thus be saved, imported back, and checked or being further modified. I think this can largely increase the usefulness and reproducibility of the software.

      Manuscript

      We recommend considering the following suggestions for improving the manuscript. Abstract: The abstract suggests that general patterns can be quantified, however the actual tool quantifies specific subtypes of one-dimensional patterns. We recommend adapting the abstract accordingly.

      Line 58: Gray-level co-occurrence matrix (GLCM) based feature extraction and analysis approach is not mentioned nor compared. At least there's a relatively recent study on Sarcomeres structure based on GLCM feature extraction: https://github.com/steinjm/SotaTool with publication: https://doi.org/10.1002/cpz1.462

      Line 75: "...these simple geometrical features will address most quantitative needs..." We feel that this may be an overstatement, e.g. we can imagine that there should be many relevant two-dimensional patterns in biology?!

      Line 83: "After a straightforward installation by the user, ...". We think it would be convenient to add the installation steps at this place into the manuscript.

      Line 87: "Multicolor images will give a graph with one profile per color." The 'Multicolor images' here should be more precisely stated as "multi-channel" images. Multi-color images could be confused with RGB images which will be treated as 8-bit gray value (type conversion first) images by profile plot in ImageJ.

      Line 92: "...such as individual bands, blocks, or sarcomeric actin...". While bands and blocks are generic pattern terms, the biological term "sarcomeric actin" does not seem to fit in this list. Could a more generic wording be found, such as "block with spike"?

      Line 95: "the algorithm defines one pattern by having the features of highest intensity in its centre". Could this be rephrased? We did not understand what that exactly means.

      Line 124 - 147: This part is the only description of the algorithm behind the feature extraction and analysis, but it is not clearly stated. Many details are missing or assumed to be known by the reader. For example, how sub-pixel resolution is achieved is not clear. One can only assume that, by fitting a Gaussian to the band, the center position (peak) can be calculated from a continuous curve rather than from pixels.

      Line 407: We think the availability of both the tool and the code could be improved. For Fiji tools it is common practice to create an Update Site and to make the code available on GitHub. In addition, downloading the example file (https://drive.google.com/file/d/1eMazyQJlisWPwmozvyb8VPVbfAgaH7Hz/view?usp=drive_link) required a Google login and access request, which is not very convenient; in fact, we asked for access but it was denied. It would be important for the download to be easier, e.g. from GitHub or Zenodo.
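      To illustrate the sub-pixel point raised in the comment on lines 124 - 147: if a band is locally Gaussian, its centre can be recovered to a fraction of a pixel from just three samples, because the logarithm of a Gaussian is a parabola. The sketch below uses this three-point estimator as a stand-in; it is an assumption for illustration only and not the fitting routine used by the tool, and the function name is hypothetical.

```python
import math

def subpixel_peak(profile, i):
    """Refine the integer peak index i using the samples at i-1, i, i+1.
    Assumes a locally Gaussian peak and strictly positive intensities."""
    la, lb, lc = (math.log(profile[k]) for k in (i - 1, i, i + 1))
    denom = la - 2.0 * lb + lc  # curvature of the log-intensity parabola
    if denom == 0:
        return float(i)  # flat neighbourhood: no refinement possible
    # Vertex of the parabola through the three log-intensities.
    return i + 0.5 * (la - lc) / denom

# Noise-free Gaussian band centred between pixels, at x = 10.3.
band = [math.exp(-((x - 10.3) ** 2) / 8.0) for x in range(21)]
brightest = max(range(len(band)), key=band.__getitem__)
print(brightest, subpixel_peak(band, brightest))
```

      The brightest pixel alone locates the band only to the nearest integer, whereas the refined estimate recovers the off-grid centre; a full least-squares Gaussian fit does the same using more samples and is more robust to noise.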

      Significance

      The strength of this study is that a tool for the analysis of one-dimensional repeated patterns occurring in muscle fibres is made available in the accessible open-source platform ImageJ/Fiji. In the introduction to the article the authors provide an extensive review of comparable existing tools. Their new tool fills a gap in terms of providing an easy-to-use software for users without computational skills that enables the analysis of muscle sarcomere patterns. We feel that if the below mentioned limitations could be addressed the tool could indeed be valuable to life scientists interested in muscle patterning without computational skills.

      In our view there are a few limitations, including the accessibility of example data and tutorials at sites.google.com/view/patternj, which we had trouble accessing. In addition, we think that the workflow in Fiji, which currently requires pressing several buttons in the correct order, could be further simplified and streamlined by adopting some "wizard" approach, where the user is guided through the steps. Another limitation is the reproducibility of the analysis; here we recommend enabling IJ Macro recording as well as saving of the drawn line ROIs. For more detailed suggestions for improvements please see the above sections of our review.

    1. Reviewer #2 (Public Review):

      The document "Mapping spatial patterns to energetic benefits in groups of flow-coupled swimmers" by Heydari et al. uses several types of simulations and models to address aspects of stability of position and power consumption in few-body groups of pitching foils. I think the work has the potential to be a valuable and timely contribution to an important subject area. The supporting evidence is largely quite convincing, though some details could raise questions, and there is room for improvement in the presentation. My recommendations are focused on clarifying the presentation and perhaps spurring the authors to assess additional aspects:

      (1) Why do the authors choose to set the swimmers free only in the propulsion direction? I can understand constraining all the positions/orientations for investigating the resulting forces and power, and I can also understand the value of allowing the bodies to be fully free in x, y, and their orientation angle to see if possible configurations spontaneously emerge from the flow interactions. But why constrain some degrees of freedom and not others? What's the motivation, and what's the relevance to animals, which are fully free?

      (2) The model description in Eq. (1) and the surrounding text is confusing. Aren't the authors computing forces via CFD or the VS method and then simply driving the propulsive dynamics according to the net horizontal force? It seems then irrelevant to decompose things into thrust and drag, and it seems irrelevant to claim that the thrust comes from pressure and the drag from viscous effects. The latter claim may in fact be incorrect since the body has a shape and the normal and tangential components of the surface stress along the body may be complex.

      (3) The parameter τ_diss in the VS simulations takes on unusual values such as 2.45T, making it seem like this value is somehow very special, and perhaps 2.44 or 2.46 would lead to significantly different results. If the value is special, the authors should discuss and assess it. Otherwise, I recommend picking a round value, like 2 or 3, which would avoid distraction.

      (4) Some of the COT plots/information were difficult to interpret because the correspondence of beneficial with the mathematical sign was changing. For example, ΔCOT as introduced on p. 5 is such that negative indicates bad energetics as compared to a solo swimmer. But elsewhere, lower or more negative COT is good in terms of savings. Given the many plots, large amounts of data, and many quantities being assessed, the paper needs a highly uniform presentation to aid the reader.

      (5) I didn't understand the value of the "flow agreement parameter," and I didn't understand the authors' interpretation of its significance. Firstly, it would help if this and all other quantities were given explicit definitions as complete equations (including normalization). As I understand it, the quantity indicates the match of the flow velocity at some location with the flapping velocity of a "ghost swimmer" at that location. This does not seem to be exactly relevant to the equilibrium locations. In particular, if the match were perfect, then the swimmer would generate no relative flow and thus no thrust, meaning such a location could not be an equilibrium. So, some degree of mismatch seems necessary. I believe such a mismatch is indeed present, but the plots such as those in Figure 4 may disguise the effect. The color bar is saturated to the point of essentially being three tones (blue, white, red), so we cannot see that the observed equilibria are likely between the max and min values of this parameter.

      (6) More generally, and related to the above, I am favorable towards the authors' attempts to find approximate flow metrics that could be used to predict the equilibrium positions and their stability, but I think the reasoning needs to be more solid. It seems the authors are seeking a parameter that can indicate equilibrium and another that can indicate stability. Can they clearly lay out the motivation behind any proposed metrics, and clearly present complete equations for their definitions? Further, is there a related power metric that can be appropriately defined and which proves to be useful?

      (7) Why do the authors not carry out CFD simulations on the larger groups? Some explanations should be given, or some corresponding CFD simulations should be carried out. It would be interesting if CFD simulations were done and included, especially for the in-line case of many swimmers. This is because the results seem to be quite nuanced and dependent on many-body effects beyond nearest-neighbor interactions. It would certainly be comforting to see something similar happen in CFD.

      (8) Related to the above, the authors should discuss seemingly significant differences in their results for long in-line formations as compared to the CFD work of Peng et al. [48]. That work showed apparently stable groups for numbers of swimmers quite larger than that studied here. Why such a qualitatively different result, and how should we interpret these differences regarding the more general issue of the stability of tandem groups?

      (9) The authors seem to have all the tools needed to address the general question about how dynamically stable configurations relate to those that are energetically optimal. Are stable solutions optimal, or not? This would seem to have very important implications for animal groups, and the work addresses closely related topics but seems to miss the opportunity to give a definitive answer to this big question.

      (10) Time-delay particle model: This model seems to construct a simplified wake flow. But does the constructed flow satisfy basic properties that we demand of any flow, such as being divergence-free? If not, then the formulation may be troublesome.

    1. each person is seen as being rich in potential; as having power, dignity, and many, varied strengths.

      This is why I think it's important we recognize all kinds of strengths and celebrate the wins of every student, no matter how "small". Disabled or not, we all have different strengths that should be valued equally, but this is especially important for those with disabilities that may make traditional schooling or certain subjects more difficult. Students like that may thrive in more niche areas such as art, and I think it's especially important we uplift these students so they feel valued in a society that often uplifts certain skills over others.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: 

      This is my first review of the article entitled "The canonical stopping network: Revisiting the role of the subcortex in response inhibition" by Isherwood and colleagues. This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration of De Hollander, Forstmann et al. (HBM 2017) of the fact that 3T fMRI imaging (as well as many 7T imaging sequences) does not afford sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

      In the current study, the authors compiled five datasets that aimed to investigate neural activity associated with stopping an already initiated action, as operationalized in the classic stop-signal paradigm. Three of these datasets are taken from their own 7T investigations, and two are datasets from the Poldrack group, which used 3T fMRI.

      The authors make six chief points: 

      (1) There does not seem to be a measurable BOLD response in the purportedly critical subcortical areas in contrasts of successful stopping (SS) vs. going (GO), neither across datasets nor within each individual dataset. This includes the STN but also any other areas of the indirect and hyperdirect pathways.

      (2) The failed-stop (FS) vs. GO contrast is the only contrast showing substantial differences in those nodes.

      (3) The positive findings of STN (and other subcortical) activation during the SS vs. GO contrast could be due to the usage of inappropriate smoothing kernels.

      (4) The study demonstrates the utility of aggregating publicly available fMRI data from similar cognitive tasks. 

      (5) From the abstract: "The findings challenge previous functional magnetic resonance (fMRI) of the stop-signal task" 

      (6) and further: "suggest the need to ascribe a separate function to these networks." 

      I strongly and emphatically agree with points 1-5. However, I vehemently disagree with point 6, which appears to be the main thrust of the current paper, based on the discussion, abstract, and - not least - the title.

      To me, this paper essentially shows that fMRI is ill-suited to study the subcortex in the specific context of the stop-signal task. That is not just because of the issues of subcortical small-volume SNR (the main topic of this and related works by this outstanding group), but also because of its limited temporal resolution (which is unacknowledged, but especially impactful in the context of the stop-signal task). I'll expand on what I mean in the following.

      First, the authors are underrepresenting the non-fMRI evidence in favor of the involvement of the subthalamic nucleus (STN) and the basal ganglia more generally in stopping actions. 

      - There are many more intracranial local field potential recording studies that show increased STN LFP (or even single-unit) activity in the SS vs. FS and SS vs. GO contrast than listed, which come from at least seven different labs. Here's a (likely non-exhaustive) list of studies that come to mind:

      Ray et al., NeuroImage 2012; Alegre et al., Experimental Brain Research 2013; Benis et al., NeuroImage 2014; Wessel et al., Movement Disorders 2016; Benis et al., Cortex 2016; Fischer et al., eLife 2017; Ghahremani et al., Brain and Language 2018; Chen et al., Neuron 2020; Mosher et al., Neuron 2021; Diesburg et al., eLife 2021

      - Similarly, there is much more evidence than cited that causally influencing STN via deep-brain stimulation also influences action-stopping. Again, the following list is probably incomplete: 

      Van den Wildenberg et al., JoCN 2006; Ray et al., Neuropsychologia 2009; Hershey et al., Brain 2010; Swann et al., JNeuro 2011; Mirabella et al., Cerebral Cortex 2012; Obeso et al., Exp. Brain Res. 2013; Georgiev et al., Exp Br Res 2016; Lofredi et al., Brain 2021; van den Wildenberg et al, Behav Brain Res 2021; Wessel et al., Current Biology 2022

      - Moreover, evidence from non-human animals similarly suggests critical STN involvement in action stopping, e.g.: 

      Eagle et al., Cerebral Cortex 2008; Schmidt et al., Nature Neuroscience 2013; Fife et al., eLife 2017; Anderson et al., Brain Res 2020

      Together, studies like these provide either causal evidence for STN involvement via direct electrical stimulation of the nucleus or provide direct recordings of its local field potential activity during stopping. This is not to mention the extensive evidence for the involvement of the STN - and the indirect and hyperdirect pathways in general - in motor inhibition more broadly, perhaps best illustrated by their damage leading to (hemi)ballism. 

      Hence, I cannot agree with the idea that the current set of findings "suggest the need to ascribe a separate function to these networks", as suggested in the abstract and further explicated in the discussion of the current paper. For this to be the case, we would need to disregard more than a decade's worth of direct recording studies of the STN in favor of a remote measurement of the BOLD response using (provably) sub ideal imaging parameters. There are myriads of explanations of why fMRI may not be able to reveal a potential ground-truth difference in STN activity between the SS and FS/GO conditions, beginning with the simple proposition that it may not afford sufficient SNR, or that perhaps subcortical BOLD is not tightly related to the type of neurophysiological activity that distinguishes these conditions (in the purported case of the stop-signal task, specifically the beta band). But essentially, this paper shows that a specific lens into subcortical activity is likely broken, but then also suggests dismissing existing evidence from superior lenses in favor of the findings from the 'broken' lens. That doesn't make much sense to me.

      Second, there is actually another substantial reason why fMRI may indeed be unsuitable to study STN activity, specifically in the stop-signal paradigm: its limited time resolution. The sequence of subcortical processes on each specific trial type in the stop-signal task is purportedly as follows: at baseline, the basal ganglia exert inhibition on the motor system. During motor initiation, this inhibition is lifted via direct pathway innervation. This is when the three trial types start diverging. When actions then have to be rapidly cancelled (SS and FS), cortical regions signal to STN via the hyperdirect pathway that inhibition has to be rapidly reinstated (see Chen, Starr et al., Neuron 2020 for direct evidence for such a monosynaptic hyperdirect pathway, the speed of which directly predicts SSRT). Hence, inhibition is reinstated (too late in the case of FS trials, but early enough in SS trials, see recordings from the BG in Schmidt, Berke et al., Nature Neuroscience 2013; and Diesburg, Wessel et al., eLife 2021). 

      Hence, according to this prevailing model, all three trial types involve a sequence of STN activation (initial inhibition), STN deactivation (disinhibition during GO), and STN reactivation (reinstantiation of inhibition during the response via the hyperdirect pathway on SS/FS trials, reinstantiation of inhibition via the indirect pathway after the response on GO trials). What distinguishes the trial types during this period is chiefly the relative timing of the inhibitory process (earliest on SS trials, slightly later on FS trials, latest on GO trials). However, these temporal differences play out on a level of hundreds of milliseconds, and in all three cases, processing concludes well under a second overall. To fMRI, given its limited time resolution, these activations are bound to look quite similar. 

      Lastly, further building on this logic, it's not surprising that FS trials yield increased activity compared to SS and GO trials. That's because FS trials are errors, which are known to activate the STN (Cavanagh et al., JoCN 2014; Siegert et al. Cortex 2014) and afford additional inhibition of the motor system after their occurrence (Guan et al., JNeuro 2022). Again, fMRI will likely conflate this activity with the abovementioned sequence, resulting in a summation of activity and the highest level of BOLD for FS trials. 

      In sum, I believe this study has a lot of merit in demonstrating that fMRI is ill-suited to study the subcortex during the SST, but I cannot agree that it warrants any reappreciation of the subcortex's role in stopping, which are not chiefly based on fMRI evidence. 

      We would like to thank reviewer 1 for their insightful and helpful comments. We have responded point-by-point below and will give an overview of how we reframed the paper here.  

      We agree that there is good evidence from other sources for the presence of the canonical stopping network (indirect and hyperdirect) during action cancellation, and that this should be reflected more in the paper. However, we do not believe that a lack of evidence for this network during the SST makes fMRI ill-suited for studying this task, or other tasks that have neural processes occurring in quick succession. What we believe the fMRI activation patterns reflect during this task is the large amount of activation caused by failed stops; that is, the role of the STN in error processing may be more pronounced than its role in action cancellation. Due to the replicability of fMRI results, especially at higher field strengths, we believe the activation profile of failed stop trials reflects a paramount role for the STN in error processing. Therefore, while we agree we do not provide evidence against the role of the STN in action cancellation, we do provide evidence that our outlook on subcortical activation during different trial types of this task should be revisited. We have reframed the article to reflect this, and discuss points such as fMRI reliability, validity and the complex overlapping of cognitive processes in the SST in the discussion. Please see all changes to the article indicated by red text.

      A few other points: 

      - As I said before, this team's previous work has done a lot to convince me that 3T fMRI is unsuitable to study the STN. As such, it would have been nice to see a combination of the subsamples of the study that DID use imaging protocols and field strengths suitable to actually study this node. This is especially true since the second 3T sample (and arguably, the Isherwood_7T sample) does not afford a lot of trials per subject, to begin with.

      Unfortunately, this study already comprises the only 7T open access datasets available for the SST. Therefore, unless we combined only the deHollander_7T and Miletic_7T subsamples there is no additional analysis we can do for this right now. While looking at just the subsamples that were 7T and had >300 trials would be interesting, based on the new framing of the paper we do not believe it adds to the study, as the subsamples still lack the temporal resolution seemingly required for looking at the processes in the SST.

      - What was the GLM analysis time-locked to on SS and FS trials? The stop-signal or the GO-signal? 

      SS and FS trials were time-locked to the GO signal as this is standard practice. The main reason for this is that we use contrasts to interpret differences in activation patterns between conditions. By time-locking the FS and SS trials to the stop signal, we are contrasting events at different time points, and therefore different stages of processing, which introduces its own sources of error. We agree with the reviewer, however, that a separate analysis with time-locking on the stop-signal has its own merit, and now include results in the supplementary material where the FS and SS trials are time-locked to the stop signal as well.

      - Why was SSRT calculated using the outdated mean method? 

      We originally calculated SSRT using the mean method as this was how it was reported in the oldest of the aggregated studies. We have now re-calculated the SSRTs using the integration method with go omission replacement and thank the reviewer for pointing this out. Please see response to comment 3.
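      For readers unfamiliar with the integration method with go omission replacement, a minimal sketch is given below. It follows the commonly described procedure (omitted go responses are replaced by the slowest observed go RT, the go RT at the quantile given by p(respond | stop signal) is taken, and the mean SSD is subtracted). The function and argument names are hypothetical, and the authors' actual implementation may differ in details such as the quantile rule.

```python
import math

def ssrt_integration(go_rts, stop_responded, stop_ssds):
    """Estimate SSRT (same units as the inputs, e.g. ms).

    go_rts:         RT per go trial, None for omissions
    stop_responded: True per stop trial if a response was made (failed stop)
    stop_ssds:      stop-signal delay per stop trial
    """
    observed = [rt for rt in go_rts if rt is not None]
    slowest = max(observed)
    # Go omission replacement: treat omissions as the slowest observed go RT.
    filled = sorted(rt if rt is not None else slowest for rt in go_rts)
    p_respond = sum(stop_responded) / len(stop_responded)
    # Rank of the go RT that splits the distribution at p(respond | signal).
    n = max(math.ceil(p_respond * len(filled)), 1)
    nth_rt = filled[n - 1]
    mean_ssd = sum(stop_ssds) / len(stop_ssds)
    return nth_rt - mean_ssd

# Toy data: ten go trials (one omission) and four stop trials.
go = [400, 450, 500, 550, None, 600, 650, 700, 750, 800]
print(ssrt_integration(go, [True, False, True, False], [200, 250, 200, 250]))
```

      Unlike the mean method, this estimate uses the whole go RT distribution and the observed stopping probability, so it does not rely on p(respond | signal) being exactly 0.5.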

      - The authors chose 3.1 as a z-score to "ensure conservatism", but since they are essentially trying to prove the null hypothesis that there is no increased STN activity on SS trials, I would suggest erring on the side of a more lenient threshold to avoid type-2 error. 

      We have used minimum FDR-corrected thresholds for each contrast now, instead of using a blanket conservative threshold of 3.1 over all contrasts. The new thresholds for each contrast are shown in text. Please see below (page 12):

      “The thresholds for each contrast are as follows: 3.01 for FS > GO, 2.26 for FS > SS and 3.1 for SS > GO.”
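      The reason the threshold differs between contrasts is that an FDR cut-off depends on the distribution of statistics in each map. The sketch below illustrates this with the Benjamini-Hochberg procedure on one-sided p-values derived from z-scores. It is an assumption for illustration: the response does not state which FDR procedure or software was used, and the function name and the level q = 0.05 are hypothetical.

```python
import math

def one_sided_p(z):
    """Upper-tail p-value of a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def fdr_z_threshold(z_values, q=0.05):
    """Smallest z-score that survives Benjamini-Hochberg at level q,
    or None if nothing survives."""
    p_sorted = sorted(one_sided_p(z) for z in z_values)
    m = len(p_sorted)
    cutoff = None
    for k, p in enumerate(p_sorted, start=1):
        if p <= q * k / m:
            cutoff = p  # keep the largest p-value meeting the BH criterion
    if cutoff is None:
        return None
    return min(z for z in z_values if one_sided_p(z) <= cutoff)

# Toy map: mostly null voxels plus two strongly active ones.
print(fdr_z_threshold([0.0] * 8 + [4.0, 5.0]))
```

      A map with many strong effects yields a lower surviving z-score than a sparse one, which is consistent with the contrast-specific values quoted above.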

      - The authors state that "The results presented here add to a growing literature exposing inconsistencies in our understanding of the networks underlying successful response inhibition". It would be helpful if the authors cited these studies and what those inconsistencies are. 

      We thank reviewer 1 for their detailed and thorough evaluation of our paper. Overall, we agree that there is substantial direct and indirect evidence for the involvement of the cortico-basal-ganglia pathways in response inhibition. We have taken the extensive constructive criticism on board and agree with the reviewer that the paper should be reframed. We would like to thank the reviewer for the thoroughness of their helpful comments, which aided the revision of the paper.

      (1) I would suggest reframing the study, abstract, discussion, and title to reflect the fact that the study shows that fMRI is unsuitable to study subcortical activity in the SST, rather than the fact that we need to question the subcortical model of inhibition, given the reasons in my public review.

      We agree with the reviewer that the article should be reframed and not taken as direct evidence against the large body of literature pointing towards the involvement of the cortico-basal-ganglia pathway in response inhibition. We have significantly rewritten the article in light of this.

      (2) I suggest combining the datasets that provide the best imaging parameters and then analyzing the subcortical ROIs with a more lenient threshold and with regressors time-locked to the stop-signals (if that's not already the case). This would make the claim of a null finding much more impactful. Some sort of power analysis and/or Bayes factor analysis of evidence for the null would also be appreciated. 

      Instead of using a blanket conservative threshold of 3.1, we instead used only FDR-corrected thresholds. The threshold level is therefore different for each contrast and noted in the figures. We have also added supplementary figures including the group-level SPMs and ROI analyses when the FS and SS trials were time-locked to the stop signal instead of the GO signal (Supplementary Figs 4 & 5). But as mentioned above, due to the difference in time points when contrasting, we believe that time-locking to the GO signal for all trial types makes more sense for the main analysis.

      We have now also computed BFs on the first level ROI beta estimates for all contrasts using the BayesFactor package as implemented in R. We have added the following section to the methods and updated the results section accordingly (page 8):

      “In addition to the frequentist analysis we also opted to compute Bayes Factors (BFs) for each contrast per ROI per hemisphere. To do this, we extracted the beta weights for each individual trial type from our first level model. We then compared the beta weights from each trial type to one another using the ‘BayesFactor’ package as implemented in R (Morey & Rouder, 2015). We compared the full model comprising trial type, dataset and subject as predictors to the null model comprising only the dataset and subject as predictors. The datasets and subjects were modeled as random factors. We divided the resultant BFs from the full model by the null model to provide evidence for or against a significant difference in beta weights for each trial type. To interpret the BFs, we used a modified version of Jeffreys’ scale (Jeffreys, 1939; Lee & Wagenmakers, 2014).”

      (3) I suggest calculating SSRT using the integration method with the replacement of Go omissions, as per the most recent recommendation (Verbruggen et al., eLife 2019).

      We agree we should have used a more optimal method for SSRT estimation. We have replaced our original estimates with those from the integration method with replacement of go omissions, as suggested, and updated the results in Table 3.

      We have also replaced text in the methods sections to reflect this (page 5):

      “For each participant, the SSRT was calculated using the mean method, estimated by subtracting the mean SSD from median go RT (Aron & Poldrack, 2006; Logan & Cowan, 1984).”

      Now reads:

      “For each participant, the SSRT was calculated using the integration method with replacement of go omissions (Verbruggen et al., 2019), estimated by integrating the RT distribution and calculating the point at which the integral equals p(respond|signal). The completion time of the stop process aligns with the nth RT, where n equals the number of RTs in the RT distribution of go trials multiplied by the probability of responding to a signal.”
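The integration method described in the revised text can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `ssrt_integration`, the convention of marking go omissions with `None`, and the use of the mean SSD are assumptions, chosen to follow the general recommendation in Verbruggen et al. (2019).

```python
import math


def ssrt_integration(go_rts, stop_responded, ssds):
    """Estimate SSRT (ms) with the integration method, replacing go omissions.

    go_rts:         go-trial RTs in ms, with None marking a go omission
    stop_responded: one bool per stop trial, True if a response was made
    ssds:           stop-signal delay in ms for each stop trial
    """
    observed = [rt for rt in go_rts if rt is not None]
    if not observed or not stop_responded or not ssds:
        raise ValueError("need at least one go RT and one stop trial")

    # Replace each go omission with the slowest observed go RT.
    max_rt = max(observed)
    filled = sorted(max_rt if rt is None else rt for rt in go_rts)

    # p(respond | signal): proportion of stop trials with a response.
    p_respond = sum(stop_responded) / len(stop_responded)

    # The nth RT, where n = number of go RTs * p(respond | signal).
    n = math.ceil(p_respond * len(filled))
    n = min(max(n, 1), len(filled))  # keep the index within bounds
    nth_rt = filled[n - 1]

    mean_ssd = sum(ssds) / len(ssds)
    return nth_rt - mean_ssd


if __name__ == "__main__":
    go = [400, 450, 500, 550, None, 600, 650, 700, 480, 520]
    responded = [True, False, True, False]
    delays = [200, 250, 200, 250]
    print(ssrt_integration(go, responded, delays))
```

In contrast to the mean method, the estimate here depends on the shape of the go RT distribution and on p(respond|signal), not only on central tendencies.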

      Reviewer #2:

      This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, specifically bilateral preSMA, caudate, GPe, thalamus, and VTA, and unilateral M1, GPi, putamen, SN, and STN. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

      As an empirical result, I believe that the results are robust, but this work does not attempt a new theoretical synthesis of the neuro-cognitive mechanisms of stopping. Specifically, if these many areas are more active on failed stop than successful stop trials, and (at least some of) these areas are situated in pathways that are traditionally assumed to instantiate response inhibition like the hyperdirect pathway, then what function are these areas/pathways involved in? I believe that this work would make a larger impact if the author endeavored to synthesize these results into some kind of theoretical framework for how stopping is instantiated in the brain, even if that framework may be preliminary. 

      I also have one main concern about the analysis. The authors use the mean method for computing SSRT, but this has been shown to be more susceptible to distortion from RT slowing (Verbruggen, Chambers & Logan, 2013 Psych Sci), and goes against the consensus recommendation of using the integration with replacement method (Verbruggen et al., 2019). Therefore, I would strongly recommend replacing all mean SSRT estimates with estimates using the integration with replacement method. 

      I found the paper clearly written and empirically strong. As I mentioned in the public review, I believe that the main shortcoming is the lack of theoretical synthesis. I would encourage the authors to attempt to synthesize these results into some form of theoretical explanation. I would also encourage replacing the mean method with the integration with replacement method for computing SSRT. I also have the following specific comments and suggestions (in the approximate order in which they appear in the manuscript) that I hope can improve the manuscript: 

      We would like to thank reviewer 2 for their insightful and interesting comments. We have adapted our paper to reflect these comments. Please see direct responses to your comments below. We agree with the reviewer that some type of theoretical synthesis would help with the interpretability of the article. We have substantially reworked the discussion and included theoretical considerations behind the newer narrative. Please see all changes to the article indicated by red text.

      (1) The authors say "performance on successful stop trials is quantified by the stop signal reaction time". I don't think this is technically accurate. SSRT is a measure of the average latency of the stop process for all trials, not just for the trials in which subjects successfully stop. 

      Thank you for pointing out this technically incorrect statement. We have replaced the above sentence with the following (page 1):

      “Inhibition performance in the SST as a whole is quantified by the stop signal reaction time (SSRT), which estimates the speed of the latent stopping process (Verbruggen et al., 2019).”

      (2) The authors say "few studies have detected differences in the BOLD response between FS and SS trials", but then do not cite any papers that detected differences until several sentences later (de Hollander et al., 2017; Isherwood et al., 2023; Miletic et al., 2020). If these are the only ones, and they only show greater FS than SS, then I think this point could be made more clearly and directly. 

      We have moved the citations to the correct place in the text to be clearer. We have also rephrased this part of the introduction to make the points more direct (page 2).

      “In the subcortex, functional evidence is relatively inconsistent. Some studies have found an increase in BOLD response in the STN in SS > GO contrasts (Aron & Poldrack, 2006; Coxon et al., 2016; Gaillard et al., 2020; Yoon et al., 2019), but others have failed to replicate this (Bloemendaal et al., 2016; Boehler et al., 2010; Chang et al., 2020; B. Xu et al., 2015). Moreover, some studies have actually found higher STN, SN and thalamic activation in failed stop trials, not successful ones (de Hollander et al., 2017; Isherwood et al., 2023; Miletić et al., 2020).”

      (3) Unless I overlooked it, I don't believe that the author specified the criterion that any given subject is excluded based upon. Given some studies have significant exclusions (e.g., Poldrack_3T), I think being clear about how many subjects violated each criterion would be useful. 

      This is indeed interesting and important information to include. We have added the number of participants who were excluded for each criterion. Please see added text below (page 4):

      “Based on these criteria, no subjects were excluded from the Aron_3T dataset. 24 subjects were excluded from the Poldrack_3T dataset (3 based on criterion 1, 9 on criterion 2, 11 on criterion 3, and 8 on criterion 4). Three subjects were excluded from the deHollander_7T dataset (2 based on criterion 1 and 1 on criterion 2). Five subjects were excluded from the Isherwood_7T dataset (2 based on criterion 1, 1 on criterion 2, and 2 on criterion 4). Two subjects were excluded from the Miletic_7T dataset (1 based on criterion 2 and 1 on criterion 4). Note that some participants in the Poldrack_3T study failed to meet multiple inclusion criteria.”

      (4) The Method section included very exhaustive descriptions of the neuroimaging processing pipeline, which was appreciated. However, it seems that much of what is presented is not actually used in any of the analyses. For example, it seems that "functional data preprocessing" section may be fMRIPrep boilerplate, which again is fine, but I think it would help to clarify that much of the preprocessing was not used in any part of the analysis pipeline for any results. For example, at first blush, I thought the authors were using global signal regression, but after a more careful examination, I believe that they are only computing global signals but never using them. Similarly with tCompCor seemingly being computed but not used. If possible, I would recommend that the authors share code that instantiates their behavioral and neuroimaging analysis pipeline so that any confusion about what was actually done could be programmatically verified. At a minimum, I would recommend more clearly distinguishing the pipeline steps that actually went into any presented analyses.

      We thank the reviewer for finding this inconsistency. The methods section indeed uses the fMRIPrep boilerplate text, which we included so as to be as accurate as possible when describing the preprocessing steps taken. While we believe that leaving the exact boilerplate text that fMRIPrep gives us is the most accurate way to show our preprocessing, we have adapted some of the text to clarify which computations were not used in the subsequent analysis. As a side note, for future reference, we’d like to add that the fMRIPrep authors expressly recommend that users report the boilerplate completely and unaltered, and as such, we believe this may become a recurring issue (page 7).

      “While many regressors were computed in the preprocessing of the fMRI data, not all were used in the subsequent analysis. The exact regressors used for the analysis can be found above. For example, tCompCor and global signals were calculated in our generic preprocessing pipeline but not part of the analysis. The code used for preprocessing and analysis can be found in the data and code availability statement.”

      (5) What does it mean for the Poldrack_3T to have N/A for SSD range? Please clarify. 

      Thank you for pointing out this omission. We had not yet found the possible SSD range for this study. We have replaced this value with the correct value (0 – 1000 ms).

      (6) The SSD range of 0-2000ms for deHollander_7T and Miletic_7T seems very high. Was this limit ever reached or even approached? SSD distributions could be a useful addition to the supplement. 

      Thank you for also bringing this mistake to light. We had accidentally placed the max trial duration in these fields instead of the max allowable SSD value. We have replaced these with the correct value (0 – 900 ms).

      (7) The author says "In addition, median go RTs did not correlate with mean SSRTs within datasets (Aron_3T: r = .411, p = .10, BF = 1.41; Poldrack_3T: r = .011, p = .91, BF = .23; deHollander_7T: r = -.30, p = .09, BF = 1.30; Isherwood_7T: r = .13, p = .65, BF = .57; Miletic_7T: r = .37, p = .19, BF = 1.02), indicating independence between the stop and go processes, an important assumption of the horse-race model (Logan & Cowan, 1984)." However, the independent race model assumes context independence (the finishing time of the go process is not affected by the presence of the stop process) and stochastic independence (the duration of the go and stop processes are independent on a given trial). This analysis does not seem to evaluate either of these forms of independence, as it correlates RT and SSRT across subjects, so it was unclear how this analysis evaluated either of the types of independence that are assumed by the independent race model. Please clarify or remove. 

      Thank you for this comment. We realize that this analysis indeed does not evaluate either context or stochastic independence and therefore we have removed this from the manuscript.

      (8) The RTs in Isherwood_7T are considerably slower than the other studies, even though the go stimulus+response is the same (very simple) stimulus-response mapping from arrows to button presses. Is there any difference in procedure or stimuli that might explain this difference? It is the only study with a visual stop signal, but to my knowledge, there is no work suggesting visual stop signals encourage more proactive slowing. If possible, I think a brief discussion of the unusually slow RTs in Isherwood_7T would be useful. 

      We have included the following text in the manuscript to reflect this observed difference in RT between the Isherwood_7T dataset and the other datasets (page 9).

      “Longer RTs were found in the Isherwood_7T dataset in comparison to the four other datasets. The only difference in procedure in the Isherwood_7T dataset is the use of a visual stop signal as opposed to an auditory stop signal. This RT difference is consistent with previous research, where auditory stop signals and visual go stimuli have been associated with faster RTs compared to unimodal visual presentation (Carrillo-de-la-Peña et al., 2019; Weber et al., 2024). The mean SSRTs and probability of stopping are within normal range, indicating that participants understood the task and responded in the expected manner.”

      (9) When the authors included both 3T and 7T data, I thought they were preparing to evaluate the effect of magnet strength on stop networks, but they didn't do this analysis. Is this because the authors believe there is insufficient power? It seems that this could be an interesting exploratory analysis that could improve the paper.

      We thank the reviewer for this interesting comment. As our dataset sample contains only two 3T and three 7T datasets we indeed believe there is insufficient power to warrant such an analysis. In addition, we wanted the focus of this paper to be how fMRI examines the SST in general, and not differences between acquisition methods. With a greater number of datasets with different imaging parameters (especially TE or resolution) in addition to field strength, we agree such an analysis would be interesting, although beyond the scope of this article.

      (10) The authors evaluate smoothing and it seems that the conclusion that they want to come to is that with a larger smoothing kernel, the results in the stop networks bleed into surrounding areas, producing false positive activity. However, in the absence of a ground truth of the true contributions of these areas, it seems that an alternative interpretation of the results is that the denser maps when using a larger smoothing kernel could be closer to "true" activation, with the maps using a smaller smoothing kernel missing some true activity. It seems worth entertaining these two possible interpretations for the smoothing results unless there is clear reason to conclude that the smoothed results are producing false positive activity. 

      We agree with the view of the reviewer on the interpretation of the smoothing results. We indeed cannot rule this out as a possible interpretation of the results, due to a lack of ground truth. We have added text to the article to reflect this view and discuss the types of errors we can expect for both smaller and larger smoothing kernels (page 15).

      “In the absence of a ground truth, we are not able to fully justify the use of either larger or smaller kernels to analyse such data. On the one hand, aberrantly large smoothing kernels could lead to false positives in activation profiles, due to bleeding of observed activation into surrounding tissues. On the other hand, too little smoothing could lead to false negatives, missing some true activity in surrounding regions. While we cannot concretely validate either choice, it should be noted that there is lower spatial uncertainty in the subcortex compared to the cortex, due to the lower anatomical variability. False positives from smoothing spatially unmatched signal are more likely than false negatives. It may be more prudent for studies to use a range of smoothing kernels, to assess the robustness of their fMRI activation profiles.”

    1. Reviewer #3 (Public Review):

      I remain enthusiastic about this study. The manuscript is well-written, logical, and conceptually clear. To my knowledge, no prior modeling study has tackled the question of 'why prepare before executing, why not just execute?' Prior studies have simply assumed, to emulate empirical findings, that preparatory inputs precede execution. They never asked why. The authors show that, when there are constraints on inputs, preparation becomes a natural strategy. In contrast, with no constraint on inputs, there is no need for preparation as one could get anything one liked just via the inputs during movement. For the sake of tractability, the authors use a simple magnitude constraint: the cost function punishes the integral of the squared inputs. Thus, if small inputs before movement can reduce the size of the inputs needed during movement, preparation is a good strategy. This occurs if (and only if) the network has strong dynamics (otherwise feeding it preparatory activity would not produce anything interesting). All of this is sensible and clarifying.

      As discussed in the prior round of reviews, the central constraint that the authors use is a mathematically tractable stand-in for a range of plausible (but often trickier to define and evaluate) constraints, such as simplicity of inputs (or inputs being things that other areas could provide). The manuscript now embraces this fact more explicitly, and also gives some results showing that other constraints (such as on the derivative of activity, which is one component of complexity) can have the same effect. The manuscript also now discusses and addresses a modest weakness of the previous manuscript: the preparatory activity in their simulations is often overly complex temporally, lacking the (rough) plateau typically seen for data. Depending on your point of view, this is simply 'window dressing', but from my perspective it was important to know that their approach could yield more realistic-looking preparatory activity. Both these additions (the new constraint, and the more realistic temporal profile of preparatory activity) are added simply as supplementary figures rather than in the main text, and are brought up only in the Discussion. At first this struck me as slightly odd, but in the end I think this is appropriate. These are really Discussion-type issues, and dealing with them there makes sense. The 'different constraints' issue in particular is deep, tricky to explore for technical reasons, and could thus support a small research program. I think it is fair to talk about it thoughtfully (as the Discussion now does) and then just mention some simple results.

      My remaining comments largely pertain to some subtle (but to me important) nuances at a few locations in the text. These should be easy for the authors to address, in whatever way they see fit.

      Specific comments:

      (1) The authors state the following on line 56: "For preparatory processes to avoid triggering premature movement, any pre-movement activity in the motor and dorsal pre-motor (PMd) cortices must carefully exclude those pyramidal tract neurons."

      This constraint is overly restrictive. PT neurons absolutely can change their activity during preparation in principle (and appear to do so in practice). The key constraint is looser: those changes should have no net effect on the muscles. E.g., if d is the vector of changes in PT neuron firing rates, and b is the vector of weights, then the constraint is that b'd = 0. d = 0 is one good way of doing this, but only one. Half the d's could go up and half could go down. Or they all go up, but half the b's are negative. Put differently, there is no reason the null space has to be upstream of the PT neurons. It could be partly, or entirely, downstream.

      In the end, this doesn't change the point the authors are making. It is still the case that d has to be structured to avoid causing muscle activity, which raises exactly the point the authors care about: why risk this unless preparation brings benefits? However, this point can be made with a more accurate motivation. This matters, because people often think that a null-space is a tricky thing to engineer, when really it is quite natural. With enough neurons, preparing in the null space is quite simple.
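The constraint b'd = 0 in the comment above can be illustrated with a small numerical sketch. This is a toy example only: the weight and rate-change vectors are made-up numbers, and the helper names `dot` and `project_to_null_space` are hypothetical. It shows that a non-zero d can have no net effect on the readout, and that any d can be made output-null by removing its component along b.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def project_to_null_space(b, d):
    """Remove from d its component along b, so that b'd becomes zero."""
    scale = dot(b, d) / dot(b, b)
    return [di - scale * bi for bi, di in zip(b, d)]


if __name__ == "__main__":
    # Case 1: all rate changes are positive, but half the weights are negative.
    b = [1.0, 1.0, -1.0, -1.0]
    d = [2.0, 1.0, 2.0, 1.0]
    print(dot(b, d))  # net effect on the readout

    # Case 2: an arbitrary d made output-null by projection.
    b2 = [1.0, 1.0, 1.0, 1.0]
    d2 = [3.0, 1.0, 0.0, 2.0]
    d2_null = project_to_null_space(b2, d2)
    print(d2_null, dot(b2, d2_null))
```

With a single readout vector b over N neurons, the output-null directions form an (N - 1)-dimensional subspace, which is one way to see why such a null space is easy to come by when N is large.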

      (2) Line 167: 'near-autonomous internal dynamics in M1'.

      It would be good if such statements, early in the paper, could be modified to reflect the fact that the dynamics observed in M1 may depend on recurrence that is NOT purely internal to M1. A better phrase might be 'near-autonomous dynamics that can be observed in M1'. A similar point applies on line 13. This issue is handled very thoughtfully in the Discussion, starting on line 713. Obviously it is not sensible to also add multiple sentences making the same point early on. However, it is still worth phrasing things carefully, otherwise the reader may have the wrong impression up until the Discussion (i.e. they may think that both the authors, and prior studies, believe that all the relevant dynamics are internal to M1). If possible, it might also be worth adding one sentence, somewhere early, to keep readers from falling into this hole (and then being stuck there till the Discussion digs them out).

      (3) The authors make the point, starting on line 815, that transient (but strong) preparatory activity empirically occurs without a delay. They note that their model will do this but only if 'no delay' means 'no external delay'. For their model to prepare, there still needs to be an internal delay between when the first inputs arrive and when movement generating inputs arrive.

      This is not only a reasonable assumption, but is something that does indeed occur empirically. This can be seen in Figure 8c of Lara et al. Similarly, Kaufman et al. 2016 noted that "the sudden change in the CIS [the movement triggering event] occurred well after (~150 ms) the visual go cue... (~60 ms latency)" Behavioral experiments have also argued that internal movement-triggering events tend to be quite sluggish relative to the earliest they could be, causing RTs to be longer than they should be (Haith et al. Independence of Movement Preparation and Movement Initiation). Given this empirical support, the authors might wish to add a sentence indicating that the data tend to justify their assumption that the internal delay (separating the earliest response to sensory events from the events that actually cause movement to begin) never shrinks to zero.

      While on this topic, the Haith and Krakauer paper mentioned above is good to cite because it does ponder the question of whether preparation is really necessary. By showing that they could get RTs to shrink considerably before behavior became inaccurate, they showed that people normally (when not pressured) use more preparation time than they really need. Given Lara et al., we know that preparation does always occur, but Haith and Krakauer were quite right that it can be very brief. This helped -- along with neural results -- change our view of preparation from something more cognitive that had to occur, to something more mechanical that was simply a good network strategy, which is indeed the authors' current point. Working a discussion of this into the current paper may or may not make sense, but if there is a place where it is easy to cite, it would be appropriate.

    2. Author response:

      The following is the authors’ response to the original reviews.

      General response:

      We thank all the reviewers for their detailed reviews.

      All reviewers made a number of valuable comments, in particular by highlighting several points that would benefit from additional clarifications and discussion. We really appreciate the time and effort that went into the reviews. We have updated the paper to reflect the changes we have made in response to the reviewers' comments (largely by including more discussion regarding the model limitations and the effect of various modeling choices). We have also included several new supplementary figures (S7, S8, S9, S10) that provide further details of the model behavior, and show the effect of changing some of the terms in the cost. Below, we go through the individual comments, and highlight the places in which we have made changes to address the reviewers’ comments.

      Reviewer 1:

      Thank you for your review and pointing out multiple things to be discussed and clarified! Below, we go through the various limitations you pointed out and refer to the places where we have tried to address them.

      (1) It's important to keep in mind that this work involves simplified models of the motor system, and often the terminology for 'motor cortex' and 'models of motor cortex' are used interchangeably, which may mislead some readers. Similarly, the introduction fails in many cases to state what model system is being discussed (e.g. line 14, line 29, line 31), even though these span humans, monkeys, mice, and simulations, which all differ in crucial ways that cannot always be lumped together.

      That is a good point. We have clarified this in the text (Introduction and Discussion), to highlight the fact that our model isn’t necessarily meant to just capture M1. We have also updated the introduction to make clearer in which species the experiments motivating our investigation were performed.

      (2) At multiple points in the manuscript thalamic inputs during movement (in mice) are used as a motivation for examining the role of preparation. However, there are other more salient motivations, such as delayed sensory feedback from the limb and vision arriving in the motor cortex, as well as ongoing control signals from other areas such as the premotor cortex.

      Yes – the motivation for thalamic inputs came from the fact that those have specifically been shown to be necessary for accurate movement generation in mice. However, it is true that the inputs in our model are meant to capture any signals external to the dynamical system modeled, and as such are likely to represent a mixture of sensory signals, and feedback from other areas. We have clarified this in the Discussion, and have added this additional motivation in the Introduction.

      (3) Describing the main task in this work as a delayed reaching task is not justified without caveats (by the authors' own admission: line 687), since each network is optimized with a fixed delay period length. Although this is mentioned to the reader, it's not clear enough that the dynamics observed during the delay period will not resemble those in the motor cortex for typical delayed reaching tasks.

      Yes, we completely agree that the terminology might be confusing. While the task we are modeling is a delayed reaching task, it does differ from the usual setting since the network has knowledge of the delay period, and that is indeed a caveat of the model. We have added a brief paragraph just after the description of the optimal control objective to highlight this limitation.

      We have also performed additional simulations using two different variants of a model-predictive control approach that allow us to relax the assumption that the go-cue time is known in advance. We show that these modifications of the optimal controller yield results that remain consistent with our main conclusions, and can in fact in some settings lead to preparatory activity plateaus during the preparation epoch as often found in monkey M1 (e.g. in Elsayed et al. 2016). We have modified the Discussion to explain these results and their limitations, which are summarized in a new Supplementary Figure (S9).

      (4) A number of simplifications in the model may have crucial consequences for interpretation.

      a) Even following the toy examples in Figure 4, all the models in Figure 5 are linear, which may limit the generalisability of the findings.

      While we agree that linear models may be too simplistic, many prior analyses of M1 data suggest that linear models are often good enough to capture key aspects of M1 dynamics; for example, the generative model underlying jPCA is linear, and Sussillo et al. (2015) showed that the internal activity of nonlinear RNN models trained to reproduce EMG data aligned best with M1 activity when heavily regularized; in this regime, the RNN dynamics were close to linear. Nevertheless, this linearity assumption is indeed convenient from a modeling viewpoint: the optimal control problem is more easily solved for linear network dynamics and the optimal trajectories are more consistent across networks. Indeed, we had originally attempted to perform the analyses of Figure 5 in the nonlinear setting, but found that while the results were overall similar to what we report in the linear regime, iLQR was occasionally trapped in local minima, resulting in more variable results, especially for inhibition-stabilized networks at the strongly connected end of the spectrum. Finally, Figure 5 is primarily meant to explore to what extent motor preparation can be predicted from basic linear control-theoretic properties of the Jacobian of the dynamics; in this regard, it made sense to work with linear RNNs (for which the Jacobian is constant).

      b) Crucially, there is no delayed sensory feedback in the model from the plant. Although this simplification is in some ways a strength, this decision allows networks to avoid having to deal with delayed feedback, which is a known component of closed-loop motor control and of motor cortex inputs and will have a large impact on the control policy.

      This comment resonates well with Reviewer 3's remark regarding the autonomous nature (or not) of M1 during movement. Rather than thinking of our RNN models as anatomically confined models of M1 alone, we think of them as models of the dynamics which M1 implements possibly as part of a broader network involving “inter-area loops and (at some latency) sensory feedback”, and whose state appears to be near-fully decodable from M1 activity alone. We have added a paragraph of Discussion on this important point.

      (5) A key feature determining the usefulness of preparation is the direction of the readout dimension. However, all readouts had a similar structure (random Gaussian initialization). Therefore, it would be useful to have more discussion regarding how the structure of the output connectivity would affect preparation, since the motor cortex certainly does not follow this output scheme.

      We agree with this limitation of our model — indeed one key message of Figure 4 is that the degree of reliance on preparatory inputs depends strongly on how the dynamics align with the readout. However, this strong dependence is somewhat specific to low-dimensional models; in higher-dimensional models (most of our paper), one expects that any random readout matrix C will pick out activity dimensions in the RNN that are sufficiently aligned with the most controllable directions of the dynamics to encourage preparation.

      We did consider optimizing C away (which required differentiating through the iLQR optimizer, which is possible but very costly), but the question inevitably arises of what exactly C should be optimized for, and under what constraints (e.g. fixed norm or not). One possibility is to optimize C with respect to the same control objective that the control inputs are optimized for, and constrain its norm (otherwise, inputs to the M1 model, and its internal activity, could become arbitrarily small as C can grow to compensate). We performed this experiment (new Supplementary Figure S7) and obtained a similar preparation index; there was one notable difference, namely that the optimized readout modes led to greater observability compared to a random readout; thus, the same amount of “muscle energy” required for a given movement could now be produced by a smaller initial condition. In turn, this led to smaller control inputs, consistent with a lower control cost overall.

      Whilst we could have systematically optimized C away, we reasoned that (i) it is computationally expensive, and (ii) the way M1 affects downstream effectors is presumably “optimized” for much richer motor tasks than simple 2D reaching, such that optimizing C for a fixed set of simple reaches could lead to misleading conclusions. We therefore decided to stick with random readouts.
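
      The link between observability and the size of the required initial condition can be illustrated with a small numerical sketch. It is not the authors' code: it assumes a discrete-time, input-free linear system x[k+1] = A x[k] with readout y[k] = C x[k], and the function names `observability_gramian` and `output_energy` are hypothetical. Under these assumptions, the output energy produced by an initial condition x0 over a finite horizon equals x0ᵀ Q x0, where Q is the finite-horizon observability Gramian.

```python
import numpy as np

def observability_gramian(A, C, horizon):
    """Finite-horizon observability Gramian: sum_k (C A^k)^T (C A^k)."""
    n = A.shape[0]
    Q = np.zeros((n, n))
    A_power = np.eye(n)  # holds A^k, starting at k = 0
    for _ in range(horizon):
        CAk = C @ A_power
        Q += CAk.T @ CAk
        A_power = A @ A_power
    return Q

def output_energy(A, C, x0, horizon):
    """Sum of ||y[k]||^2 obtained by simulating the input-free dynamics from x0."""
    x = np.array(x0, dtype=float)
    energy = 0.0
    for _ in range(horizon):
        y = C @ x
        energy += float(y @ y)
        x = A @ x
    return energy
```

      In this toy setting, a readout with a larger Gramian along the direction of x0 yields the same output energy from a smaller initial condition, which is the qualitative effect described for the optimized readout.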

      Additional comments :

      (1) The choice of cost function seems very important. Is it? For example, penalising the square of u(t) may produce very different results than penalising the absolute value.

      Yes, the choice of cost function does affect the results, at least qualitatively. The absolute value of the inputs is a challenging cost to use, as iLQR relies on a local quadratic approximation of the cost function. However, we have included additional experiments in which we penalized the squared derivative of the inputs (Supplementary Figure S8; see also our response to Reviewer 3's suggestion on this topic), and we do see differences in the qualitative behavior of the model (though the main takeaway, i.e. the reliance on preparation, continues to hold). This is now referred to and discussed in the Discussion section.

      (2) In future work it would be useful to consider the role of spinal networks, which are known to contribute to preparation in some cases (e.g. Prut and Fetz, 1999).

      (3) The control signal magnitude is penalised, but not the output torque magnitude, which highlights the fact that control in the model is quite different from muscle control, where co-contraction would be a possibility and therefore a penalty of muscle activation would be necessary. Future work should consider the role of these differences in control policy.

      Thank you for pointing us to this reference! Regarding both of these concerns, we agree that the model could be greatly improved and made more realistic in future work (another avenue for this would be to consider a more realistic biophysical model, e.g. using the MotorNet library). We hope that the current Discussion, which highlights the various limitations of our modeling choices, makes it clear that a lot of these choices could easily be modified depending on the specific assumptions/investigation being performed.

      Reviewer 2:

      Thank you for your positive review! We very much agree with the limitations you pointed out, some of which overlapped with the comments of the other reviewers. We have done our best to address them through additional discussion and new supplementary figures. We briefly highlight below where those changes can be found.

      (1) Though the optimal control theory framework is ideal to determine inputs that minimize output error while regularizing the input norm, it however cannot easily account for some other varied types of objectives especially those that may lead to a complex optimization landscape. For instance, the reusability of parts of the circuit, sparse use of additional neurons when learning many movements, and ease of planning (especially under uncertainty about when to start the movement), may be alternative or additional reasons that could help explain the preparatory activity observed in the brain. It is interesting to note that inputs that optimize the objective chosen by the authors arguably lead to a trade-off in terms of other desirable objectives. Specifically, the inputs the authors derive are time-dependent, so a recurrent network would be needed to produce them and it may not be easy to interpolate between them to drive new movement variants. In addition, these inputs depend on the desired time of output and therefore make it difficult to plan, e.g. in circumstances when timing should be decided depending on sensory signals. Finally, these inputs are specific to the full movement chain that will unfold, so they do not permit reuse of the inputs e.g. in movement sequences of different orders.

      Yes, that is a good point! We have incorporated further Discussion related to this point. We have additionally included a new example in which we regularize the temporal complexity of the inputs (see also our response to Reviewer 3's suggestion on this topic), which leads to more slowly varying inputs, and may indeed represent a more realistic constraint and lead to simpler inputs that can more easily be interpolated between. We also agree that uncertainty about the upcoming go cue may play an important role in the strategy adopted by the animals. While we have not performed an extensive investigation of the topic, we have included a Supplementary Figure (S9) in which we used Model Predictive Control to investigate the effect of planning under uncertainty about the go cue arrival time. We hope that this will give the reader a better sense of what sort of model extensions are possible within our framework.

      (2) Relatedly, if the motor circuits were to balance different types of objectives, the activity and inputs occurring before each movement may be broken down into different categories that may each specialize into one objective. For instance, previous work (Kaufman et al. eNeuron 2016, Inagaki et al., Cell 2022, Zimnik and Churchland, Nature Neuroscience 2021) has suggested that inputs occurring before the movement could be broken down into preparatory inputs 'stricto sensu' - relating to the planned characteristics of the movement - and a trigger signal, relating to the transition from planning to execution - irrespective of whether the movement is internally timed or triggered by an external event. The current work does not address which type(s) of early input may be labeled as 'preparatory' or may be thought of as a part of 'planning' computations.

      Yes, our model does indeed treat inputs in a very general way, and does not distinguish between the different types of processes they may be composed of. This is partly because we do not explicitly model where the inputs come from, such that our inputs likely encompass multiple processes. We have added discussion related to this point.

      (3) While the authors rightly point out some similarities between the inputs that they derive and observed preparatory activity in the brain, notably during motor sequences, there are also some differences. For instance, while both the derived inputs and the data show two peaks during sequences, the data reproduced from Zimnik and Churchland show preparatory inputs that have a very asymmetric shape that really plummets before the start of the next movement, whereas the derived inputs have larger amplitude during the movement period - especially for the second movement of the sequence. In addition, the data show trigger-like signals before each of the two reaches. Finally, while the data show a very high correlation between the pattern of preparatory activity of the second reach in the double reach and compound reach conditions, the derived inputs appear to be more different between the two conditions. Note that the data would be consistent with separate planning of the two reaches even in the compound reach condition, as well as the re-use of the preparatory input between the compound and double reach conditions. Therefore, different motor sequence datasets - notably, those that would show even more coarticulation between submovements - may be more promising to find a tight match between the data and the authors' inputs. Further analyses in these datasets could help determine whether the coarticulation could be due to simple filtering by the circuits and muscles downstream of M1, planning of movements with adjusted curvature to mitigate the work performed by the muscles while permitting some amount of re-use across different sequences, or - as suggested by the authors - inputs fully tailored to one specific movement sequence that maximize accuracy and minimize the M1 input magnitude.

      Regarding the exact shape of the occupancy plots, it is important to note that some of the more qualitative aspects (e.g. the relative height of the two peaks) will change if we change the parameters of the cost function. Right now, we have chosen the parameters to ensure that both reaches would be performed at roughly the same speed (as a way to very loosely constrain the parameters based on the observed behavior). However, small changes to the hyperparameters can lead to changes in the model output (e.g. one of the two consecutive reaches being performed using greater acceleration than the other), and since our biophysical model is fairly simple, changes in the behavior are directly reflected in the network activity. Essentially, what this means is that while the double occupancy is a consistent feature of the model, the exact shape of the peaks is more sensitive to hyperparameters, and we do not wish to draw any strong conclusions from them, given the simplicity of the biophysical model. However, we do agree that our model exhibits some differences with the data. As discussed above, we have included additional discussion regarding the potential existence of separate inputs for planning vs triggering the movement in the context of single reaches.

      Overall, we are excited about the suggestions made by the Reviewer here about using our approach to analyze other motor sequence datasets, but we think that in order to do this properly, one would need to adopt a more realistic musculo-skeletal model (such as one provided by MotorNet).

      (4) Though iLQR is a powerful optimization method to find inputs optimizing the authors' cost function, it also has some limitations. First, given that it relies on a linearization of the dynamics at each timestep, it has a limited ability to leverage potential advantages of nonlinearities in the dynamics. Second, the iLQR algorithm is not a biologically plausible learning rule and therefore it might be difficult for the brain to learn to produce the inputs that it finds. It remains unclear whether using alternative algorithms with different limitations - for instance, using variants of BPTT to train a separate RNN to produce the inputs in question - could impact some of the results.

      We agree that our choice of iLQR has limitations: while it offers the advantage of convergence guarantees, it does indeed restrict the choice of cost function and dynamics that we can use. We have now included extensive discussion of how the modeling choices affect our results.

      We do not view the lack of biological plausibility of iLQR as an issue, as the results are agnostic to the algorithm used for optimization. However, we agree that any structure imposed on the inputs (e.g. by enforcing them to be the output of a self-contained dynamical system) would likely alter the results. A potentially interesting extension of our model would be to do just what the reviewer suggested, and try to learn a network that can generate the optimal inputs. However, this is outside the scope of our investigation, as it would then lead to new questions (e.g. what brain region would that other RNN represent?).

      (5) Under the objective considered by the authors, the amount of input occurring before the movement might be impacted by the presence of online sensory signals for closed-loop control. It is therefore an open question whether the objective and network characteristics suggested by the authors could also explain the presence of preparatory activity before e.g. grasping movements that are thought to be more sensory-driven (Meirhaeghe et al., Cell Reports 2023).

      It is true that we aren’t currently modeling sensory signals explicitly. However, some of the optimal inputs we infer may be capturing upstream information which could encompass some sensory information. This is currently unclear, and would likely depend on how exactly the model is specified. We have added new discussion to emphasize that our dynamics should not be understood as just representing M1, but more general circuits whose state can be decoded from M1.

      Reviewer #2 (Recommendations For The Authors):

      Additionally, thank you for pointing out various typos in the manuscript, we have fixed those!

      Reviewer 3:

      Thank you very much for your review, which makes a lot of very insightful points, and raises several interesting questions. In summary, we very much agree with the limitations you pointed out. In particular, the choice of input cost is something we had previously discussed, but we had found it challenging to decide on what a reasonable cost for “complexity” could be. Following your comment, we have however added a first attempt at penalizing “temporal complexity”, which shows promising behavior. We have only included those additional analyses as supplementary figures, and we have included new discussion, which hopefully highlights what we meant by the different model components, and how the model behavior may change as we vary some of our choices. We hope this can be informative for future models that may use a similar approach. Below, we highlight the changes that we have made to address your comments.

      The main limitation of the study is that it focuses exclusively on one specific constraint - magnitude - that could limit motor-cortex inputs. This isn't unreasonable, but other constraints are at least as likely, if less mathematically tractable. The basic results of this study will probably be robust with regard to such issues - generally speaking, any constraint on what can be delivered during execution will favor the strategy of preparing - but this robustness cuts both ways. It isn't clear that the constraint used in the present study - minimizing upstream energy costs - is the one that really matters. Upstream areas are likely to be limited in a variety of ways, including the complexity of inputs they can deliver. Indeed, one generally assumes that there are things that motor cortex can do that upstream areas can't do, which is where the real limitations should come from. Yet in the interest of a tractable cost function, the authors have built a system where motor cortex actually doesn't do anything that couldn't be done equally well by its inputs. The system might actually be better off if motor cortex were removed. About the only thing that motor cortex appears to contribute is some amplification, which is 'good' from the standpoint of the cost function (inputs can be smaller) but hardly satisfying from a scientific standpoint.

      The use of a term that punishes the squared magnitude of control signals has a long history, both because it creates mathematical tractability and because it (somewhat) maps onto the idea that one should minimize the energy expended by muscles and the possibility of damaging them with large inputs. One could make a case that those things apply to neural activity as well, and while that isn't unreasonable, it is far from clear whether this is actually true (and if it were, why punish the square if you are concerned about ATP expenditure?). Even if neural activity magnitude is an important cost, any costs should pertain not just to inputs but to motor cortex activity itself. I don't think the authors really wish to propose that squared input magnitude is the key thing to be regularized. Instead, this is simply an easily imposed constraint that is tractable and acts as a stand-in for other forms of regularization / other types of constraints. Put differently, if one could write down the 'true' cost function, it might contain a term related to squared magnitude, but other regularizing terms would be very likely to dominate. Using only squared magnitude is a reasonable way to get started, but there are also ways in which it appears to be limiting the results (see below).

      I would suggest that the study explore this topic a bit. Is it possible to use other forms of regularization? One appealing option is to constrain the complexity of inputs; a long-standing idea is that the role of motor cortex is to take relatively simple inputs and convert them to complex time-evolving inputs suitable for driving outputs. I realize that exploring this idea is not necessarily trivial. The right cost-function term is not clear (should it relate to low-dimensionality across conditions, or to smoothness across time?) and even if it were, it might not produce a convex cost function. Yet while exploring this possibility might be difficult, I think it is important for two reasons.

      First, this study is an elegant exploration of how preparation emerges due to constraints on inputs, but at present that exploration focuses exclusively on one constraint. Second, at present there are a variety of aspects of the model responses that appear somewhat unrealistic. I suspect most of these flow from the fact that while the magnitude of inputs is constrained, their complexity is not (they can control every motor cortex neuron at both low and high frequencies). Because inputs are not complexity-constrained, preparatory activity appears overly complex and never 'settles' into the plateaus that one often sees in data. To be fair, even in data these plateaus are often imperfect, but they are still a very noticeable feature in the response of many neurons. Furthermore, the top PCs usually contain a nice plateau. Yet we never get to see this in the present study. In part this is because the authors never simulate the situation of an unpredictable delay (more on this below) but it also seems to be because preparatory inputs are themselves strongly time-varying. More realistic forms of regularization would likely remedy this.

      That is a very good point, and it mirrors several concerns that we had in the past. While we did focus on the input norm for the sake of simplicity, and because it represents a very natural way to regularize our control solutions, we agree that a “complexity cost” may be better suited to models of brain circuits. We have addressed this in a supplementary investigation. We chose to focus on a cost that penalizes the temporal complexity of the inputs, as ||u(t+1) - u(t)||^2. Note that this required augmenting the state of the model, making the computations quite a bit slower; while it is doable if we only penalize the first temporal derivative, it would not scale well to higher orders.
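
      One standard way to obtain such a state augmentation for linear dynamics is sketched below. This is an illustrative assumption, not necessarily the construction the authors used: the previous input is appended to the state, and the input increment v(t) = u(t) - u(t-1) becomes the new control, so that the smoothness penalty is an ordinary quadratic cost on v. The helper names `augment` and `rollout` are hypothetical, and the sketch takes u(-1) = 0.

```python
import numpy as np

def augment(A, B):
    """Build dynamics for z = [x; u_prev] driven by the increment v = u - u_prev.

    Original system:  x[t+1] = A x[t] + B u[t]
    Augmented system: z[t+1] = A_aug z[t] + B_aug v[t]
    """
    n, m = B.shape
    A_aug = np.block([[A, B],
                      [np.zeros((m, n)), np.eye(m)]])
    B_aug = np.vstack([B, np.eye(m)])
    return A_aug, B_aug

def rollout(A, B, x0, inputs):
    """Simulate x[t+1] = A x[t] + B u[t]; returns all states, including x0."""
    x = np.array(x0, dtype=float)
    states = [x]
    for u in inputs:
        x = A @ x + B @ u
        states.append(x)
    return np.array(states)
```

      With this construction the augmented state has dimension n + m, which is one reason the computation becomes slower; penalizing higher-order differences would require storing further past inputs in the state.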

      Interestingly, we did find that the activity in that setting was somewhat more realistic (see new Supplementary Figure S8), with more sustained inputs and plateauing activity. While we have kept the original model for most of the investigations, the somewhat more realistic nature of the results under that setting suggests that further exploration of penalties of that sort could represent a promising avenue to improve the model.

      We also found the idea of a cost that would ensure low-dimensionality of the inputs across conditions very interesting. However, it is challenging to investigate with iLQR as we perform the optimization separately for each condition; nevertheless, it could be investigated using a different optimizer.

      At present, it is also not clear whether preparation always occurs even with no delay. Given only magnitude-based regularization, it wouldn't necessarily have to be. The authors should perform a subspace-based analysis like that in Figure 6, but for different delay durations. I think it is critical to explore whether the model, like monkeys, uses preparation even for zero-delay trials. At present it might or might not. If not, it may be because of the lack of more realistic constraints on inputs. One might then either need to include more realistic constraints to induce zero-delay preparation, or propose that the brain basically never uses a zero delay (it always delays the internal go cue after the preparatory inputs) and that this is a mechanism separate from that being modeled.

      I agree with the authors that the present version of the model, where optimization knows the exact time of movement onset, produces a reasonably realistic timecourse of preparation when compared to data from self-paced movements. At the same time, most readers will want to see that the model can produce realistic looking preparatory activity when presented with an unpredictable delay. I realize this may be an optimization nightmare, but there are probably ways to trick the model into optimizing to move soon, but then forcing it to wait (which is actually what monkeys are probably doing). Doing so would allow the model to produce preparation under the circumstances where most studies have examined it. In some ways this is just window-dressing (showing people something in a format they are used to and can digest) but it is actually more than that, because it would show that the model can produce a reasonable plateau of sustained preparation. At present it isn't clear it can do this, for the reasons noted above. If it can't, regularizing complexity might help (and even if this can't be shown, it could be discussed).

      In summary, I found this to be a very strong study overall, with a conceptually timely message that was well-explained and nicely documented by thorough simulations. I think it is critical to perform the test, noted above, of examining preparatory subspace activity across a range of delay durations (including zero) to see whether preparation endures as it does empirically. I think the issue of a more realistic cost function is also important, both in terms of the conceptual message and in terms of inducing the model to produce more realistic activity. Conceptually it matters because I don't think the central message should be 'preparation reduces upstream ATP usage by allowing motor cortex to be an amplifier'. I think the central message the authors wish to convey is that constraints on inputs make preparation a good strategy. Many of those constraints likely relate to the fact that upstream areas can't do things that motor cortex can do (else you wouldn't need a motor cortex) and it would be good if regularization reflected that assumption. Furthermore, additional forms of regularization would likely improve the realism of model responses, in ways that matter both aesthetically and conceptually. Yet while I think this is an important issue, it is also a deep and tricky one, and I think the authors need considerable leeway in how they address it. Many of the cost-function terms one might want to use may be intractable. The authors may have to do what makes sense given technical limitations. If some things can't be done technically, they may need to be addressed in words or via some other sort of non-optimization-based simulation.

      Specific comments

      As noted above, it would be good to show that preparatory subspace activity occurs similarly across delay durations. It actually might not, at present. For a zero ms delay, the simple magnitude-based regularization may be insufficient to induce preparation. If so, then the authors would either have to argue that a zero delay is actually never used internally (which is a reasonable argument) or show that other forms of regularization can induce zero-delay preparation.

      Yes, that is a very interesting analysis to perform, which we had not considered before! When investigating this, we found that the zero-delay strategy does not rely on preparation in the same way as is seen in the monkeys. This seems to be a reflection of the fact that our “Go cue” corresponds to an “internal” go cue which would likely come after the true, “external go cue” – such that we would indeed never actually be in the zero delay setting. This is not something we had addressed (or really considered) before, although we had tried to ensure we referred to “delta prep” as the duration of the preparatory period but not necessarily the delay period. We have now included more discussion on this topic, as well as a new Supplementary Figure S10.

      I agree with the authors that prior modeling work was limited by assuming the inputs to M1, which meant that prior work couldn't address the deep issue (tackled here) of why there should be any preparatory inputs at all. At the same time, the ability to hand-select inputs did provide some advantages. A strong assumption of prior work is that the inputs are 'simple', such that motor cortex must perform meaningful computations to convert them to outputs. This matters because if inputs can be anything, then they can just be the final outputs themselves, and motor cortex would have no job to do. Thus, prior work tried to assume the simplest inputs possible to motor cortex that could still explain the data. Most likely this went too far in the 'simple' direction, yet aspects of the simplicity were important for endowing responses with realistic properties. One such property is a large condition-invariant response just before movement onset. This is a very robust aspect of the data, and is explained by the assumption of a simple trigger signal that conveys information about when to move but is otherwise invariant to condition. Note that this is an implicit form of regularization, and one very different from that used in the present study: the input is allowed to be large, but constrained to be simple. Preparatory inputs are similarly constrained to be simple in the sense that they carry only information about which condition should be executed, but otherwise have little temporal structure. Arguably this produces slightly too simple preparatory-period responses, but the present study appears to go too far in the opposite direction. I would suggest that the authors do what they can to address these issues via simulations and/or discussion. I think it is fine if the conclusion is that there exist many constraints that tend to favor preparation, and that regularizing magnitude is just one easy way of demonstrating that. Ideally, other constraints would be explored. But even if they can't be, there should be some discussion of what is missing - preparatory plateaus, a realistic condition-invariant signal tied to movement onset - under the present modeling assumptions.

      As described above, we have now included two additional figures. In the first one (S8, already discussed above), we used a temporal smoothness prior, and we indeed get slightly more realistic activity plateaus. In a second supplementary figure (S9), we have also considered using model predictive control (MPC) to optimize the inputs under an uncertain go cue arrival time. There, we found that removing the assumption that the delay period is known came with new challenges: in particular, it requires the specification of a “mental model” of when the Go cue will arrive. While it is reasonable to expect that monkeys will have a prior over the go cue arrival time that will be shaped by the design of the experiment, some assumptions must be made about the utility functions that should be used to weigh this prior. For instance, if we imagine that monkeys carry a model of the possible arrival time of the go cue that is updated online, they could nonetheless act differently based on this information, for instance by either preparing so as to be ready for the earliest go cue possible or alternatively to be ready for the average go cue. This will likely depend on the exact task design and reward/penalty structure. Here, we added simulations with those two cases (making simplifying assumptions to make the problem tractable/solvable using model predictive control), and found that the “earliest preparation” strategy gives rise to more realistic plateauing activity, while the model where planning is done for the “most likely go time” does not. We suspect that more realistic activity patterns could be obtained by e.g. combining this framework with the temporal smoothness cost. However, the main point we wished to make with this new supplementary figure is that it is possible to model the task in a slightly more realistic way (although here it comes at the cost of additional model assumptions). We have now added more discussion related to those points.

      Note that we have kept our analyses on these new models to a minimum, as the main takeaway we wish to convey from them is that most components of the model could be modified/made more realistic. This would impact the qualitative behavior of the system and match to data but – in the examples we have so far considered – does not appear to modify the general strategy of networks relying on preparation.
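
      The difference between the planning targets discussed above can be made concrete with a toy sketch. It is not the authors' MPC implementation and does not optimize any inputs; it only shows how a discrete prior over go cue times, updated online as time elapses without a cue, leads to different target times under different strategies. The function names, the strategy labels, and the example prior are all hypothetical.

```python
def posterior_go_times(prior, elapsed):
    """Condition a discrete prior {go_time: probability} on no cue by `elapsed`."""
    remaining = {t: p for t, p in prior.items() if t > elapsed}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no possible go cue time remains after `elapsed`")
    return {t: p / total for t, p in remaining.items()}

def planning_target(prior, elapsed, strategy):
    """Return the go time the controller plans for under a given strategy."""
    post = posterior_go_times(prior, elapsed)
    if strategy == "earliest":      # be ready for the soonest possible cue
        return min(post)
    if strategy == "mean":          # be ready for the average cue time
        return sum(t * p for t, p in post.items())
    if strategy == "most_likely":   # be ready for the modal cue time
        return max(post, key=post.get)
    raise ValueError(f"unknown strategy: {strategy}")
```

      For instance, with a hypothetical prior `{200: 0.2, 400: 0.5, 600: 0.3}` (times in ms) and no time elapsed, the "earliest" strategy targets 200 ms while the "most_likely" strategy targets 400 ms; once 400 ms have passed without a cue, every strategy targets 600 ms.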

      On line 161, and in a few other places, the authors cite prior work as arguing for "autonomous internal dynamics in M1". I think it is worth being careful here because most of that work specifically stated that the dynamics are likely not internal to M1, and presumably involve inter-area loops and (at some latency) sensory feedback. The real claim of such work is that one can observe most of the key state variables in M1, such that there are periods of time where the dynamics are reasonably approximated as autonomous from a mathematical standpoint. This means that you can estimate the state from M1, and then there is some function that predicts the future state. This formal definition of autonomous shouldn't be conflated with an anatomical definition.

      Yes, that is a good point, thank you for making it so clearly! Indeed, as in previous work, we do not think of our “M1 dynamics” as being internal to M1; they may instead include sensory feedback / inter-area loops, which we summarize in the connectivity, chosen so that the dynamics qualitatively resemble data. We have now incorporated more discussion regarding what exactly the dynamics in our model represent.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment 

      Dasgupta and colleagues make a valuable contribution to the understanding of how the guidance factor Sema7a promotes connections between mechanosensory hair cells and afferent neurons of the zebrafish lateral line system. The authors provide solid evidence that loss of Sema7a function results in fewer contacts between hair cells and afferents through comprehensive quantitative analysis. Additional work is needed to distinguish the effects of different isoforms of Sema7a to determine whether there are specific roles of secreted and membrane-bound forms.

      Public Reviews:

      Reviewer #1 (Public Review):

      Dasgupta et al. have dissected the role of Sema7a in fine tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They attempt to also outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, the effect of loss of Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes. These issues weaken the claims made by the authors including the statement that they have identified dual roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization respectively.

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of a peripheral sensory circuit in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it allows one to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below). In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

      Reviewer #3 (Public Review):

      The data reported here demonstrate that Sema7a defines the local behavior of growing axons in the developing zebrafish lateral line. The analysis is sophisticated and convincingly demonstrates effects on axon growth and synapse architecture. Collectively, the findings point to the idea that the diffusible form of sema7a may influence how axons grow within the neuromast and that the GPI-linked form of sema7a may subsequently impact how synapses form, though additional work is needed to strongly link each form to its proposed effect on circuit assembly.

      The revised manuscript is significantly improved. The authors comprehensively and appropriately addressed most of the reviewers' concerns. In particular, they added evidence that hair cells express both Sema7A isoforms, showed that membrane bound Sema7A does not have long range effects on guidance, demonstrated how axons behave close to ectopic Sema7A, and analyzed other features of the hair cells that revealed no strong phenotypes. The authors also softened the language in many, but not all places. Overall, I am satisfied with the study as a whole. 

      Reviewer #4 (Public Review):

      This study provides direct evidence showing that Sema7a plays a role in axon growth during the formation of peripheral sensory circuits in the lateral-line system of zebrafish. This is a valuable finding because the molecules for axon growth in hair-cell sensory systems are not well understood. The majority of the experimental evidence is convincing, and the analysis is rigorous. The evidence supporting Sema7a's juxtacrine vs. secreted role and involvement in synapse formation in hair cells is less conclusive. The study will be of interest to cell, molecular and developmental biologists, and sensory neuroscientists.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In their revised manuscript, Dasgupta et al. have provided further experiments to address the role of Sema7a (sec and GPI-anchored) in regulating axon guidance in the lateral line system. Specifically, the inclusion of the heat shock controls and FM labeling to show hair cell mechanotransduction were crucial to interpretation of the results. However, there are still concerns about the specificity of the results. My primary concern is if the change in axon patterning is specifically due to loss of Sema7a in the mutant hair cells. These animals are morphologically very abnormal and, in the rebuttal, the authors state that hair cell number is reduced. This is not quantified in the manuscript and should be included. 

      Thank you for this suggestion. We have included the data in the manuscript in lines 137-139, in Figure 2—figure supplement 1B, and in the source data for Figure 2 and Figure 2-figure supplements.

      If there is not a function for Sema7a in hair cells themselves, why is the number reduced? 

      The sema7a-/- homozygous mutants are not viable and die by 6 dpf. The loss of Sema7A protein produces other developmental defects, including brain edema and a curved body axis. We believe a slight but not significant decrease in hair cell number may arise from a minute developmental delay in the morphogenesis of the neuromast. We have accordingly quantified our data at three distinct developmental stages (2 dpf, 3 dpf, and 4 dpf) and have incorporated them in the revised manuscript.

      Additionally, FM data should be quantified and presented in animals without a transgene in the same excitation/emission spectra for clearer interpretation of the staining.

      We have quantified the intensities of labeling with FM 4-64 styryl dye from the control and the sema7a-/- mutant larvae and incorporated the data in lines 139-146, in Figure 2—figure supplement 1D, and in source data for Figure 2 and Figure 2-figure supplements. We kept the transgenes to concurrently show the arborization phenotype, hair cell morphology, and the FM 4-64 incorporation between the genotypes.

      Rescue analysis using the myo6d promoter would allow the authors to ensure that the axon deficits can be rescued by putting Sema7a back into the sensory hair cells. Transient transgenesis could be useful for this approach and would not require the creation of a stable line. This could be done with both forms of Sema7a allowing the true assessment of whether or not the secreted and GPI-anchored form have disparate functions as claimed in lines 418-424.

      Although we recognize the importance of the rescue of the sema7a-/- mutant phenotype with the sema7asec and the sema7aGPI transcripts, it is not possible for us to perform that experiment at the moment, for the first author will leave the lab next week.  However, he plans to continue work on this project as an independent investigator to dissect the individual roles of the transcript variants in specifying the pattern of sensory arborization, a project that includes generation of transcript-specific knockout animals and rescue experiments with stable transgenic fish lines. 

      Other concerns:

      (1) The timeline of the heat shock experiment is confusing to me and, therefore, it makes me question the specificity of those results. Based on the speed of axon outgrowth and the time necessary for transcription and translation after heat shock induction of the transgene, it is unclear to me how the axon growth defects could occur in the timeline provided. Imaging two hours after the start of the heat shock is very rapid and speaks to either an indirect effect of the transgenesis on the axon growth or a leaky promoter/induction paradigm. It is possible I am just misunderstanding the setup but, from what I could gather, the imaging is being done 2 hrs after the start of the heat shock. This should be clarified.

      The axons of the zebrafish posterior lateral line migrate relatively fast. The pioneering axons migrate at around 120 μm/hour and the follower axons migrate at 30-80 μm/hour (Sato et al., 2010). The heat-shock promoter that we have utilized, hsp70l, is highly effective in inducing gene expression and subsequent protein formation within 30 to 60 mins. We believe an hour of heat shock and an hour of incubation post heat shock is sufficient to induce directed axon migration to a distance that spans from 27 μm to 140 μm.
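
      As a rough illustration of the arithmetic behind this timeline, the sketch below computes the range of distances an axon could cover in the time left after induction. It assumes, purely for illustration, that axons extend at a constant speed and begin responding only once the induced protein is present; neither assumption comes from the manuscript, and the function name is hypothetical.

```python
# Back-of-the-envelope check of the timeline described above.
# Assumptions (not from the manuscript): axons extend at a constant speed and
# start responding only once the induced protein is present.
TOTAL_TIME_H = 2.0               # 1 h heat shock + 1 h incubation
INDUCTION_DELAY_H = (0.5, 1.0)   # 30 to 60 min until protein is made
SPEED_UM_PER_H = (30.0, 120.0)   # slowest follower to pioneer axons (Sato et al., 2010)


def reachable_distance_range(total_h, delay_h, speed):
    """Return (min, max) distance in micrometres covered in the remaining time."""
    shortest = (total_h - max(delay_h)) * min(speed)
    longest = (total_h - min(delay_h)) * max(speed)
    return shortest, longest


low, high = reachable_distance_range(TOTAL_TIME_H, INDUCTION_DELAY_H, SPEED_UM_PER_H)
print(f"{low:.0f} to {high:.0f} um")
```

      The resulting range can be compared directly with the 27 μm to 140 μm span quoted in the response; the comparison is only as good as the constant-speed assumption.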

      We strongly believe that the directed arborization of the sensory axons towards the Sema7Asec source is not due to an indirect effect of transgenesis or leaky promoter induction, as in all 18 of the injected but not heat-shocked control larvae we did not observe ectopic Sema7Asec expression, and no aberrant projection was formed from the sensory arbor network. We highlight this observation in lines 297-299 and in Figure 4E.

      Sato et al., 2010: Single-cell analysis of somatotopic map formation in the zebrafish lateral line system. Developmental Dynamics 239:2058–2065, 2010.

      Similarly, it would help to clarify if t(0) in the figure is the onset of the heat shock or onset of imaging two hours after the heat shock is started. 

      The t=0 hour in the Figure 4I denotes the onset of imaging two hours after the heat shock began. We have clarified this in the manuscript in lines 1155-1156.

      (2) In the rebuttal, the line numbers cited do not match up with the appropriate text, I believe.

      We have corrected this and updated the manuscript.

      (3) Some of the supplemental figures are not mentioned in the text, or I could not find them. For example: Figure 1 supplement 2J. 

      Thank you for pointing this out. We have corrected the manuscript, and the new information is added in line 114.

      (4) Table 1 statistics: were these adjusted for multiple comparisons using a Bonferroni correction or something similar? This is necessary for statistical significance to be meaningful.

      We did not adjust the p-values for multiple comparisons because the values correspond to only three or four statistical tests per experiment, strongly indicating the unlikelihood of erroneous significance due solely to multiple tests.
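
      For readers who want to see how much a correction would change a family of three or four tests, the sketch below applies the standard Bonferroni and Holm adjustments. The p-values in `raw` are hypothetical placeholders, not values from Table 1, and the code is not part of the authors' analysis.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values for the four tests of one experiment (illustrative only).
raw = [0.004, 0.020, 0.030, 0.200]
print(bonferroni(raw))
print(holm(raw))
```

      With only a few tests per experiment, the adjusted values are at most three or four times the raw ones, so whether a result stays below a threshold depends on how close the raw p-value already was to it.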

      (5) Figure 1I and 1-S3: The legend states a positive correlation between axonal signal and Sema7A signal. Correlations are 0.5, 0.6, and 0.4 (at 2, 3, and 4 dpf). This is not a convincing positive correlation. At best, it ranges from no correlation to a very weak positive one.

      In lines 122-126 we mention that the basal association of the sensory arbors shows a positive correlation with Sema7A accumulation. We never emphasize the strength of the correlation. However, a consistent positive correlation at three different developmental stages suggests that progressive Sema7A accumulation at the base of the hair cells may guide the sensory arbors to increasingly associate themselves with the hair cells.

      Reviewer #2 (Recommendations For The Authors):

      I am a bit disappointed that the authors elected not to experimentally address the issue raised by all reviewers: whether the secreted or membrane bound isoform is active in hair cells. They rather decided to change their interpretation in the text. It is fine, given the eLife review structure. However, that would make the manuscript much stronger. Other issues were adequately addressed through textual changes as well. 

      Although we recognize the importance of the rescue of the sema7a-/- mutant phenotype with the sema7asec and the sema7aGPI transcripts, it is not possible for us to perform that experiment at the moment, for the first author will leave the lab next week.  However, he plans to continue work on this project as an independent investigator to dissect the individual roles of the transcript variants in specifying the pattern of sensory arborization, a project that includes generation of transcript-specific knockout animals and rescue experiments with stable transgenic fish lines. 

      Reviewer #3 (Recommendations For The Authors):

      Overall, I am satisfied with the study as a whole and just have a few minor comments that remain to be addressed. 

      (1) Although the authors say that they added appropriate no plasmid/heatshock-only and plasmid-only/no heatshock controls, these results need to be presented more clearly, as they are separated in the paper and only one was quantified (i.e. 100% of embryos showed no defect). Please just make it clear that no defects were observed in either control for either experiment (both secreted and membrane bound ectopic expression). 

      We have clearly stated this information in lines 297-299 and 343-345.

      (2) Please add a compass to Fig. 1A to indicate the orientation of the neuromast. It would also be helpful to add labels for developmental ages to all of the figures, rather than making the reader look it up in the legend. 

      We have updated Figure 1A and the corresponding figure legend in lines 882-883. We have denoted the larval age in the figure legends to keep the individual images uncluttered.

      (3) For the RT-PCR experiments in Figure 1, no negative control was included to show that supporting cell or neuronal genes are not detected in the purified hair cells and v.v. that neither isoform is detected in supporting cells or neurons. I ask only because there is a lot of immune-signal outside of the hair cells and I am curious whether that is secreted or might come from other cell types. For neurons and supporting cells, simply demonstrating absence of Sema7a overall would suffice. 

      We have utilized the transgenic line Tg(myo6b:actb1-EGFP) that expresses the fluorophore GFP specifically in the hair cells of the neuromast. Unfortunately, we do not possess a transgenic line that reliably and specifically labels the support cells in the neuromast. Hence, in our sorting experiment the GFP-negative cells that are collected from the trunk segments of the larvae contain all the non-hair cells including epidermal cells, neuronal cells, and immune cells etc. Such a mixture of varied cellular identity may not serve as a reliable negative control. 

      In Figure 7, we have plotted the normalized expression values of the sema7a gene in the neuromast. The plot clearly depicts that the source of Sema7A is the young and the mature hair cells, not the support cells. We further confirm this observation by immunostaining where the Sema7A signal is highly restricted to the hair cells and not in any other cell in the neuromast (Figure 1E). Immunostaining further demonstrates that the lateral line sensory arbors also do not produce the Sema7A protein (Figure 1H; Video 1).

      We agree with the reviewer that there are diverse immune cells, including macrophages, in and around the neuromast. These macrophages are dynamic and possess a highly ramified structure (Denans et al., 2022). In all our Sema7A immunostainings, we never observed structures that resemble macrophages. Although we cannot confirm that Sema7A is not expressed in a distant immune cell, we highly doubt that signal coming from immune cells affects hair cell innervation by the sensory arbors during homeostatic development.

      Denans et al., 2022: Nature Communications volume 13, Article number: 5356 (2022).

      (4) In Figure 1, Supplement 4, I do not see the immunogen labeled in blue. 

      We have corrected the figure legend. The immunogenic region of the Sema7A protein is now clearly denoted in the figure legend of Figure 1—figure supplement 4.

      (5) In Figure 2, please add a control image as requested, as that enables direct comparison. There is ample room in the figure. 

      We have updated Figure 2 and made the suggested change.

      (6) In Figure 2, Supplement 1, the FM4-64 data are not presented in a quantified fashion. Please report at least how many embryos showed reliable uptake and preferably how many hair cells per embryo showed reliable uptake. 

      We have quantified the FM 4-64 intensities in control and sema7a-/- mutant larvae. The new data are added to the manuscript in lines 142-146 and 577-579, and in Figure 2—figure supplement 1D.

      (7) In Figure 3, there seems to be a typo in the figure legend: "mutants in the same larvae" does not make sense to me. 

      We have corrected the error. The modified statement appears in lines 1067-1068.

      (8) The text should refer more explicitly to the statistical tests reported in Table 1, i.e. as the results are presented. 

      In lines 1105 and 1109, we clearly state the statistical tests that were performed.

      (9) In Figure 6, Supplement 1, please show the raw data points, not just the bar graphs.

      We have updated Figure 6—figure supplement 1.

      (10) Minor point: the authors state that they addressed the distance over which secreted Sema7A may act, but this was not evident to me in the text. Please make this finding clearer.

      We have clarified this information in lines 310-311.

      (11) Finally, the discussion contains a statement that is not supported by the data: "We have discovered dual modes of Sema7A function in vivo." They have discovered evidence that there are two isoforms, that loss of both disrupts connectivity, and that overexpression of only the secreted form can elicit growth from a distance. However, there is no direct evidence that the membrane-bound form is responsible for local effects. It is formally possible still that the phenotypes are a result of dual roles for the secreted form. It is clear that another manuscript is forthcoming that will expand on the role of the transmembrane form, but for this manuscript, the authors should make firm conclusions only about the data presented herein.

      Thank you for this suggestion. We have modified the manuscript in lines 425-434.

      Reviewer #4 (Recommendations For The Authors):

      The authors have made significant changes to the manuscript based on the comments of the reviewers. It is now suitable for publication.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ are inconclusive. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors and reviewers for their valuable feedback and constructive comments. We have carefully considered each point raised by the reviewers and made the necessary revisions to the manuscript. Regarding the relationships between global and local BM processing, the accumulated evidence from previous studies has converged on the dissociation of the two BM components, e.g., while global BM processing is susceptible to learning and practice, local BM processing does not show a learning trend (Chang and Troje, 2009; Grossman et al., 2004), and the brain activations in response to local and global BM cues are different (Chang et al., 2018; Duarte et al., 2022). Nevertheless, we concur with the reviewers that the evidence for such dissociation from the current study by itself is not strong enough. Therefore, we have toned down this point and no longer claim the dissociation (including in the title). Based on the current results, we focused our discussion on the different aspects of BM processing in children with and without ADHD.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating the impairments of biological motion perception in individuals with ADHD in comparison with neurotypical controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated the impairments of local and global (holistic) biological motion perception, the diagnosis status, and several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention / impulsivity). Both local and global biological motion perception are impaired in ADHD individuals. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not in controls. A path analysis in the ADHD group suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and attentional and perceptual reasoning skills.

      Strengths:

      It is true that there is not much work on biological motion perception and ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature, and potentially also adds new behavioral markers for this clinical group. The design of the study is straightforward and technically sound, and the drawn conclusions are supported by the presented results.

      Thanks for this positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors and ADHD and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper. Specifically, the hypothesis that the perception of human social interaction is critically based on a local mechanism for the detection of asymmetry in foot trajectories of walkers (this is what 'BL-local' really measures), or on the detection of live agents in cluttered scenes seems not very plausible.

      Thanks for these comments. We agree that the relationship between genetic factors and BM perception remains to be further examined, as we did not test the genetic influences in this study. We have deleted the relevant discussion about genetics. Based on our results, we discuss the possible mechanisms behind the relationship between local BM processing and social interaction in the revised manuscript as follows:

      “As mentioned above, we found a significant negative correlation between the SRS total score and the accuracy of local BM processing, specifically in the ADHD group. This could be due to decreased visual input related to atypical local BM processing, which further impairs global BM processing. According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs. Further empirical studies are required to confirm these hypotheses.” (lines 417 - 428)

      Based on my last comments, the discussion has now been changed in a way that tries to justify the speculative claims by citing a lot of other speculative papers, which does not really address the problem. For example, the fact that chicks walk towards biological motion stimuli is interesting. To derive that this verifies a fundamental mechanism in human biological motion processing is extremely questionable, given that birds do not even have a cortex. Taking the argumentation of the authors seriously, one would have to assume that the 'Local BM' mechanism is probably located in the mesencephalon in humans, and then would have to interact in some way with social perception differences of ADHD children. To me all this seems to make very strong (over-)claims. I suggest providing a much more modest interpretation of the interesting experimental result, based on what has been really experimentally shown by the authors and closely related other data, rather than providing lots of far-reaching speculations.

      In the same direction, in my view, go claims like 'local BM is an intrinsic trait' (L. 448), which is not only imprecise (maybe better 'mechanisms of processing of local BM cues') but also rather questionable. Likely, this 'local processing of BM' is a lower-level mechanism, probably located in early and mid-levels of the visual cortex, with a possible influence of lower structures. It seems not really plausible that this is related to classical trait variables in the sense of psychology, like personality, as seems to be suggested here. Also here I suggest a much more moderate and less speculative interpretation of the results.

      We thank the reviewer for pointing out these issues. According to these comments, we have carefully revised the discussion to avoid strong (over-) claims. We have deleted the example of chicks and substituted more empirical studies to explain our results. We agree that the Local BM mechanism is probably located in subcortical regions in humans, as reported by some MRI studies (Chang et al., 2018; Hirai and Senju, 2020; Loula et al., 2005). We have added some evidence that atypical local BM processing may decrease visual inputs related to social information as follows:

      “According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs.” (lines 421 - 427)

      We have also deleted the claim that 'local BM is an intrinsic trait' (originally L. 448) and the related discussion, as it was not conclusive based on the current study.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate the reviewer’s positive feedback very much.

      Weaknesses:

      The manuscript has greatly improved in clarity and methodological considerations in response to the review. There are only a few minor points which deserve the authors' attention:

      When outlining the motivation for the current study, results from studies in ADHD and ASD are used too interchangeably. The authors use a lack of evidence for contributing (psychological/developmental) factors on BM processing in ASD to motivate the present study and refer to evidence for differences between typical and non-typical BM processing using studies in both ASD and ADHD. While there are certainly overlapping features between the two conditions/neurotypes, they are not to be considered identical and may have distinct etiologies, therefore the distinction between the two should be made clearer.

      We thank the reviewer for pointing out this issue. We have removed some unnecessary citations about ASD and referred to studies about social cognition in ADHD to elaborate the motivation of this study:

      “Further exploration of a diverse range of social cognitions (e.g., biological motion perception) can provide a fresh perspective on the impaired social function observed in ADHD. Moreover, recent studies have indicated that the social cognition in ADHD may vary depending on different factors at the cognitive, pathological, or developmental levels, such as general cognitive impairment5, symptoms severity8, or age5. Nevertheless, understanding how these factors relate to social cognitive dysfunction in ADHD is still in its infancy. Bridging this gap is crucial as it can help depict the developmental trajectory of social cognition and identify effective interventions for impaired social interaction in individuals with ADHD.” (lines 53 - 62)

      In the first/main analysis, it is unclear to me why in the revised manuscript the authors changed the statistical method from ANOVA/ANCOVA to independent samples t-tests (unless the latter were only used for post-hoc comparisons, then this needs to be stated). Furthermore, although p-values look robust, for this analysis too it should be indicated whether and how multiple comparison problems were accounted for.

      Thanks for the reviewer’s comments. According to the suggestions from reviewer #3, it may be inappropriate to treat gender as a covariate in ANOVA, as doing so may violate the assumptions of ANCOVA. To ensure that gender does not influence the results, firstly, we separated boys and girls on the plots with different coloured individual data points, and there are no signs of a gender effect in the TD group. Secondly, we used t-tests to examine the difference between the TD and ADHD groups. Finally, we conducted a subsampling analysis with balanced data, and the results remained consistent.

      In part 1 of the results, we aimed to compare the task accuracies between the TD and ADHD groups in three independent tasks, which assess the participants’ abilities to process three types of BM cues. We assumed that individuals with ADHD would show poorer performance than TD individuals in all three tasks. Given this, we consider that correction for multiple comparisons may not be necessary.

      Reviewer #3 (Public Review):

      Strengths:

      The authors present differences between ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate the reviewer’s positive assessment of this work.

      Weaknesses:

      The data are not strong enough to support claims about differences between global and local processing wrt social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but the crucial tests of differences between correlations do not present a clear picture. Further empirical work would be needed to test the authors' claims. Specifics:

      The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. The supplementary materials demonstrate that tests of differences between correlations present an incomplete picture. Currently they have small samples for correlations, so this is unsurprising.

      Thanks for this comment. We agree with the reviewer that the relationship between local and global processing and social communication and age needs more empirical work. Our results point only to possibly dissociable roles of local and global BM processing. The accumulated evidence from previous studies has converged on this dissociation, e.g., while global BM processing is susceptible to learning and practice, local BM processing does not show a learning trend (Chang and Troje, 2009; Grossman et al., 2004), and the brain activations in response to local and global BM cues are different (Chang et al., 2018; Duarte et al., 2022). We concur with the reviewers that the evidence for such a dissociation from the current study by itself is not strong enough. Therefore, we have toned down this point and no longer emphasize the dissociation. Based on the current results, we focused our discussion on the different aspects of BM processing in children with and without ADHD. Future studies with larger sample sizes are needed to confirm this dissociable relationship.

      Theoretical assumptions. The authors make some statements about local vs global biological motion processing that should still be made more tentatively. They assume that local processing is genetically specified, whereas global processing is a product of experience. These data in newborn chicks are controversial and confounded - I cannot remember the specifics but I think there is an upper vs lower visual field complexity difference here.

      We appreciate the reviewer’s suggestion. We agree that the relationship between genetic factors and BM perception remains to be further examined, as we did not perform any genetic analysis in the current study. Some speculative papers have been removed, as has the statement about newborn chicks, given the controversial and confounded results. We have toned down our claims and provided a moderate interpretation of the results:

      “Sensitivity to local BM cues emerges early in life54,55 and involves rapid processing in the subcortical regions16,56-58. As a basic pre-attentive feature23, local BM cues can guide visual attention spontaneously59,60. In contrast, the ability to process global BM cues is related to slow cortical BM processing and is influenced by many factors such as attention25,26 and visual experience21,51. As mentioned above, we found a significant negative correlation between the SRS total score and the accuracy of local BM processing, specifically in the ADHD group. This could be due to decreased visual input related to atypical local BM processing, which further impairs global BM processing. According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs.” (lines 413 - 427)

      “Few developmental studies have been conducted on local BM processing. The ability to process local BM cues remained stable and did not exhibit a learning trend21,25. A reasonable interpretation may be that local BM processing is a low-level mechanism, probably performed by the primary visual cortex and subcortical regions such as the superior colliculus, pulvinar, and ventral lateral nucleus14,56,61.” (lines 441- 446)

      Readability. The manuscript needs very careful proofreading and correction for grammar. There are grammatical errors throughout.

      We thank the reviewer for this feedback. We have performed thorough proofreading and corrected grammatical errors throughout the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I thank the authors for their revisions that address several of the minor points that I raised in my last review. A number of requests are still not sufficiently answered:

      L. 290 ff.: The model notation 'BM-local = age + gender etc.' is pretty sloppy. I think what is meant is that a GLM was used with the predictors gender etc., each multiplied by an appropriate beta_i value. These formulas should be corrected, or one should just say that a GLM was run with the predictors gender etc.

      The same criticism applies to these other models that follow.

      This was corrected.

      However, the corrected text remains sloppy. Example: 'BM-Local = ...'. What exactly is 'BM-Local'? The accuracy? Etc. Here a precise notation should be given that clearly names which variables are used as predictors and target variables.

      We appreciate the reviewer’s suggestion. We clarified which variables are used in our model and gave them precise notations:

      “Three linear models were built to investigate the contributing factors: (a) ACClocal = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention, (b) ACCglobal = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention, and (c) ACCgeneral = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention + β5 * ACClocal + β6 * ACCglobal. ACClocal, ACCglobal and ACCgeneral refer to the response accuracies of the three tasks in the ADHD group, and QbInattention is the standardised score for sustained attention function.” (lines 337 - 343)
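
      As an illustration only, model (a) can be written as an ordinary least-squares fit. The sketch below is not the authors' analysis code: the function name `fit_ols`, the data rows, and the coefficient values are hypothetical, and gender is assumed to be coded 0/1.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] for y ~ 1 + predictors."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical, noise-free data: (age, gender coded 0/1, FIQ, QbInattention)
rows = [(7, 0, 95, 1.2), (8, 1, 102, 0.4), (9, 0, 110, -0.3), (10, 1, 88, 2.0),
        (11, 0, 120, 0.9), (12, 1, 99, -1.1), (9, 1, 105, 1.5)]
true_beta = [0.2, 0.03, 0.0, 0.002, -0.05]  # made-up values, for illustration
acc_local = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], r))
             for r in rows]

beta = fit_ols(rows, acc_local)  # estimates of beta0..beta4 for model (a)
```

      Models (b) and (c) would follow the same pattern, with ACCglobal or ACCgeneral as the target and, for (c), the two extra accuracy predictors appended to each row.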

      All these models assume linearity of the combination of the predictors. was this assumption verified?

      We referred to previous studies of BM perception in children. They found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), have a linear relation with the ability of BM processing.

      This answer is insufficient and not convincing. Because a variable Y depends linearly on predictor A and B in some other study, this does not imply that it is also linear in predictor C, or does not show interactions with such predictors in the present study.

      What is needed here is the testing of models with interaction terms and verifying that such models are not better predictors. If authors do not want to do this, they need at least to clearly point out that they made the strong assumption of linearity of their model, which might be wrong and thus be a substantial limitation of their analysis.

      Thanks for the suggestion. We compared each possible model with and without the relevant interaction terms. The results showed that the change in the coefficient of determination (R-squared, R2) between the two models was not statistically significant.
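
      One common way to test whether interaction terms improve a model is the F-test for the change in R2 between nested models. The sketch below shows that calculation; the response does not state whether this exact test was used, the function name is hypothetical, and the example numbers are invented.

```python
def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R-squared change between two nested linear models.

    n is the number of observations; k_reduced and k_full are the numbers of
    predictors (excluding the intercept) in the reduced and full models.
    """
    df1 = k_full - k_reduced      # number of added terms (e.g. interactions)
    df2 = n - k_full - 1          # residual degrees of freedom of the full model
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2

# Hypothetical example: adding one interaction term to a four-predictor model
f_stat, df1, df2 = f_change(r2_reduced=0.30, r2_full=0.34, n=40,
                            k_reduced=4, k_full=5)
```

      The statistic is then compared against an F distribution with (df1, df2) degrees of freedom; a non-significant result indicates that the interaction terms do not explain reliably more variance than the additive model.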

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the ADHD group. Does the same observation also apply to the controls?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Was such a path analysis also done for the TD subjects or not? If yes, was then also predicted that the variable BM-Global largely and directly influences the variable BM-General? (The answer refers to the general discussion section, where no such analysis is presented, as far as I understand.)

      Thank you for your comment. We also conducted a path analysis in the TD group, similar to that in the ADHD group. There was no statistically significant mediation effect in the TD group. Please see Figure S3 for complete statistics.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data analyzed during the study is available at https://osf.io/37p5s/.

      (2) Lines 119-115: The differences observed in ADHD participants in the studies referenced here were relative to what group? The last sentence here also refers to two groups, and it is difficult to gather which specific groups are meant, also because the two references relate to both ADHD and ASD samples. Please clarify.

      The suggestion is well taken. We have clarified the expressions accordingly:

      “Specifically, compared with the typically developing (TD) group, children with ADHD showed reduced activity of motion-sensitive components (N200) while watching biological and scrambled motions, although no behavioural differences were observed. Another study found that children with ADHD performed worse in BM detection with moderate noise ratios than the TD group32.” (lines 100 - 105)

      (3) Line 116: I'm not sure what is meant by 'despite initial indications' - please briefly specify/summarise here why the investigation into BM processing in ADHD is warranted.

      We thank the reviewer for pointing out this issue. We have rephrased this part and briefly specified “why the investigation into BM processing in ADHD is warranted”:

      “Despite initial findings about atypical BM perception in ADHD, previous studies on ADHD treated BM perception as a single entity, which may have led to misleading or inconsistent findings28. Hence, it is essential to deconstruct BM processing into multiple components and motion features.” (lines 108 -111)

      (4) Lines 290-293: Please complete the sentence.

      We thank the reviewer for pointing out this issue. The sentence has been completed:

      “For Tasks 2 and 3, where children were asked to detect the presence or discriminate the facing direction of the target walker, the TD group had higher accuracies than the ADHD group (Task 2 - TD: 0.70 ± 0.12, ADHD: 0.59 ± 0.12, t73 = 3.677, p < 0.001, Cohen's d = 0.861; Task 3 - TD: 0.79 ± 0.12, ADHD: 0.63 ± 0.17, t73 = 4.702, p < 0.001, Cohen's d = 1.100).” (lines 284 - 288)

      Reviewer #3 (Recommendations For The Authors):

      (1) Conclusions concerning differences between the local and global tasks wrt SRS and age (see above). I believe the authors need to reword throughout to reflect that the tests of differences between these crucial correlations did not present a clear picture.

      We have reworded throughout the paper to reflect the inconclusiveness with regard to the relationship between local and global processing and social communication based on this study alone. Future studies with larger sample sizes are needed to confirm this conclusion. The mechanism for this dissociable relationship should be validated by more psychological tests in future studies.

      (2) I would again tone down the discussion of genetic specification of local processing, given it is highly controversial.

      We thank the reviewer for pointing out the issue. We agree that the genetic specification of local processing remains controversial. The interpretation of the results about local BM processing has been rephrased. Please refer to our response to point #2 above.

      (3) The manuscript needs very careful proofreading and grammatical correction throughout.

      Thanks for the suggestion to check the grammar. We have carefully proofread the manuscript to correct grammatical errors.

    1. Author response:

      Response to Reviewer #1 (Public Review):

      We thank the reviewer for their constructive criticism of our study, their proposed solutions, and for highlighting areas of the methodology and analytical pipeline where explanations were unclear or unsatisfactory. We will take the reviewer’s feedback into account to improve the clarity and readability of the revised manuscript. We acknowledge the importance of ruling out eye movements as a potential confound. We address these concerns briefly below, but a more detailed explanation (and a full breakdown of the relevant analyses, including the corrected and uncorrected results) will be provided in the revised manuscript.

      First, the source of EEG activity recorded from the frontal electrodes is often unclear. Without an external reference, it is challenging to resolve the degree to which frontal EEG activity represents neural or muscular responses1. Thus, as a preventative measure against the potential contribution of eye movement activity, for all our EEG analyses, we only included activity from occipital, temporal, and parietal electrodes (the selected electrodes can be seen in the final inset of Figure 3).

      Second, as suggested by the reviewer, we re-ran our analyses using the activity measured from the frontal electrodes alone. If the source of the nonlinear decoding accuracy in the AV condition was muscular activity produced by eye movements, we would expect to observe better decoding accuracy from sensors closer to the source. Instead, we found that decoding accuracy from the frontal electrodes (peak d' = 0.08) was less than half that of decoding accuracy from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 4).

      Third, we compared the average eye movements between the three main sensory conditions (auditory, visual, and audiovisual). In the visual condition, there was little difference in eye movements corresponding to the five stimulus locations, likely because the visual stimuli were designed to be spatially diffuse. For the auditory and audiovisual conditions, there was more distinction between eye movements corresponding to the stimulus locations. However, these appeared to be the same between auditory and audiovisual conditions. If consistent saccades to audiovisual stimuli had been responsible for the nonlinear decoding we observed, we would expect to find a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Instead, we found no difference in correlation between audiovisual and auditory stimuli, indicating that eye movements were equivalent in these conditions and unlikely to explain better decoding accuracy for audiovisual stimuli.

      Finally, we note that the stricter eye movement criterion acknowledged in the Discussion section of the original manuscript resulted in significantly better audiovisual d' than the MLE prediction, but this difference did not survive cluster correction. This is an important distinction to make as, when combined with the results described above, it seems to support our original interpretation that the stricter criterion combined with our conservative measure of (mass-based) cluster correction2 led to a type II error.

      References

      (1) Roy, R. N., Charbonnier, S., & Bonnet, S. (2014). Eye blink characterization from frontal EEG electrodes using source separation and pattern recognition algorithms. Biomedical Signal Processing and Control, 14, 256–264.

      (2) Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85–93.

      Response to Reviewer #2 (Public Review):

      We thank the reviewer for their insight and constructive feedback. As emphasized in the review, an interesting question that arises from our results is that, if the neural data exceeds the optimal statistical decision (MLE d'), why doesn’t the behavioural data? We agree with the reviewer’s suggestion that more attention should be devoted to this question, and plan to provide a deeper discussion of the relationship between behavioural and neural super-additivity in the revised manuscript. We also note that while this discrepancy remains unexplained, our results are consistent with the literature. That is, both non-linear neural responses (single-cell recordings) and behavioural responses that match MLE are reliable phenomena in multisensory integration1,2,3,4.

      One possible explanation for this puzzling discrepancy is that behavioural responses occur sometime after the initial neural response to sensory input. There are several subsequent neural processes between perception and a behavioural response5, all of which introduce additional noise that may obscure super-additive perceptual sensitivity. In particular, the mismatch between neural and behavioural accuracy may be the result of additional neural processes that translate sensory activity into a motor response to perform the behavioural task.

      Our measure of neural super-additivity (exceeding optimally weighted linear summation) differs from how it is traditionally assessed (exceeding summation of single neuron responses)2. However, neither method has yet fully explained how this neural activity translates to behavioural responses, and we think that more work is needed to resolve the abovementioned discrepancy. Nevertheless, our method will facilitate this work by providing a reliable way of measuring neural super-additivity in humans, using non-invasive recordings.
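
      For context, a standard way to obtain an MLE prediction for two independent cues is to combine the unimodal sensitivities in quadrature. The sketch below assumes that form; the exact computation used in the manuscript is not given in this response, the function names are hypothetical, and any values passed in are for illustration only.

```python
import math

def mle_dprime(d_auditory, d_visual):
    """Predicted audiovisual d' for optimal combination of independent cues.

    Assumes the quadrature rule d'_AV = sqrt(d'_A**2 + d'_V**2).
    """
    return math.hypot(d_auditory, d_visual)

def is_super_additive(d_audiovisual, d_auditory, d_visual):
    """True when the observed audiovisual d' exceeds the MLE prediction."""
    return d_audiovisual > mle_dprime(d_auditory, d_visual)
```

      Under this assumption, an observed audiovisual d' above the quadrature sum of the unimodal values would count as super-additive, whereas a value at or below it would be consistent with optimal linear combination.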

      References

      (1) Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3), 257–262.

      (2) Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433.

      (3) Meredith, M. A., & Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science, 221, 389–391.

      (4) Stanford, T. R., & Stein, B. E. (2007). Superadditivity in multisensory integration: putting the computation in context. NeuroReport, 18, 787–792.

      (5) Heekeren, H., Marrett, S., & Ungerleider, L. (2008). The neural systems that mediate human perceptual decision making. Nature Reviews Neuroscience, 9, 467–479.

    1. Reviewer #1 (Public Review):

      In this study, Gonzalez Alam et al. report a series of functional MRI results about the neural processing from the visual cortex to high-order regions in the default-mode network (DMN), compiling evidence from task-based functional MRI, resting-state connectivity, and diffusion-weighted imaging. Their participants were first trained to learn the association between objects and rooms/buildings in a virtual reality experiment; after the training was completed, in the task-based MRI experiment, participants viewed the objects from the earlier training session and judged if the objects were in the semantic category (semantic task) or if they were previously shown in the same spatial context (spatial context task). Based on the task data, the authors utilised resting-state data from their previous studies, visual localiser data also from previous studies, as well as structural connectivity data from the Human Connectome Project, to perform various seed-based connectivity analyses. They found that the semantic task causes more activation of various regions involved in object perception while the spatial context task causes more activation in various regions for place perception, respectively. They further showed that those object perception regions are more connected with the frontotemporal subnetwork of the DMN while those place perception regions are more connected with the medial-temporal subnetwork of the DMN. Based on these results, the authors argue that there are two main pathways connecting the visual system to high-level regions in the DMN, one linking object perception regions (e.g., LOC) leading to semantic regions (e.g., IFG, pMTG), the other linking place perception regions (e.g., parahippocampal gyri) to the entorhinal cortex and hippocampus.

      Below I provide my takes on (1) the significance of the findings and the strength of evidence, (2) my guidance for readers regarding how to interpret the data, as well as several caveats that apply to their results, and finally (3) my suggestions for the authors.

      (1) Significance of the results and strength of the evidence

      I would like to praise the authors for, first of all, trying to associate visual processing with high-order regions in the DMN. While many vision scientists focus specifically on the macroscale organisation of the visual cortex, relatively few efforts are made to unravel how neural processing in the visual system goes on to engage representations in regions higher up in the hierarchy (a nice precedent study that looks at this issue is by Konkle and Caramazza, 2017). We all know that visual processing goes beyond the visual cortex, potentially further into the DMN, but there's no direct evidence. So, in this regard, the authors made a nice try to look at this issue.

      Having said this, the authors' characterisation of the organisation of the visual cortex (object perception/semantics vs. place perception/spatial contexts) does not go beyond what has been known for many decades by vision neuroscience. Specifically, over the past two decades, numerous proposals have been put forward to explain the macroscale organisation of the visual system, particularly the ventrolateral occipitotemporal cortex. A lateral-medial division has been reliably found in numerous studies. For example, some researchers found that the visual cortex is organised along the separation of foveal vision (lateral) vs. peripheral vision (medial), while others found that it is structured according to faces (lateral) vs. places (medial). Such a bipartite division is also found in animate (lateral) vs. inanimate (medial), small objects (lateral) vs. big objects (medial), as well as various cytoarchitectonic and connectomic differences between the medial side and the lateral side of the visual cortex. Some more recent studies even demonstrate a tripartite division (small objects, animals, big objects; see Konkle and Caramazza, 2013). So, in terms of their characterisation of the visual cortex, I think Gonzalez Alam et al. do not add any novel evidence to what the community of neuroscience has already known.

      However, the authors' effort to link visual processing with various regions of the DMN is certainly novel, and their attempt to gather converging evidence with different methodologies is commendable. The authors are able to show that, in an independent sample of resting-state data, object-related regions are more connected with semantic regions in the DMN while place-related regions are more connected with navigation-related regions in the DMN, respectively. Such patterns reveal a consistent spatial overlap with their Kanwisher-type face/house localiser data and also concur with the HCP white-matter tractography data. Overall, I think the two pathways explanation that the authors seek to argue is backed by converging evidence. The lack of travelling wave type of analysis to show the spatiotemporal dynamics across the cortex from the visual cortex to high-level regions is disappointing though because I was expecting this type of analysis would provide the most convincing evidence of a 'pathway' going from one point to another. Dynamic causal modelling or Granger causality may also buttress the authors' claim of pathway because many readers, like me, would feel that there is not enough evidence to convincingly prove the existence of a 'pathway'.

      (2) Guidance to the readers about interpretation of the data

      The organisation of the visual cortex and the organisation of the DMN historically have been studied in parallel with little crosstalk between different communities of researchers. Thus, the work by Gonzalez Alam et al. has made a nice attempt to look at how visual processing goes beyond the realm of the visual cortex and continues into different subregions of the DMN.

      While the authors of this study have utilised multiple methods to obtain converging evidence, there are several important caveats in the interpretation of their results:

      (1) While the authors choose to use the term 'pathway' to call the inter-dependence between a set of visual regions and default-mode regions, their results have not convincingly demonstrated a definitive route of neural processing or travelling. Instead, the findings reveal a set of DMN regions are functionally more connected with object-related regions compared to place-related regions. The results are very much dependent on masking and thresholding, and the patterns can change drastically if different masks or thresholds are used.

      (2) Ideally, if the authors could demonstrate the dynamics between the visual cortex and DMN in the primary task data, it would be very convincing evidence for characterising the journey from the visual cortex to DMN. Instead, the current connectivity results are derived from a separate set of resting state data. While the advantage of the authors' approach is that they are able to verify certain visual regions are more connected with certain DMN regions even under a task-free situation, it falls short of explaining how these regions dynamically interact to convert vision into semantic/spatial decision.

      (3) There are several results that are difficult to interpret, such as their psychophysiological interactions (PPI), representational similarity analysis, and gradient analysis. For example, typically for PPI analysis, researchers interrogate the whole brain to look for PPI connectivity. Their use of targeted ROI is unusual, and their use of spatially extensive clusters that encompass fairly large cortical zones in both occipital and temporal lobes as the PPI seeds is also an unusual approach. As for the gradient analysis, the argument that the semantic task is higher on Gradient 1 than the spatial task based on the statistics of p-value = 0.027 is not a very convincing claim (unhelpfully, the figure on the top just shows quite a few blue 'spatial dots' on the hetero-modal end which can make readers wonder if the spatial context task is really closer to the unimodal end or it is simply the authors' statistical luck that they get a p-value under 0.05). While it is statistically significant, it is weak evidence (and it is not pertinent to the main points the authors try to make).

      (3) My suggestion for the authors

      There are several conceptual-level suggestions that I would like to offer to the authors:

      (1) If the pathway explanation is the key argument that you wish to convey to the readers, an effective connectivity type of analysis, such as Granger causality or dynamic causal modelling, would be helpful in revealing there is a starting point and end point in the pathway as well as revealing the directionality of neural processing. While both of these methods have their issues (e.g., Granger causality is not suitable for haemodynamic data, DCM's selection of seeds is susceptible to bias, etc), they can help you get started to test if the path during task performance does exist. Alternatively, travelling wave type of analysis (such as the results by Raut et al. 2021 published in Science Advances) can also be useful to support your claims of the pathway.

      (2) I think the thresholding for resting state data needs to be explained - by the look of Figure 2E and 3E, it looks like whole-brain un-thresholded results, and then you went on to compute the conjunction between these un-thresholded maps with network templates of the visual system and DMN. This does not seem statistically acceptable, and I wonder if the conjunction that you found would disappear and reappear if you used different thresholds. Thus, for example, if the left IFG cluster (which you have shown to be connected with the visual object regions) would disappear when you apply a conventional threshold, this means that you need to seriously consider the robustness of the pathway that you seek to claim... it may be just a wild goose that you are chasing.

      (3) There are several analyses that are hard to interpret and you can consider only reporting them in the supplementary materials, such as the PPI results and representational similarity analysis, as none of these are convincing. These analyses do not seem to add much value to make your argument more convincing and may elicit more methodological critiques, such as statistical issues, the set-up of your representational theory matrix, and so on.

    1. Author response:

      Thanks for the eLife assessment

      “This study employed a comprehensive approach to examining how the MT+ region integrates into a complex cognition system in mediating human visuo-spatial intelligence. While the findings are useful, the experimental evidence is incomplete and the study design, hypothesis, analyses, writing, and presentation need to be improved.” We plan to revise the manuscript according to the comments of Public Reviews.

      We are grateful for the excellent and very helpful comments, and we provide our provisional responses below.

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      (1) In the intro, it seems to me that the multiple-demand (MD) regions are the key in this study. However, I didn't see any results associated with the MD regions. Did I miss something??

      We thank the reviewer for pointing this out. After careful consideration, we agree. According to Melnick et al. (2013), motion surround suppression (SI) and the time thresholds for small and large gratings, which reflect hMT+ function, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. This suggests that hMT+ has the potential to be a core of the MD system. However, because our results address only how “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not sufficient to show that hMT+ is a core node of the MD system. We will therefore adjust the framing of the article to emphasize the role of hMT+ in reducing redundancy and improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing its significance in the MD system.

      (2) How was the sample size determined? Is it sufficient??

      We thank the reviewer for pointing this out. We used G*Power to determine the sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r = 0.47). We used this value as the expected correlation coefficient (ρ H1), with power set at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size was calculated to be 26, which indicates that our study has adequate power to yield valid statistical results. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 datasets to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a larger dataset.
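
      As a rough cross-check of the calculation described above, the sketch below uses the Fisher z approximation, not G*Power's exact bivariate-normal routine, so its result can differ from G*Power's by a participant or so. The response does not state whether the test was one- or two-tailed, so both cases are computed. The function name and its defaults are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate sample size needed to detect a correlation r.

    Uses the Fisher z approximation (an assumption of this sketch),
    not the exact method implemented in G*Power.
    """
    z = NormalDist()
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = z.inv_cdf(1 - tail_alpha)  # critical value for the chosen alpha
    z_beta = z.inv_cdf(power)            # quantile corresponding to the power
    effect = math.atanh(r)               # Fisher z-transform of the expected r
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


n_one_tailed = n_for_correlation(0.47, two_tailed=False)
n_two_tailed = n_for_correlation(0.47, two_tailed=True)
print(n_one_tailed, n_two_tailed)
```

      Under this approximation the one-tailed requirement is smaller than the two-tailed one, so stating the tail assumption alongside the reported sample size would make the calculation easier to verify.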

      (3) In Schallmo et al. (eLife, 2018), there was no correlation between GABA concentration and SI. How can we justify the different results here?

      We thank the reviewer for pointing this out. There are several differences between the two studies:

      a. Whereas Schallmo et al. (2018) employed 3T MRS, we used 7T MRS, which improves our ability to detect and measure GABA accurately.

      b. Schallmo et al. (2018) used bilateral hMT+ as the MRS measurement region, whereas we used the left hMT+. Our reasons for focusing on the left hMT+ are described in the response to Reviewer #1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS sequence in Schallmo et al. (2018) used a 3 cm isotropic voxel, whereas we used a 2 cm isotropic voxel. This helps us locate hMT+ more precisely and exclude more white-matter signal.

      (4) Basically this study contains the data of SI, BDT, GABA in MT+ and V1, Glu in MT+ and V1-all 6 measurements. There should be 6x5/2 = 15 pairwise correlations. However, not all of these results are included in Figure 1 and supplementary 1-3. I understand that it is not necessary to include all figures. But I suggest reporting all values in one Table.

      We thank the reviewer for the good suggestion. We plan to add a correlation matrix that reports all values.
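
      The number of pairwise correlations mentioned in the comment can be checked by enumerating the pairs. This is a minimal sketch: the measure labels are shorthand introduced here for illustration, and no data values are implied.

```python
from itertools import combinations

# Shorthand labels for the six measurements named in the comment (illustrative).
measures = ["SI", "BDT", "GABA_MT+", "GABA_V1", "Glu_MT+", "Glu_V1"]

# Every unordered pair corresponds to one cell of the upper triangle
# of a 6 x 6 correlation matrix.
pairs = list(combinations(measures, 2))

for first, second in pairs:
    print(f"{first} vs {second}")
print(len(pairs))
```

      Listing the pairs this way gives the set of cells that a single summary table would need to cover.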

      (5) In Melnick (2013), the IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III?

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI) is a specific function of hMT+ and aligns closely with this region's activity. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.

      (6) In the functional connectivity part, there is no explanation as to why only the left MT+ was set to the seed region. What is the problem with the right MT+?

      We thank the reviewer for pointing this out. The main reason is that our MRS ROI is the left hMT+, and we wanted the ROIs to be consistent across the different models. The use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011). In addition, we will check the results of our localizer to confirm whether similar findings are consistently replicated.

      (7) In Melnick (2013), the authors also reported the correlation between IQ and absolute duration thresholds of small and large stimuli. Please include these analyses as well.

      We thank the reviewer for the good advice. Including these results will help readers compare our findings with those of Melnick et al. (2013). We plan to add a corresponding figure in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Recent studies have identified specific regions within the occipito-temporal cortex as part of a broader fronto-parietal, domain-general, or "multiple-demand" (MD) network that mediates fluid intelligence (gF). According to the abstract, the authors aim to explore the mechanistic roles of these occipito-temporal regions by examining GABA/glutamate concentrations. However, the introduction presents a different rationale: investigating whether area MT+ specifically, could be a core component of the MD network.

      Strengths:

      The authors provide evidence that GABA concentrations in MT+ and its functional connectivity with frontal areas significantly correlate with visuo-spatial intelligence performance. Additionally, serial mediation analysis suggests that inhibitory mechanisms in MT+ contribute to individual differences in a specific subtest of the Wechsler Adult Intelligence Scale, which assesses visuo-spatial aspects of gF.

      Weaknesses:

      (1) While the findings are compelling and the analyses robust, the study's rationale and interpretations need strengthening. For instance, Assem et al. (2020) have previously defined the core and extended MD networks, identifying the occipito-temporal regions as TE1m and TE1p, which are located more rostrally than MT+. Area MT+ might overlap with brain regions identified previously in Fedorenko et al., 2013; however, the authors attribute these activations to attentional enhancement of visual representations in the more difficult conditions of their tasks. For the aforementioned reasons, it is unclear why the authors chose MT+ as their focus. A stronger rationale for this selection is necessary, along with an account of how it fits with the core/extended MD networks.

      We appreciate the reviewer’s comments. We focus on hMT+ for the following reasons. According to Melnick et al. (2013), motion surround suppression (SI) and the time thresholds for small and large gratings, which reflect hMT+ function, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with high correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. In addition, in Fedorenko et al. (2013), the averaged MD activity region appears to overlap with hMT+. Based on these findings, we assumed that hMT+ has the potential to be a core of the MD system.

      (2) Moreover, although the study links MT+ inhibitory mechanisms to a visuo-spatial component of gF, this evidence alone may not suffice to position MT+ as a new core of the MD network. The MD network's definition typically encompasses a range of cognitive domains, including working memory, mathematics, language, and relational reasoning. Therefore, the claim that MT+ represents a new core of MD needs to be supported by more comprehensive evidence.

      We thank the reviewer for pointing this out. After careful consideration, we agree. Because our results address only visuo-spatial intelligence, they are not sufficient to show that hMT+ is a core node of the MD system. We will adjust the framing of the article to emphasize the role of hMT+ in reducing redundancy and improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing its significance in the MD system.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to understand the role of GABA-ergic inhibition in the human MT+ region in predicting visuo-spatial intelligence through a combination of behavioral measures, fMRI (for functional connectivity measurement), and MRS (for GABA/glutamate concentration measurement). While this is a commendable goal, it becomes apparent that the authors lack fundamental understanding of vision, intelligence, or the relevant literature. As a result, the execution of the research is less coherent, dampening the enthusiasm of the review.

      Strengths:

      (1) Comprehensive Approach: The study adopts a multi-level approach, i.e., neurochemical analysis of GABA levels, functional connectivity, and behavioral measures to provide a holistic understanding of the relationship between GABA-ergic inhibition and visuo-spatial intelligence.

      (2) Sophisticated Techniques: The use of ultra-high field magnetic resonance spectroscopy (MRS) technology for measuring GABA and glutamate concentrations in the MT+ region is a recent development.

      Weaknesses:

      Study Design and Hypothesis

      (1) The central hypothesis of the manuscript posits that "3D visuo-spatial intelligence (the performance of BDT) might be predicted by the inhibitory and/or excitation mechanisms in MT+ and the integrative functions connecting MT+ with the frontal cortex." However, several issues arise:

      (1.1) The Suppression Index depicted in Figure 1a, labeled as the "behavior circle," appears irrelevant to the central hypothesis.

      We thank the reviewer for pointing this out. In our study, the inhibitory mechanisms in hMT+ are conceptualized through two models: the neurotransmitter model and the behavior model. The Suppression Index is essential for elucidating the local inhibitory mechanisms within the behavior model. However, we acknowledge that the introduction may not have clearly articulated our hypothesis, potentially leading to misunderstandings. We plan to revise the introduction to clarify these connections and make the relevance of the Suppression Index clear.

      (1.2) The construct of 3D visuo-spatial intelligence, operationalized as the performance in the Block Design task, is inconsistently treated as another behavioral task throughout the manuscript, leading to confusion.

      We thank the reviewer for pointing this out. We acknowledge that our manuscript may have presented this construct inconsistently across sections, causing confusion. To address this, we plan to use a consistent description of 3D visuo-spatial intelligence in both the introduction and the discussion. However, we would like to retain 'Block Design task score' in the results section so that readers can see which subtest we used.

      (1.3) The schematics in Figure 1a and Figure 6 appear too high-level to be falsifiable. It is suggested that the authors formulate specific and testable hypotheses and preregister them before data collection.

      We thank the reviewer for pointing this out. We plan to revise Figure 1a to make it less abstract and more logical. The schematic in Figure 6 represents our theoretical framework of how hMT+ contributes to 3D visuo-spatial intelligence. We believe the elements of this framework are grounded in related theories and supported by the evidence discussed in our results and discussion sections, which makes them specific and testable.

      (2) Central to the hypothesis and design of the manuscript is a misinterpretation of a prior study by Melnick et al. (2013). While the original study identified a strong correlation between WAIS (IQ) and the Suppression Index (SI), the current manuscript erroneously asserts a specific relationship between the block design test (from WAIS) and SI. It should be noted that in the original paper, WAIS comprises Similarities, Vocabulary, Block design, and Matrix reasoning tests in Study 1, while the complete WAIS is used in Study 2. Did the authors conduct other WAIS subtests other than the block design task?

      Thanks for pointing this out. Reviewer #1 also raised this question, so we repeat our answer here: “The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI) is a specific function of hMT+ and aligns closely with this region's activity. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.”

      (3) Additionally, there are numerous misleading references and unsubstantiated claims throughout the manuscript. As an example of misleading reference, "the human MT ... a key region in the multiple representations of sensory flows (including optic, tactile, and auditory flows) (Bedny et al., 2010; Ricciardi et al., 2007); this ideally suits it to be a new MD core." The two references in this sentence are claims about plasticity in the congenitally blind with sensory deprivation from birth, which is not really relevant to the proposal that hMT+ is a new MD core in healthy volunteers.

      Thanks for pointing this out. We have carefully read the corresponding references, considered the related theories, and agree with these comments. Because our results address only how “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not sufficient to show that hMT+ is a core node of the MD system. We will adjust the framing of the article to emphasize the role of hMT+ in reducing redundancy and improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing its significance in the MD system. In addition, regarding the potential central role of hMT+ in the MD system, we agree that research on hMT+ as a multisensory integration hub mainly focuses on developmental processes. Meanwhile, in adults, the MST region of hMT+ is considered a multisensory integration area for visual and vestibular inputs, which potentially supports a role for hMT+ in multitasking multisensory systems (Gu et al., J. Neurosci, 26(1), 73–85, 2006; Fetsch et al., Nat. Neurosci, 15, 146–154, 2012). Further research could explore how other intelligence sub-abilities, such as working memory and language comprehension, are facilitated by the features of hMT+.

      Another example of unsubstantiated claim: the rationale for selecting V1 as the control region is based on the assertion that "it mediates the 2D rather than 3D visual domain (Born & Bradley, 2005)". That's not the point made in the Born & Bradley (2005) paper on MT. It's crucial to note that V1 is where the initial binocular convergence occurs in cortex, i.e., inputs from both the right and left eyes to generate a perception of depth.

      Thank you for pointing this out. We acknowledge the inappropriate citation of "Born & Bradley, 2005," which focuses solely on the structure and function of visual area MT. However, we believe that choosing hMT+ as the region for 3D visual analysis and V1 as the control region is justified. Cumming and DeAngelis (Annu Rev Neurosci, 24:203–238, 2001) state that binocular disparity provides the visual system with information about the three-dimensional layout of the environment, and that the link between perception and neuronal activity is stronger in the extrastriate cortex (especially MT) than in the primary visual cortex (V1). This supports our choice and emphasizes the relevance of MT+ in our study. We will correct the reference in the revised version.

      Results & Discussion

      (1) The missing correlation between SI and BDT is crucial to the rest of the analysis. The authors should discuss whether they replicated the pattern of results from Melnick et al. (2013) despite using only one WAIS subtest.

      We thank the reviewer for the suggestion. The correlation result is currently in the supplemental material; we will move it back to the main text.

      (2) ROIs: can the authors clarify if the results are based on bilateral MT+/V1 or just those in the left hemisphere? Can the authors plot the MRS scan area in V1? I would be surprised if it's precise to V1 and doesn't spread to V2/3 (which is fine to report as early visual cortex).

      We thank the reviewer for the suggestion. We plan to plot the MRS scanning area of the V1 ROI and use a visual template to check whether it includes V2/3. If it does, we will refer to it as the early visual cortex rather than specifically V1 in our reporting.

      (3) Did the authors examine V1 FC with either the frontal regions and/or whole brain, as a control analysis? If not, can the author justify why V1 serves as the control region only in the MRS but not in FC (Figure 4) or the mediation analysis (Figure 5)? That seems a little odd given that control analyses are needed to establish the specificity of the claim to MT+

      We thank the reviewer for the suggestion. We plan to perform the V1 FC-behavior analysis as a control. For the mediation analysis, because V1 GABA/Glu shows no correlation with BDT score, a mediation analysis is not warranted.

      (4) It is not clear how to interpret the similarity or difference between panels a and b in Figure 4.

      We thank the reviewer for pointing this out. We plan to interpret the difference between panels a and b further in the revised version. Panel a shows the hMT+ FC that correlates with BDT score, which clearly involves the frontal cortex, whereas panel b shows the hMT+ FC that correlates with SI, which involves relatively fewer regions. The overlapping region is our focus and helps explain how local inhibitory mechanisms operate in 3D visuo-spatial intelligence. In addition, we will revise Figure 4 to mark the overlapping region.

      (5) SI is not relevant to the authors’ a priori hypothesis, but is included in several mediation analyses. Can the authors do model comparisons between the ones in Figure 5c, d, and Figure S6? In other words, is SI necessary in the mediation model? There seem to be discrepancies between the necessity of SI in Figures 5c/S6 vs. Figure 5d.

      We thank the reviewer for highlighting this point. The relationship between the Suppression Index (SI) and our a priori hypotheses is elaborated in the response to reviewer 3, section (1). SI plays a crucial role in explicating how local inhibitory mechanisms function within the context of the 3D visuo-spatial task. Additionally, Figure 5c illustrates the interaction between the frontal cortex and hMT+, showing how the effects from the frontal cortex (BA46) on the Block Design Task are fully mediated by SI. This further underscores the significance of SI in our model.

      (6) The sudden appearance of "efficient information" in Figure 6, referring to the neural efficiency hypothesis, raises concerns. Efficient visual information processing occurs throughout the visual cortex, starting from V1. Thus, it appears somewhat selective to apply the neural efficiency hypothesis to MT+ in this context.

      We thank the reviewer for highlighting this point. There is no doubt that V1 is involved in efficient visual information processing. However, in our results, V1 GABA showed no significant correlation with BDT score, suggesting that efficient processing in V1 might not sufficiently account for individual differences in 3D visuo-spatial intelligence. Additionally, we will clarify our use of the neural efficiency hypothesis by incorporating it into the introduction of our paper to better frame our argument.

      Transparency Issues:

      (1) Don't think it's acceptable to make the claim that "All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary information". It is the results or visualizations of data analysis, rather than the raw data themselves, that are presented in the paper/supp info.

      We thank the reviewer for pointing this out. We realize that this statement may cause confusion, and we will delete it.

      (2) No GitHub link has been provided in the manuscript to access the source data, which limits the reproducibility and transparency of the study.

      We thank the reviewer for pointing this out. We will provide the GitHub link in the revised version.

      Minor:

      "Locates" should be replaced with "located" throughout the paper. For example: "To investigate this issue, this study selects the human MT complex (hMT+), a region located at the occipito-temporal border, which represents multiple sensory flows, as the target brain area."

      We thank the reviewer for pointing this out. We will revise it.

      Use "hMT+" instead of "MT+" to be consistent with the term in the literature.

      We thank the reviewer for pointing this out. We agree and will use hMT+, consistent with the literature.

      "Green circle" in Figure 1 should be corrected to match its actual color.

      We thank the reviewer for pointing this out. We will revise it.

      The abbreviation for the Wechsler Adult Intelligence Scale should be "WAIS," not "WASI."

      We thank the reviewer for pointing this out. We will revise it.

    1. Reviewer #2 (Public Review):

      Summary:

      The authors re-analyse MEG data from a speech production and perception study and extend their previous Granger causality analysis to a larger number of cortical-cortical and in particular cortical-subcortical connections. Regions of interest were defined by means of a meta-analysis using Neurosynth.org and connectivity patterns were determined by calculating directed influence asymmetry indices from the Granger causality analysis results for each pair of brain regions. Abbasi et al. report feedforward signals communicated via fast rhythms and feedback signals via slow rhythms below 40 Hz, particularly during speaking. The authors highlight one of these connections between the right cerebellum lobule VI and auditory association area A5, where in addition the connection strength correlates negatively with the strength of speech tracking in the theta band during speaking (significant before multiple comparison correction). Results are interpreted within a framework of active inference by minimising prediction errors.

      While I find investigating the role of cortical-subcortical connections in speech production and perception interesting and relevant to the field, I am not yet convinced that the methods employed are fully suitable to this endeavour or that the results provide sufficient evidence to make the strong claim of dissociation of bottom-up and top-down information flow during speaking in distinct frequency bands.
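
      For readers unfamiliar with the measure named in the summary, a directed influence asymmetry index (DAI) is commonly defined as the normalized difference between the Granger causality estimates in the two directions. The sketch below assumes that common definition, which may differ from the authors' exact implementation. The function name and the input values are purely illustrative.

```python
def directed_asymmetry_index(gc_a_to_b, gc_b_to_a):
    """Normalized difference of two directional Granger causality estimates.

    Assumed definition: (A->B - B->A) / (A->B + B->A), bounded in [-1, 1]
    for non-negative inputs.
    """
    total = gc_a_to_b + gc_b_to_a
    if total == 0:
        raise ValueError("DAI is undefined when both GC values are zero")
    return (gc_a_to_b - gc_b_to_a) / total


# Hypothetical GC values at a single frequency, for illustration only.
dai = directed_asymmetry_index(0.03, 0.01)
print(dai)
```

      Under this definition, positive values indicate a net A-to-B influence. Because the index is a ratio, it can be large even when both underlying GC values are small.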

      Strengths:

      The investigation of electrophysiological cortical-subcortical connections in speech production and perception is interesting and relevant to the field. The authors analyse a valuable dataset, where they spent a considerable amount of effort to correct for speech production-related artefacts. Overall, the manuscript is well-written and clearly structured.

      Weaknesses:

      The description of the multivariate Granger causality analysis did not allow me to fully grasp how the analysis was performed and I hence struggled to evaluate its appropriateness.

      Knowing that (1) filtered Granger causality is prone to false positives and (2) recent work demonstrates that significant Granger causality can simply arise from frequency-specific activity being present in the source but not the target area without functional relevance for communication (Schneider et al. 2021) raises doubts about the validity of the results, in particular with respect to their frequency specificity. These doubts are reinforced by what I perceive as an overemphasis on results that support the assumption of specific frequencies for feedforward and top-down connections, while findings not aligning with this hypothesis appear to be underreported. Furthermore, the authors report some main findings that I found difficult to reconcile with the data presented in the figures. Overall, I feel the conclusions with respect to frequency-specific bottom-up and top-down information flow need to be moderated and that some of the reported findings need to be checked and if necessary corrected.

      Major points

      (1) I think more details on the multivariate GC approach are needed. I found the reference to Schaum et al., 2021 not sufficient to understand what has been done in this paper. Some questions that remained for me are:

      (i) Does multivariate here refer to the use of the authors' three components per parcel or to the conditioning on the remaining twelve sources? I think the latter is implied when citing Schaum et al., but I'm not sure this is what was done here?

      If it was not: how can we account for spurious results based on indirect effects?

      (ii) Did the authors check whether the GC of the source-target pairs was reliably above the bias level (as Schaum et al. did for each condition separately)? If not, can they argue why they think that their results would still be valid? Does it make sense to compute DAIs on connections that were below the bias level? Should the data be re-analysed to take this concern into account?

      (iii) You may consider citing the paper that introduced the non-parametric GC analysis (which Schaum et al. then went on to apply): Dhamala M, Rangarajan G, Ding M. Analyzing Information Flow in Brain Networks with Nonparametric Granger Causality. Neuroimage. 2008; 41(2):354-362. https://doi.org/10.1016/j.neuroimage.2008.02.020

      (2) GC has been discouraged for filtered data as it gives rise to false positives due to phase distortions and the ineffectiveness of filtering in the information-theoretic setting as reducing the power of a signal does not reduce the information contained in it (Florin et al., 2010; Barnett and Seth, 2011; Weber et al. 2017; Pinzuti et al., 2020 - who also suggest an approach that would circumvent those filter-related issues). With this in mind, I am wondering whether the strong frequency-specific claims in this work still hold.

      (3) I found it difficult to reconcile some statements in the manuscript with the data presented in the figures:

      (i) Most notably, the considerable number of feedforward connections from A5 and STS that project to areas further up the hierarchy at slower rhythms (e.g. L-A5 to R-PEF, R-Crus2, L-CB6, L-Tha, L-FOP and L-STS to R-PEF, L-FOP, L-TOPJ or R-A5 as well as R-STS both to R-Crus2, L-CB6, L-Th) contradict the authors' main message that 'feedback signals were communicated via slow rhythms below 40 Hz, whereas feedforward signals were communicated via faster rhythms'. I struggled to recognise a principled approach that determined which connections were highlighted and reported and which ones were not.

      (ii) "Our analysis also revealed robust connectivity between the right cerebellum and the left parietal cortex, evident in both speaking and listening conditions, with stronger connectivity observed during speaking. Notably, Figure 4 depicts a prominent frequency peak in the alpha band, illustrating the specific frequency range through which information flows from the cerebellum to the parietal areas." There are two peaks discernible in Figure 4, one notably lower than the alpha band (rather theta or even delta), the other at around 30 Hz. Nevertheless, the authors report and discuss a peak in the alpha band.

      (iii) In the abstract: "Notably, high-frequency connectivity was absent during the listening condition." and p.9 "In contrast with what we reported for the speaking condition, during listening, there is only a significant connectivity in low frequency to the left temporal area but not a reverse connection in the high frequencies."

      While Fig. 4 shows significant connectivity from R-CB6 to A5 in the gamma frequency range for the speaking, but not for the listening condition, interpreting comparisons between two effects without directly comparing them is a common statistical mistake (Makin and Orban de Xivry). The spectrally-resolved connectivity in the two conditions actually look remarkably similar and I would thus refrain from highlighting this statement and indicate clearly that there were no significant differences between the two conditions.

      (iv) "This result indicates that in low frequencies, the sensory-motor area and cerebellum predominantly transmit information, while in higher frequencies, they are more involved in receiving it."

      I don't think that this statement holds in its generality: L-CB6 and R-3b both show strong output at high frequencies, particularly in the speaking condition. While they seem to transmit information mainly to areas outside A5 and STS these effects are strong and should be discussed.

      (4) "However, definitive conclusions should be drawn with caution given recent studies raising concerns about the notion that top-down and bottom-up signals can only be transmitted via separate frequency channels (Ferro et al., 2021; Schneider et al., 2021; Vinck et al., 2023)."

      I appreciate this note of caution and think it would be useful if it were spelled out to the reader why this is the case so that they would be better able to grasp the main concerns here. For example, Schneider et al. make a strong point that we expect to find Granger-causality with a peak in a specific frequency band for areas that are anatomically connected when the sending area shows stronger activity in that band than the receiving one, simply because of the coherence of a signal with its own linear projection onto the other area. The direction of a Granger causal connection would in that case only indicate that one area shows stronger activity than the other in the given frequency band. I am wondering to what degree the reported connectivity pattern can be traced back to regional differences in frequency-specific source strength or to differences in source strength across the two conditions.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      • Again, in Figure 5, were FoxP3/CD4+ cells enumerated? Author Response: Fig 5 showed that the inflammatory score, and activation of CD4 and CD8 cells, were lower in the intestine of DSS-treated mice transplanted with Jag1Ndr/Ndr lymphocytes than in those transplanted with Jag1+/+ lymphocytes. However, in Figure 5 we had not quantified the number of FoxP3/CD4+ cells (Tregs). We agree that it would be interesting to know whether the dampened intestinal inflammation (in response to a classical inflammatory disease model (DSS-treatment)) is also mediated by excess Tregs. We will therefore now quantify Foxp3+ cells on the intestinal sections of experimental animals used for acquisition of data in Fig 5.


      Description of the revisions that have already been incorporated in the transferred manuscript.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Reviewer 1 comment: This is an interesting study that examines defects in the Jag1ndr/ndr mouse model of Alagille syndrome. The novel aspects of this manuscript are the comparisons, at many levels, between the mouse model and ALG patient samples, including an examination of immune profiles. The conclusions that the Jag1ndr/ndr mouse model is an accurate representation of the human ALG syndrome appear valid. However, the reported differences in immune profiles, particularly in the Jag1ndr/ndr mouse model, are difficult to understand. The data presented indicate a reduction in CD4+ cells in the Jag1ndr/ndr mouse at day P3 in both liver and spleen. Additionally, the authors report differences between the Jag1ndr/ndr mouse and controls at day P30 in the relative percentages of DN, DP and SP CD4 and CD8 cells in the thymus. When examining the peripheral lymphoid system, CD4+ numbers are the same in both the Jag1ndr/ndr animals and controls; however, CD8+ numbers are reduced and FoxP3/CD4+ cells are increased in both the spleen and the thymus. FoxP3/CD4+ T cells are usually assumed to be regulatory T cells that dampen the inflammatory responses of T cells. Therefore, the increase in this population in an animal model of what is assumed to be an inflammatory disease is confusing and confounding. The authors do not present a clear analysis of how they feel an increase of Tregs would lead to this disease. One possibility is that this population is not functioning as conventional Tregs and is rather promoting inflammation, but this conclusion would require a functional analysis of this population of cells, at the very least in an in vitro analysis of T cell suppression. From an immunologist's point of view, their data are antithetical to what one would expect to find in an inflammatory disease. Perhaps this reviewer is missing an important point, but if I am missing it, then others who read this manuscript also may be confused.

      Author Response: We thank the reviewer for carefully assessing our work, and for noting which aspects of the immune analyses should be more thoroughly explained. We apologize for any confusion, which a clearer introduction will help to avoid.

      Alagille syndrome is not thought of as an inflammatory disorder; it is a congenital disorder affecting bile duct development (Kohut et al 2021, Semin Liver Dis). During normal bile duct development, JAG1+ portal fibroblasts signal to NOTCH2+ hepatoblasts to instruct bile duct development. In the context of low JAG1 signaling, hepatoblasts either fail to adopt a cholangiocyte fate, or fail to undergo bile duct morphogenesis, resulting in bile duct paucity and cholestasis. This cholestasis should activate inflammatory processes leading to fibrosis, which is the subject of this study.


      We agree with the reviewer that Tregs would be expected to suppress inflammation, and our data are consistent with Treg suppression of inflammation. We show, for the first time, that Tregs are enriched in Jag1Ndr/Ndr mice (Fig 4) and present evidence that they suppress inflammation (Fig 5) and fibrosis (Fig 6), which could explain the atypical fibrosis seen in patients with ALGS.

      To clarify that ALGS is a genetic liver disease affecting bile duct formation, we:

      1. Modified and extended the following text in the Introduction (Page 2, lines 14-17): “ALGS is mainly caused by mutations in the Notch ligand JAGGED1 (JAG1, 94%) (Mašek & Andersson, 2017; Oda et al, 1997), affecting bile duct development and morphogenesis, resulting in bile duct paucity and cholestasis. Immune dysregulation has also been described (Tilib Shamoun et al, 2015), but how this might interact with liver disease in ALGS to affect fibrosis is not known.”
      2. Introduced the disease, the animal model, and the scientific question in a schematic in new Fig 1A.

      Reviewer 1 comment: Minor points that should be addressed include: • The source cells used in the transfer experiments reported in Figure 5 are unclear. Are they using total spleen cells with T, B and myeloid cells or are they using purified T cells? And if it is the latter, have they assessed the ratio of CD4+ versus FoxP3/CD4+ cells in the transferred cells?

      Author Response: Total spleen cells including all lymphocytes were transplanted, as described in Materials and Methods. The constituent T-cell populations are characterized and shown in Fig 4F. To clarify this, we:

      1. Added the text “Adoptive transfer of lymphocytes” to the schematic in Fig 5A, Fig S5A, and Fig 6A, and
      2. Modified the opening paragraph related to results presented in Fig. 5 and Fig S5 in the following way (page 8, line 209): “To investigate Jag1Ndr/Ndr T cell function, we performed adoptive transfer of the splenic lymphocytes into Rag1-/- mice, which lack mature B- and T cell populations, but provide a host environment with normal Jag1 (Mombaerts et al, 1992).”

      To acknowledge that B-cells and innate lymphoid cells might contribute to the observed results, we included the following sentence in the Discussion:

      (page 12, lines 369-371) “Finally, our experimental setup does not exclude an additional contribution of other lymphocytes (B-cells or innate lymphoid cells) to the BDL-induced fibrosis, and selective testing of the individual subpopulations would be an intriguing follow up to this study.”

      Reviewer 1 comment: In the DSS experiments in Figure 5, there does not appear to be a no DSS control. What does the architecture look like without DSS?

      Author Response: The intestinal architecture and phenotype of mice transplanted with Jag1+/+ or Jag1Ndr/Ndr lymphocytes, not treated with DSS, are presented in Supplementary Figure 5. In the absence of DSS, Jag1+/+- or Jag1Ndr/Ndr-transplanted mice exhibit no overt differences in survival or weight gain/loss. The intestinal inflammatory score was not different in the two conditions and was 2.29 +/- 0.44 and 2.03 +/- 0.92 for Jag1+/+- and Jag1Ndr/Ndr-transplanted mice, respectively.

      To compare the results with and without DSS, we added the following text to the results section, when describing the DSS results (Page 9, lines 223-226):

      “As expected, histological scoring of intestinal and colonic inflammation revealed elevated inflammation in Jag1+/+→Rag1-/- mice treated with DSS (Fig. 5C,D) compared to Jag1+/+→Rag1-/- mice not treated with DSS (Fig. S5). However, there was significantly less inflammation in Jag1Ndr/Ndr→Rag1-/- mice than in Jag1+/+→Rag1-/- mice (Fig. 5C,D).”

      Reviewer 1 comment: The authors noted that splenomegaly was observed in the Jag1ndr/ndr mouse model. Again this is antithetical to what one would expect when one sees an increase in FoxP3/CD4+ T regs.

      Author Response: We thank the reviewer for pointing out a possible discrepancy related to Fig 1, in which we report the presence of splenomegaly. Although there can be multiple causes of splenomegaly, it is one of the hallmarks of portal hypertension (as also corroborated by Reviewer 2), which is tightly connected with liver fibrosis and present in patients with ALGS, and we report it as such in the manuscript. To clarify this, we added the following text sections:

      1. Results (page 2, lines 37-38): “Liver fibrosis compresses blood vessels and reduces their blood flow, leading to portal hypertension, a serious consequence of liver disease which can manifest as splenomegaly.”
      2. Discussion (page 13, lines 394-401): “Splenomegaly has been described as a consequence of portal hypertension in ALGS (Kamath et al, 2020), but could also be attributed to immune-related pathology. Jag1Ndr/Ndr mice exhibit splenomegaly as early as P10, and is exacerbated at P30 (Fig. 1E,F). Patients with other liver diseases display portal hypertension and cirrhosis, with both splenomegaly and hypersplenism associated with a high CD4+/CD8+ ratio, but a low Treg+/CD4+ ratio (Nomura et al, 2014). However, Jag1Ndr/Ndr mice present with splenomegaly but not hypersplenism. An overactive spleen (hypersplenism) would remove red blood cells which are instead enriched in Jag1Ndr/Ndr mice, and Tregs were enriched in Jag1Ndr/Ndr mice, not depleted as seen in cirrhosis/hypersplenism. These data are thus consistent with portal hypertension-induced splenomegaly rather than hypersplenism.”

      Reviewer #1 (Significance (Required)):

      Reviewer 1 comment: The strengths of this paper are the careful comparisons between the mouse model and the human ALG syndrome. These comparisons are valuable and worth publication.

      Author Response: We thank the reviewer for these comments.

      Reviewer 1 comment: Weaknesses are stated above. Needs a clearer explanation for their immune analysis.

      Author Response: We thank the reviewers for highlighting points requiring clarification and hope the proposed text changes and additional data presented in response to the comments of all three reviewers lead to a significant clarification of the immunological aspect of our study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Reviewer 2 comment:

      Summary: Masek and colleagues use multi-pronged studies on the Jag1[Ndr/Ndr] mouse model of Alagille syndrome (ALGS) combined with transcriptomic analysis on livers from patients with ALGS to elucidate the potential mechanisms regulating liver fibrosis in this disease. The authors first show that Jag1[Ndr/Ndr] animals develop pericellular and perisinusoidal fibrosis and exhibit evidence for portal hypertension, similar to patients with ALGS. Single-cell RNA-sequencing indicated more hepatoblasts and fewer hepatocytes, relatively speaking, in Jag1[Ndr/Ndr] P3 livers, which suggested hampering of hepatoblast differentiation to hepatocytes. Deconvolution of previously generated bulk RNA-seq data from Jag1[Ndr/Ndr] P10 livers and GSEA on RNAseq data from livers of these mice and patients with ALGS confirmed the P3 scRNA-seq observations and indicated mild pro-inflammatory activation of immature hepatocytes in ALGS livers. GSEA also suggested an inability of Jag1[Ndr/Ndr] livers to attract T cells upon cholestatic injury. Indeed, 25-color flow cytometry on liver and spleen from mutant and control mice indicated a defect in T cell response to cholestasis in this model. The authors then examined the effects of the Ndr mutation on T-cell development and function. They found that the Ndr/Ndr thymi were significantly smaller than control thymi. Moreover, Ndr/Ndr thymi showed an increase in CD4+ T-cells and Tregs at the expense of double-positive T-cells. The authors then performed lymphocyte transplantation studies and concluded that Ndr/Ndr T-cells fail to mount an adequate response to inflammation in a DSS model of ulcerative colitis. The authors tested the contribution of Ndr/Ndr immune cells to liver fibrosis in a model of experimentally induced cholestasis (bile duct ligation; BDL). Ndr/Ndr T-cells did not show any defects in migrating into the liver upon BDL. However, the periportal fibrosis observed in the BDL model was reduced in animals receiving Ndr/Ndr immune cells compared to those receiving Jag1+/+ immune cells. This was accompanied by significantly less aSMA staining in these livers. Finally, reanalysis of bulk RNAseq data from liver samples from ALGS and other liver diseases suggested that the presence of FOXP3+ T-reg cells in the liver is associated with higher liver fibrosis in non-ALGS liver diseases but lower liver fibrosis in ALGS livers. The authors have used an impressive combination of single-cell RNA-sequencing, reanalysis of previous bulk RNA-sequencing data from their group and others, 25-color FACS analysis, and adoptive immune transfer experiments in this manuscript, and systematically provide quantification and statistical analysis for their data. Overall, this is an interesting and important study. Prior studies are referenced appropriately. The text and figures are clear and accurate. I don't think any additional experiments are essential. However, the issues listed under Major comments should be discussed and clarified in the manuscript, especially the first item.

      Author Response: We sincerely thank the reviewer for the comprehensive and insightful assessment of our manuscript. We are particularly gratified to note your acknowledgment of the thoroughness of our experimental approach and the clarity of our presentation. We are pleased that no further experiments would be required, and will address the points raised under Major comments, which enhance our study's quality and accessibility.

      Reviewer 2 comment:

      Major comments:

      • Only a small fraction of the cells in scRNA-seq experiments have been assigned to hepatocytes/hepatoblast clusters, with the majority of these cells allocated to Hepato-Ery cluster. This suggests that many hepatocytes and potentially hepatoblasts have been lost during sample preparation. The authors should discuss this issue and its potential implications on the interpretation of the cell ratios and gene expression conclusions of scRNA-seq data.

      Author Response: We agree with the reviewer regarding this aspect of our study. We mentioned this limitation in the supplementary methods section: “Liver parenchymal cells constituted ~6.5% of cells at E16.5, and ~7.5% of cells at P3 and included mesenchymal cells, endothelial cells, hepatoblasts and hepatocytes (Fig. S1D), this parenchymal proportion is lower than in vivo, but consistent with ex vivo liver digest (Guilliams et al, 2022).” We recognize it may be too inaccessible there, and we thus added the following text to the Discussion section of the manuscript: (Pages 11-12, lines 330-337) “A limitation of this study is the underrepresentation of the hepatoblast/cyte parenchymal cells in the scRNA-seq dataset (Fig. 2A-D), which constituted ~6.5% of analyzed cells at E16.5, and ~7.5% of cells at P3 (Fig. S1D). This parenchymal proportion is lower than in vivo, but is consistent with scRNA seq datasets obtained with ex vivo liver digest (Guilliams et al, 2022). One risk is that cell stress as a result of dissociation could result in further loss of injured Jag1Ndr/Ndr hepatocytes, impacting the interpretation of cell type abundance. Nuclear scRNAseq can overcome cell type-dependent dissociation sensitivity bias (Guilliams et al, 2022), and could provide further insights into Jag1Ndr/Ndr livers at the single cell level. Nonetheless, both bulk RNA seq deconvolution and histological analyses confirmed that patients and Jag1Ndr/Ndr mice exhibit hepatoblast enrichment and less differentiated hepatocytes.”

      Reviewer 2 comment: The Jag1[Ndr/Ndr] strain is an excellent model for various aspects of ALGS phenotypes. However, when it comes to linking the effects of this mutation to the function of a specific cell type, it is worth considering that Jag1[Ndr/Ndr] might not recapitulate the effects of loss of one copy of JAG1 observed in most patients with ALGS. This is especially important given the sensitivity of various cellular and organ-level processes to the degree of Notch pathway activation. In the context of the present manuscript, it is possible that what the authors have observed in Jag1[Ndr/Ndr] lymphocytes does not mirror how a JAG1-heterozygous human lymphocyte behaves. This is not a major concern, but it is worth considering.

      Author Response: We agree and thus added the following discussion paragraph (page 11, lines 315-321): “In patients with ALGS, who have a single mutation in either JAG1 or NOTCH2, the remnant healthy allele(s) could be expected to mediate signaling. However, some JAG1 mutations exhibit dominant negative effects (Ponio et al, 2007; Xiao et al, 2013; Guan et al, 2023), which could entail further repression of JAG1/NOTCH2 signaling. In this context, it is important to note that the Jag1Ndr/Ndr mice are homozygous for the missense mutation, but retain some JAG1 activity, and it is not clear to which degree this mimics JAG1 heterozygosity in humans. It would be of interest to test whether Jag1 potency affects hepatoblast differentiation or injury-induced reversion of hepatocytes in patients as a function of their genotype.”

      Reviewer 2 comment: • The basis for the opposite type of correlation between COL1A1 expression and FOXP3 level in ALGS versus non-ALGS liver disease is not clear.

      Author Response: We thank the reviewer for pointing out the unclear interpretation of the patient data. In patients with ALGS, the extent of fibrosis is likely to be highly multifactorial, involving (as we show) hepatocyte immaturity, dampened inflammation, and immune system dysregulation (possibly involving more than T-cells). Since human patients are so heterogeneous, teasing apart the relative contribution of each is currently outside the scope of our study, but will be an important area of future research. Nonetheless, we thought it was important and interesting to show these patterns in supplementary Fig 6, now extended with further data and analyses, and described in the following manner:

      Results section: (page 10, lines 267-275) “Liver damage in non-ALGS liver disease (using liver injury marker LGALS3BP) (Yang et al, 2021), was positively correlated with recruitment of lymphocytes (including CD8A+ and FOXP3+ populations of T cells), as well as the extent of fibrosis (COL1A1 abundance) (Fig. S6G). However, in ALGS, the extent of liver damage, lymphocyte recruitment and fibrosis were unlinked (Fig. S6G). These data are in line with the observation that liver stiffness (a proxy for fibrosis) in ALGS is independent of biomarkers of liver disease (Leung et al, 2023). While Treg infiltration in ALGS was independent of liver damage, it exhibited a tendency towards a negative correlation with fibrosis (Fig. S6G), corroborating that elevated levels of Tregs may limit fibrosis in ALGS. Altogether, these data suggest that the liver and lymphocytes may be differentially affected in different patients with ALGS, a disorder that is well known for its heterogeneous presentation.”

      Minor comments:

      • Page 2, last paragraph of Introduction, Page 12 last sentence, and Supplementary Methods: Please use "adoptive immune transfer" instead of "adaptive immune transfer".
      • Pages 3 and 4: Reference is made to Figures 3E-O, which appears to be Figure 2E-O.
      • Figure 3 legend: "Analysis in (E) is one-way ANOVA with Dunnett's multiple comparison test". Panel E compares two means, so ANOVA is not the appropriate statistical analysis for these data. Is this sentence related to panel D?
      • Page 9: Please correct misspelling: "response to intestinal insult (Fig. 5). W therefore".
      • The Science Translational Medicine references lack page numbers.

      Author Response: We thank the reviewer deeply for taking the time to meticulously note and convey these errors, helping us to correct them. The suggested corrections have been implemented. Science Transl Med is an online journal and does not have page numbers; we have added an issue number to facilitate retrieval of these references.

      Additionally, we noticed that the image of a consecutive liver section with CYP1A2 staining from Jag1Ndr/Ndr liver in Fig 2L was accidentally flipped along the horizontal axis, which we have now corrected. We also changed the scRNAseq cell cluster naming from Hepatoblasts/cytes, Hepato_Ery, and Kupffer cells, Kupffer cells_Ery to Hepatoblasts/cytes I and II, and Kupffer cells I and II, respectively, to match the Neutrophil progenitors I and II naming convention. Names were subsequently also changed in Fig S1 and methods.

      **Referees cross-commenting**

      To my knowledge, ALGS is not considered to be an inflammatory disorder. Furthermore, the splenomegaly observed in the mouse model could be due to portal hypertension rather than a primary immune disturbance. Having said that, I agree with the other reviewers that the manuscript will benefit from further discussion and clarification on the immune-related observations.

      Author Response: We thank Reviewer 2 for indicating to Reviewer 1 that ALGS is not considered an inflammatory disorder, which we agree with. It was not our intention to convey this idea. To avoid confusion, we have now:

      1. Added a schematic in Fig 1A.
      2. Modified and extended the following text in the Introduction (Page 2, lines 14-17): “ALGS is mainly caused by mutations in the Notch ligand JAGGED1 (JAG1, 94%) (Mašek & Andersson, 2017; Oda et al, 1997), affecting bile duct development and morphogenesis, resulting in bile duct paucity and cholestasis. Immune dysregulation has also been described (Tilib Shamoun et al, 2015), but how this might interact with liver disease in ALGS to affect fibrosis is not known.”

      Furthermore, we have addressed or will address all comments from Reviewer 1 to clarify the immune-related observations.

      Reviewer #2 (Significance (Required)):

      Despite severe cholestasis, ALGS patients do not show as much fibrosis as other cholestatic diseases, including biliary atresia (BA). A previous study had suggested that this phenomenon could be due to the difference in the nature of reactive hepatobiliary cells in ALGS compared to BA (Fabris et al, 2007). Moreover, a number of studies have suggested a role for Notch pathway activation in several cell types in the liver in the development of liver fibrosis (for example, Sawitza et al, Hepatology, 2009; Chen et al, Plos One, 2012; Duan et al, Hepatology, 2018; Yu et al, Science Translational Medicine, 2021). However, although a role for Notch signaling in T-cells is well established, it was not known whether impaired T-cell development/function contributes to reduced fibrosis in ALGS liver disease. Accordingly, the current manuscript provides novel insight into the mechanism of fibrosis in this disease. Moreover, the observation that Jag1-mutant T-cells do not confer as much protection as control T-cells to immunodeficient mice subjected to DSS-induced ulcerative colitis provides strong evidence for impaired T-cell immunity in this ALGS model and might help explain other aspects of ALGS phenotypes.

      The manuscript will be of interest to broad audience (Notch signaling, cholestatic liver disease, mechanisms of liver fibrosis, T-cell development).

      I have expertise in Notch signaling and in using animal models of human developmental disorders.

      Author Response: We thank the reviewer for the balanced assessment of our manuscript in light of the current knowledge, and for highlighting its importance in the context of not only Notch and ALGS, but also other cholestatic and fibrotic liver diseases.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The article entitled "Jag1 Insufficiency Disrupts Neonatal T Cell Differentiation and Impairs Hepatocyte Maturation, Leading to Altered Liver Fibrosis" by Mašek et al described the role of Notch ligand JAGGED1 (JAG1) in the T-cell differentiation contributing to liver fibrosis and immune system development in ALGS. This article is well written and has important preliminary findings that could establish Jag1 and its downstream signaling pathways as potential therapeutic targets to attenuate liver fibrosis.

      Author Response: We thank the reviewer for recognizing our work and pointing out the therapeutic implications of our findings.

      Reviewer 3 comment 1: Minor comments: In page 4, they mentioned that "the hepatoblast marker alpha fetoprotein (AFP) was 3.1-fold enriched (Fig. 3J,K), while the mature hepatocyte marker CYP1A2 protein was 1.7-fold less expressed (Fig. 3L-M)", the figure numbers should be changed to 2J, K, L-M etc.

      Author Response: We thank the reviewer for identifying these errors. The suggested corrections have been implemented.

      Reviewer 3 comment 2: In liver fibrosis the Th17 cells play crucial roles. Please show the level of IL17A mRNA level in the liver in the Jag1Ndr/Ndr mice compared to the Jag1+/+ mice.

      Author Response: We thank the reviewer for the insightful comments. We did investigate the Th17 vs Treg immune response; however, we detected neither Th17-expressed Il17, Il17a, Il17f, nor Il21 and Il22 mRNA in the bulk RNA data, suggesting their expression is either masked or they are not present in significant numbers within the liver tissue at P10, preventing us from drawing any conclusions about this cell population.

      Reviewer 3 comment 3: Also, please show the expression level of pro-inflammatory molecules, for example, TNFα, IL1β, MCP1 etc and the level of MMPs (especially MMP2, MMP8, MMP9) in the livers of the mice models used.

      Author Response: The expression of Il10, Il1b, and Mcp1 (Ccl2) was presented in the manuscript in Fig. 2O, and we attach in the response to reviewers a full list together with the expression levels of Mmp2/8/9, Tnfa, Ifng, the Il17 receptor family and Tgfb1-3. Out of these, Mmp8 (0.9 Log2 fold change = 1.9-fold), Ccl2 (2.2 Log2 fold change = 4.7-fold), and Il17rb (1.1 Log2 fold change = 2.1-fold) were significantly upregulated, but do not indicate any specific leukocyte population's response. This is in line with data in Fig S2E, demonstrating a dominance of myeloid over adaptive immune response in the GSEA of the immune KEGGs.
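      For readers checking the figures quoted above, the conversion from a log2 fold change to a linear fold change is fold = 2^(log2FC). The following is a minimal sketch of that arithmetic only; the function name `log2fc_to_fold` is hypothetical, and the inputs are the rounded log2 values quoted in the response, so results can differ in the last digit from values computed on unrounded data.

```python
def log2fc_to_fold(log2fc: float) -> float:
    """Convert a log2 fold change to a linear fold change (2 ** log2fc)."""
    return 2.0 ** log2fc


# Rounded log2 fold changes as quoted in the response (not the raw data).
quoted_log2fc = {"Mmp8": 0.9, "Ccl2": 2.2}

for gene, value in quoted_log2fc.items():
    # Print each gene with its linear fold change, rounded to one decimal.
    print(f"{gene}: log2FC={value} -> {log2fc_to_fold(value):.1f}-fold")
```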

      Since lymphocytes are underrepresented in the bulk transcriptomics, and individual genes might report activity of many different cell types, we chose to focus on the list of genes shown to be markers of activated hepatocytes, to avoid overinterpretation of the RNA sequencing data. Instead, the immune analyses were based on flow cytometry data, which we expect should accurately report cell type abundance across organ systems.

      Reviewer 3 comment 4: Authors have shown significant alterations in the Treg population in their Jag1Ndr/Ndr mice of ALGS. Please also show the expression of IL10 and TGFβ in the liver and whether they are correlated with the level of Treg populations.

      Author Response: IL10 and Tgfb mRNA levels in liver are shown in the heatmap in the response to reviewers, and were not significantly different between genotypes at P10. They were also not correlated with Foxp3 levels, as shown in the correlation matrices below (Pearson's R values in top row, significance values in bottom row).
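      As an illustration of the statistic named above, Pearson's R between two expression vectors can be computed as in the sketch below. This is not the authors' analysis code: the function name `pearson_r` is hypothetical and the expression values are made-up placeholders, not data from the study. Significance values would additionally require a test such as the one provided by `scipy.stats.pearsonr`, which is not used here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


# Placeholder values for illustration only (hypothetical, not study data).
foxp3_expression = [1.2, 0.8, 1.5, 1.1, 0.9]
il10_expression = [0.4, 0.6, 0.5, 0.3, 0.7]
print(f"Pearson R = {pearson_r(foxp3_expression, il10_expression):.2f}")
```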

      Reviewer 3 comment 5. It would be interesting to know whether the IFNγ mRNA expression in the livers were altered in the Jag1Ndr/Ndr mice with altered populations of CD8 T cells.

      Author Response: There was no significant difference in IFNγ mRNA expression levels between Jag1+/+ and Jag1Ndr/Ndr livers at P10 (please see the heatmap in response to comment no. 3, above).

      Reviewer #3 (Significance (Required)): Strength: This article is well written and has important preliminary findings that could establish Jag1 and its downstream signaling pathways as potential therapeutic targets to attenuate liver fibrosis.

      Author Response: Thank you for these comments and pointing out the wider implications of our findings.


      Reviewer 3 Limitations: This study lacked the detailed molecular pathways which could explain how the Jag1 altered the T-cell recruitment, development and hepatocyte maturation in the development of liver fibrosis in the ALGS model.

      Author Response: We agree that this study does not focus on molecular pathways. The intention of this study was to identify which cell populations contribute to atypical neonatal fibrosis in ALGS. Because we expected this process to be multifactorial, Jag1Ndr/Ndr mice, carrying a systemic mutation, present both advantages (Jag1 abrogation in all cells --> ALGS-like organ interactions) and limitations (inability to identify contributions of individual cell types). However, by identifying maturing hepatocytes and Tregs as dysregulated, and demonstrating that Jag1Ndr/Ndr lymphocytes behave abnormally and suppress inflammation and fibrosis in Rag1-/- mice (with normal Jag1 expression), we establish a biological framework that can now be further investigated with conditional genetic tools and in vitro systems, to elucidate specific molecular pathways, that were beyond the scope of the current study.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity (Required):

      Au et al. used two fly models to study how mitochondrial defects are implicated in C9ALS, the most common familial ALS type. They found that in these flies, mitochondrial, but not cytosolic, ROS is upregulated, accompanied by locomotion defects agreeing with previous publications. Consistent with these data, sod2, but not sod1, rescues the behavioral defects in these flies. Also, manipulating mitochondrial dynamics or mitophagy does not rescue these defects. Furthermore, the authors showed that the Nrf2 activity is upregulated, likely due to oxidative stress, and genetically or pharmacologically suppressing the Keap1 function, which activates Nrf2 and thereby its downstream antioxidative genes, suppresses behavior defects in these flies. This part is generally solid and convincing, with minor issues that need some revision. Finally, the authors showed that mitochondrial ROS and nuclear Nrf2 are both upregulated in C9 iPS neurons, both of which are suppressed by the Keap1 inhibitor DMF, or a known antioxidant. For this part, the data are convincing but insufficient to support a good translation of their fly data.

      Major concerns:

      1a. The authors really need a phenotypic readout for their iPS experiments, either cell death or some sort of toxicity, to support the translatability of their fly data.

      • We agree and appreciate the value of having such a phenotypic readout for the iPSC experiments but, unfortunately, within the context of the current work we did not observe any clear phenotype of toxicity or diminished viability under basal, unchallenged conditions. To support this, we have added our analysis of cell viability at the time of imaging, shown in new Supplementary Figure 3C and mentioned in the text (lines 620-621).

      1b. The authors also need to test the toxicity of DMF in iPS neurons.

      • As above, we found that treatment with DMF conferred no overt toxicity within the time-course of our experiments. These data are shown in new Supplementary Figure 3D and mentioned in the text (line 626-628).

      The authors should use genetic ways, e.g., knocking down Keap1, to activate Nrf2 and test whether this suppresses ROS and neurodegeneration phenotype in iPS neurons, as they did in flies.

      They need to better characterize the Nrf2 activity in iPS neurons (see Minor Concern #1).

      • Regarding these two points, we agree that it would be interesting to further investigate the Keap1/Nrf2 pathway in these cells, but time, personnel and resource constraints preclude additional investigations on this occasion. It is important to note that the cell models were used specifically to validate that elevated mitochondrial oxidative stress and increased nuclear Nrf2 localisation also occurred in patient-derived neurons, and to test whether DMF treatment could reverse the oxidative stress. This was the extent to which the cell models were used in this instance, and the current data are sufficient to support the conclusions based on them. We regret that it was not possible to delve deeper into this at the current time, but this will be possible in future work.

      __Minor concerns: __

      1a. Fig 4A and B are hard to comprehend. Can the authors show images with more obvious differences?

      • We have now revised these figure panels, replacing them with alternative images. We hope that the new images show more appreciable differences. We understand that the differences can sometimes be subtle, which is why we rely on the quantification for unbiased interpretation.

      1b. Also, Gst-D1 is the only Nrf2 downstream gene tested. Can the authors use RT-PCR to test multiple genes? These will strengthen the point that Nrf2 is activated. Similar things should be done in iPS neurons.

      • Thanks for this suggestion. To complement the immunoblots of the genomic GstD1-GFP reporter, we have now performed qRT-PCR on flies treated with or without DMF for additional Keap1/Nrf2 pathway targets, including GstD1, Gclc, GstD2 and Cyp6a2. These data show that the degree of transcriptional activation was variable between different targets, but DMF treatment caused a general upregulation of CncC targets in G4C2x36 flies (new Fig. 6A).

      What about cytosolic ROS in C9 iPS neurons? Is it similar to the fly models?

      • We agree that this would be interesting to analyse. Unfortunately, given time and resource constraints we did not have the capacity to also explore this out of curiosity. Again, the specific focus for the iPSC neuron work was to validate the mitochondrial ROS aspect and action of DMF.

      Unless the authors confirm that mitochondrial dynamics or mitophagy are not contributing to neurodegeneration in iPS neurons, I wouldn't emphasize their related negative data in flies. Overall, the authors need to tone down their arguments if the findings are not verified in iPS or other mammalian models.

      • On reflection, we agree that the iNeuron data was given an overly prominent status within the study and we have adjusted the text accordingly throughout, including removing a specific mention of this in the title. That said, we still consider the negative results regarding the lack of rescue of organism-scale phenotypes (e.g., locomotion) by manipulating mitochondrial dynamics or mitophagy to be important indicators of the relative mechanistic contribution of these processes to the organism-scale pathology (most closely reflecting the clinical condition). As discussed above (major point 1a), within the context of the current work we did not observe any clear phenotype of toxicity or diminished viability in the patient iNeurons. Therefore, it is not readily possible to test the relative contribution of mitochondrial dynamics vs mitophagy vs ROS to the survival of these cells, so we have based our interpretations of this on the in vivo models. In summary, we have toned down our statements relating to and stemming from data arising from the iNeuron work, but our interpretation of the negative results in flies remains the same.

      Can the authors measure the activities of OXPHOS complexes and ATP synthase/complex V?

      • The intention of this study was to explore mechanisms that could alleviate pathological phenotypes in vivo. We have characterised a wide-range of cellular defects relating to mitochondrial dysfunction including overall OXPHOS function by OCR. Analysing individual OXPHOS complexes from animal tissue is not a trivial undertaking and, other than providing a little more granularity to the nature of the respiratory defect, we considered that this would be a distraction from the main focus of the study.

      5a. Edaravone is one of the only two effective drugs for general ALS, and it's believed to work as an antioxidant. The authors should discuss it along with relating their findings to therapeutic development.

      • A statement on edaravone being an FDA-approved treatment for ALS and an antioxidant (ROS scavenger) was included in the text (lines 628-629). We have added further comment on this in the Discussion (lines 686-690). Since edaravone was used as a comparator in this study, and to maintain the focus on DMF, we prefer not to elaborate on this further in the Discussion.

      5b. Also, the discussion on SOD1 aggregation sounds somewhat farfetched. Plus, it's not directly related to the central message of this paper. I would remove it.

      • Fair enough. We have removed these statements from the text.

      __Significance (Required): __

      C9orf72-mediated ALS is the most common familial ALS type and also accounts for a fraction of sporadic ALS cases. Its pathomechanism is incompletely understood. Previous studies have linked mitochondrial defects and ROS to pathogenesis in fly, iPS, mouse, etc. models, and antioxidants can suppress some neurodegenerative features in these models. Consistent with these findings, one of the only two effective drugs for general ALS, edaravone, is believed to mitigate oxidative stress in motor neurons. Hence, oxidative stress is a critical pathogenic contributor that holds great potential as a therapeutic target. However, our understanding of its cause and consequence in ALS is limited. This paper includes at least two novel points: 1) identifying mitochondrial, but not cytosolic, ROS is upregulated and contributes to neurodegeneration in C9ALS models; 2) discovering that the Keap1/Nrf2 is altered and activating Nrf2 suppresses neurodegeneration. The first point presents an incremental advance in the field, but the second one is potentially critical, especially from a translational aspect. That being said, the novelty of the second point is somewhat dampened by a recently published paper (Jiménez-Villegas, et al. 2022), which showed that Nrf2/Keap1 is altered in C9 patient leukocytes and NSC cells overexpressing or treated with C9-DPRs. However, these cells/models are remotely related to the disease. The current manuscript still provided evidence in an in vivo neuronal model for the first time. If the authors could make their iPS part comprehensive, this could still be a major advance towards translation.

      This paper could be interesting to a broad audience beyond the ALS field.

      Another strength of this paper is that the fly analyses are comprehensive, the data are convincing, and the conclusions are solid. However, the major weakness is that the iPSN part is incomplete to support the translatability of their findings in flies. Current data only suggest that DMF and EDV are functional in iPSNs.

      Reviewer #2

      __Evidence, reproducibility and clarity (Required): __

      The study of ALS uses almost exclusively Drosophila larvae and adults and has a few expts with iNeurons (human) at the end. The results are interesting and relevant to human disease and do suggest potential ways to treat disease. Not all the effect sizes are large, but nonetheless this is publishable material. More expts would of course strengthen their case. None of what I suggest is essential, but this depends in part on where they eventually want to publish their work.

      __Some comments below: __

      All are overexpression models with strong phenotypes. This has to be mentioned.

      • The nature of the genetic models is clearly delineated in the manuscript. To highlight this further in the text, we have added comments at the start of the Results section stating that Drosophila do not have an orthologue of C9orf72, so we use previously established transgenic models (lines 372-376). In fact, it is incorrect to call these 'overexpression' models because there isn't a C9orf72 orthologue to be overexpressed. Formally, they are ectopic expression models.

      Furthermore, in any ageing model every aspect of cell biology is affected.

      • Agreed.

      In fig 1E to the non-expert it is hard to work out what is a mitochondrion. Some higher res imaging might help.

      • It is indeed difficult to discern individual mitochondria with this particular approach. We have a lot of experience in this kind of analysis and higher resolution imaging does not resolve the problem. The challenge of imaging mitochondria in such tiny cell bodies is the reason that we have adopted a categorical scoring system.

      Line 390 comments on morphology but fig s1b-c is survival. Do they have morphology data? If not then they should rephrase the text

      • This is a misunderstanding. The brief mention of mitochondrial morphology at the start of the paragraph ("Mitochondrial morphology is known to respond to changes in reactive oxygen species (ROS) levels as well as other physiological stimuli." - lines 414-415) is to provide a segue from the preceding section describing the morphology defects to the following sections that investigate the possible mechanisms affecting this.

      Line 441. Can they provide a reference for 1000 being physiologically relevant? 36 is certainly pathological in humans. In my opinion the only genuinely physiologically relevant model is a genetically faithful knockin without codon alteration.

      • We have rephrased this to be 'more physiologically relevant repeat length' and provided a reference.

      Line 482 - they say mitophagy is downstream, but isn't that obvious in a C9 transgenic model?

      • We appreciate that this statement was confusing. We are referring to 'upstream' or 'downstream' in the cascade of events that ensue from expression of DPRs, not upstream or downstream with respect to C9 mutations themselves, so we have rephrased this as "not a primary contributor to C9orf72 pathology" (lines 502-503).

      7a. Line 502 - they indicate 'exploring the basis', but I am a little unclear what they are saying. What is the reason for the reduced SOD1 in x36 v x3 flies? Are they simply killing cells that have the most SOD1 and therefore their qPCRs/blots only represent those cells with less SOD1? There is still SOD1 being expressed there of course.

      • Thanks for allowing us to clarify this point. We have not been able to clarify the mechanism for why Sod1 appears to be downregulated upon G4C2x36 expression, which we acknowledge is a limitation. So, we have decided to adjust the language from 'exploring the basis', to now simply report this as an associated observation (line 527).

      7b. In the text it would help if they clarified if the genes overexpressed are human or fly. If human, it might be worth overexpressing mutant ALS SOD1 if they are able.

      • In general, when reporting on experiments with a model organism such as Drosophila, we work on the assumption that genetic manipulations will typically be that of the host species, i.e., transgenic expression will be of Drosophila genes, unless specifically stated otherwise. In any case, all the necessary details of all genetic strains used in this study are laid out in Methods.

      Line 521 - this para should perhaps be in intro section, not results.

      • Agreed. We have now edited the start of this section (lines 543-546).

      In Fig5, do they have CncC IHC to back up their conclusion that Keap1 mutation is affecting this process?

      • Thank you for this suggestion. We have now analysed CncC localisation in C9 models ± Keap1 mutation. As before, we saw that G4C2x36 caused an increase in CncC nuclear localisation; although there was a trend towards an increase with Keap1 heterozygosity, this was not consistent enough to be significant. These data are presented in new Fig. 5D, E and discussed in the text (lines 579-581). Although these results do not show an additional increase of nuclear CncC by this treatment of DMF, we also performed qRT-PCR analysis of CncC target genes GstD1, GstD2, Gclc and Cyp6a2, from flies treated with or without DMF. These data show that the degree of transcriptional activation was variable between different targets, but DMF treatment caused a general upregulation of CncC targets in G4C2x36 flies (new Fig. 6A).

      The Induced neuron results are interesting. What kind of neurons are they? Have they been confirmed to be so with ICC? The figures in 6 are poor. They should make the point that correction of the mutation to ensure isogenicity would be an additional confirmatory measure. Isogenic lines are available from JAX and the UK MND Institute.

      • Agreed. We now provide further characterisation of the iNeurons that was done at the time of the original experiments but not presented. These analyses include immunostaining with neuronal marker antibodies against β-III Tubulin, MAP2 and NeuN. These data are shown in new Supplementary Figure 3A, B. We also report the relative viability of these neurons at the point of analysis (new Supplementary Figure 3C, D). We have added mention of this in the text (lines 620-621 and 627-628). Of note, these patient cell lines have been used and reported before (Reference 53) which we cite on line 618. We also acknowledge the limitations of using these lines, and that future work would be better done with isogenic controls (lines 690-692) as the reviewer indicates.

      Suppl fig 3 - interesting observation with edaravone, but do they have any survival/motility data in neurons/flies? Also, it would be good to compare with another drug that works through a different mechanism, e.g. riluzole.

      • Since edaravone is a known therapeutic for ALS and was used as a comparator, rather than being the primary focus, we do not have additional data on edaravone.

      Overall, they conclude they have done a comprehensive analysis of mito function, but I would argue that, while it is a good analysis, there are plenty of other studies they could have done, e.g. assess the mitochondrial respiratory chain.

      • We agree that additional studies can always be envisaged.

      13a. I also think the imaging of mitochondria could be better, and much work needs to be done on the iNeurons to characterise them.

      • As mentioned above, we have provided additional characterisation of the iNeurons in this revision.

      13b. Sentence line 674 - needs rephrasing.

      • Thanks for prompting this. We have now rewritten these sentences (now lines 700-701).

      In their final paragraph what do they mean by oxidative stress being upstream? I would argue it is downstream of the C9 expansion, right?

      • We apologise that this was confusingly written. As per the comment above (response to point 6), we were referring to events 'upstream' or 'downstream' in the cascade of events that ensue from expression of DPRs. We have now rephrased this to be a "proximal" pathogenic mechanism (lines 708-710). We hope that our intended meaning is now clearer in the text.

      __Significance (Required): __

      A good study, modest degree of advancement in the field.

      Reviewer #3

      __Evidence, reproducibility and clarity (Required): __

      In the present paper the authors focused on the hyper-production of ROS in a C9orf72 fly model. They then sought to rescue the observed fly phenotype by manipulating mitochondrial dysfunctions or pathways downstream of these dysfunctions.

      __Majors: __

      Given the wide varieties of statistical tests used a rationale should be given to why a certain test (one way anova) was used in one experiment (WB, qPCR) and another for another (Chi square) experiment (mitochondria morphology)

      • In all cases, the choice of statistical test is dictated by the nature of the data being analysed - a principle that should be well-understood by all experienced researchers - and so may vary between experiments but will be consistent between different data sets of the same type of experiment. For instance, for those data sets consisting of two groups, an unpaired t-test would be appropriate. Most other experiments consist of three or more experimental groups and so will need an appropriate test with additional post-hoc test to correct for multiple comparisons, such as one-way ANOVA with Bonferroni's post-hoc correction. Where data sets are not normally distributed, such as those generated by our climbing assay, a non-parametric analysis is required, such as the Kruskal-Wallis test. Here we also use a Dunn's post-hoc correction for multiple comparisons. In some assays of multiple groups there are also multiple variables, such as the different drug concentrations tested on control and C9 iNeurons; here a two-way ANOVA with an appropriate post-hoc correction test is used. Finally, some assays employ a categorical scoring system, such as the mitochondrial morphology analysis, which will require a different type of statistical analysis such as the Chi squared test.

      These types of analysis are in no way unusual or 'cherry-picked' to give the most desirable outcomes but are selected simply based on the type of the data to be analysed following standard rules of statistical analysis. For this reason, we do not feel that any more elaborate explanation is necessary in the manuscript text itself, but we hope that the explanation given here will satisfy the reviewer of the rationale for employing different statistical tests for different data sets.
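
      The selection logic described in this response can be summarised as a small decision function. This is only an illustrative sketch: the function name `choose_test` and its parameters are hypothetical, and the two-group non-parametric branch (Mann-Whitney U) is an assumption added for completeness, since the response does not say which test would be used in that case.

```python
def choose_test(n_groups, normal=True, categorical=False, n_factors=1):
    """Return the name of a conventional test for a given data layout.

    n_groups    -- number of experimental groups being compared (>= 2)
    normal      -- whether the data are approximately normally distributed
    categorical -- whether the outcome is a categorical score
    n_factors   -- number of independent variables (e.g. genotype and dose)
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    if categorical:
        # e.g. categorical scoring of mitochondrial morphology
        return "chi-squared test"
    if n_factors >= 2:
        # e.g. drug concentration x genotype
        return "two-way ANOVA with post-hoc correction"
    if n_groups == 2:
        # Assumption: the non-parametric branch is not named in the response.
        return "unpaired t-test" if normal else "Mann-Whitney U test"
    if normal:
        return "one-way ANOVA with Bonferroni's post-hoc correction"
    # e.g. climbing assay data
    return "Kruskal-Wallis test with Dunn's post-hoc correction"


if __name__ == "__main__":
    print(choose_test(3, normal=False))
```

      The order of the checks mirrors the order of the explanation: data type first, then number of variables, then number of groups and distribution.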
        

      The entire second part of the paper, and the most important one to the authors (given the title), relies mostly on a supposed loss in antioxidant protection. I feel the experiments in support of this hypothesis are not strong. It is true that there is an overproduction of ROS (as evaluated in the first figures), but the loss in protection stated based on Fig 4H does not hold much weight. I think more experiments are needed to support this hypothesis.

      • This is a fair comment and on reflection we also agree that our claim that the response to oxidative stress is blunted in the C9 models is based almost exclusively on the data from (old) Fig. 4H, and so is not strong. On reflection, prompted by the reviewer's comment, we have removed this interpretation from the manuscript and revised our comments accordingly. Consequently, we have also removed Fig. 4H.

      Moreover, I find it counterintuitive that, to rescue a phenotype, the authors overexpressed something that is already high in C9orf72 flies (nrf). I would suggest matching these results with downregulation of nrf, to effectively prove that an nrf decrease is detrimental to counteracting ROS in C9orf72 flies (further reducing protection against ROS). I believe this experiment is quite critical for the entire manuscript.

      • We appreciate the thinking behind this suggestion, but this experiment can't be performed because loss of CncC function is lethal, as expected from a master regulator of a major cell-protection mechanism.

      Also, to me there is a little bit of a disconnect between the first three figures and the last three. The authors also find a rescue effect when overexpressing SOD2 etc. as shown in figure 3, where they actually show rescue of mitochondrial dysfunction (morphology etc.). The only piece of data that shows rescue of mitochondrial dysfunction upon nrf overexpression is figure 5H. More extensive characterization of the rescue of mitochondrial dysfunction should be performed if the title is to be kept focused on the Keap1/Nrf2 mechanism. Otherwise, a broader title like "modulation of the mitochondria damage rescue C9orf72 phenotype." could help the reader understand the overarching message of the paper.

      • We do not see a disconnect between the first part of the paper and the second. To be clear, the first part was documenting mitochondria-related defects (morphology, ROS, mitophagy) and determining their causative hierarchy and mechanistic impact on organismal phenotypes (we found only certain antioxidants rescued locomotor deficits and could reverse mitochondrial morphology and mitophagy defects). As stated, these results strongly implicated oxidative stress as a major driver in organismal pathology. The second part of the study was characterising whether a major antioxidant defence pathway (Keap1/Nrf2) could be manipulated to provide phenotypic rescue on the organismal scale (i.e., locomotor behaviours). On reflecting on the original title, we agree that this was too focussed on the mitochondrial dysfunction angle (and also gave too much prominence to the iNeuron part of the study). Therefore, we have now modified the title to reflect a greater focus on oxidative stress and locomotor behaviours across the study. We hope the reviewer feels that this better represents the study, but we will be happy to consider suggested alternatives.

      __Minors: __

      Figure 1n: does each dot represent a cell, or is it an average of more cells with each dot representing an animal? I could not find this information anywhere, but if each dot is a single cell, I would recommend scaling up to at least 10 cells. Same concern for Figure 3F.

      • We agree that this point needs clarification. Each dot represents data for one animal. The quantification per animal is based on at least 10 cells from one image. This has been added to the Methods section for clarification (lines 220-221).

      Lines 550-552: I do not agree with the statement. I do not think that the data show that the protection against ROS is less efficient. The only difference is the starting point. But the final point is the same, so why should protection against ROS be less efficient in G4C2x36 Drosophila?

      • This comment relates to point 2 above. As stated there, we agree that the data are not compelling enough to make this interpretation, so we have revised our comments accordingly.

      There are some concerns about the neurons in figure 3: they do not appear to have axons and dendrites. I'd suggest counterstaining with a neuronal marker.

      • The reviewer may be unfamiliar with the specific tissue in question: the larval ventral ganglion. As a complex, mature tissue there are multiple cell types (e.g., neurons and glia) very closely packed. Neuronal processes are very thin in this tissue, and they are squeezed between neighbouring cells. Thus, microscopy of neuronal cell biology within such a complex tissue does not look like in vitro cultured neurons. In the specific context of Figure 3, we are looking at markers for mitochondria or mitophagy. The reviewer may also be aware that mitochondria and mitolysosomes are most abundant in the cell bodies and have very limited abundance in neuronal processes. Thus, we do not generally try to observe these organelles in processes because there would be very little to see. We know that the signal is within neurons because the markers are transgenically expressed exclusively by a neuronal driver system, i.e. nSyb-GAL4. In summary, there is no problem with these cells or how they look. This is quite normal.

      iNeurons were only used to confirm the second part of the paper. Would be interesting to also confirm some of the results in the first part, like SOD2 over expression etc etc.

      • We appreciate this suggestion, which is similar to a comment from Reviewer 1, but, as replied above, time, personnel and resource constraints preclude additional investigations on this occasion. Just to reiterate, it is worth noting that the cell models were used specifically to validate that elevated mitochondrial oxidative stress and increased nuclear Nrf2 localisation also occurred in patient-derived neurons, and to test whether DMF treatment could reverse the oxidative stress. This was the extent to which the cell models were used in this instance, and the current data are sufficient to support the conclusions based on them. We regret that it was not possible to delve deeper into this at the current time, but this would be the focus of future work.

      __Significance (Required): __

      The present work, while not extremely novel in its hypothesis, is well performed with state-of-the-art techniques, some of them also very novel to the field. The concept of oxidative stress as an important factor in ALS pathogenesis is not new in the field, but the identification of Nrf as an important player might pave the way for more human-related studies and possibly for therapeutic interventions.

      I think the work is technically sound and well performed; certain evidence is solidly demonstrated with multiple different techniques. Other evidence instead needs a little more work to prove its solidity, to widen the audience which will appreciate the content of this paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank the reviewers for the detailed assessment of our work as well as their praise and constructive feedback which helped us to significantly improve our manuscript.

      Reviewer #1 (Public Review):

      The inferior colliculus (IC) is the central auditory system's major hub. It integrates ascending brainstem signals to provide acoustic information to the auditory thalamus. The superficial layers of the IC ("shell" IC regions as defined in the current manuscript) also receive a massive descending projection from the auditory cortex. This auditory cortico-collicular pathway has long fascinated the hearing field, as it may provide a route to funnel "high-level" cortical signals and impart behavioral salience upon an otherwise behaviorally agnostic midbrain circuit.

      Accordingly, IC neurons can respond differently to the same sound depending on whether animals engage in a behavioral task (Ryan and Miller 1977; Ryan et al., 1984; Slee & David, 2015; Saderi et al., 2021; De Franceschi & Barkat, 2021). Many studies also report a rich variety of non-auditory responses in the IC, far beyond the simple acoustic responses one expects to find in a "low-level" region (Sakurai, 1990; Metzger et al., 2006; Porter et al., 2007). A tacit assumption is that the behaviorally relevant activity of IC neurons is inherited from the auditory cortico-collicular pathway. However, this assumption has never been tested, owing to two main limitations of past studies:

      (1) Prior studies could not confirm if data were obtained from IC neurons that receive monosynaptic input from the auditory cortex.

      (2) Many studies have tested how auditory cortical inactivation impacts IC neuron activity; the consequence of cortical silencing is sometimes quite modest. However, all prior inactivation studies were conducted in anesthetized or passively listening animals. These conditions may not fully engage the auditory cortico-collicular pathway. Moreover, the extent of cortical inactivation in prior studies was sometimes ambiguous, which complicates interpreting modest or negative results.

      Here, the authors' goal is to directly test if auditory cortex is necessary for behaviorally relevant activity in IC neurons. They conclude that, surprisingly, task-relevant activity in cortico-recipient IC neurons persists in the absence of auditory cortico-collicular transmission. To this end, a major strength of the paper is that the authors combine a sound-detection behavior with clever approaches that unambiguously overcome the limitations of past studies.

      First, the authors inject a transsynaptic virus into the auditory cortex, thereby expressing a genetically encoded calcium indicator in the auditory cortex's postsynaptic targets in the IC. This powerful approach enables 2-photon Ca2+ imaging from IC neurons that unambiguously receive monosynaptic input from auditory cortex. Thus, any effect of cortical silencing should be maximally observable in this neuronal population. Second, they abrogate auditory cortico-collicular transmission using lesions of auditory cortex. This "sledgehammer" approach is arguably the most direct test of whether cortico-recipient IC neurons will continue to encode task-relevant information in absence of descending feedback. Indeed, their method circumvents the known limitations of more modern optogenetic or chemogenetic silencing, e.g. variable efficacy.

      I also see three weaknesses which limit what we can learn from the authors' hard work, at least in the current form. I want to emphasize that these issues do not reflect any fatal flaw of the approach. Rather, I believe that their datasets likely contain the treasure-trove of knowledge required to completely support their claims.

      (1) The conclusion of this paper requires the following assumption to be true: That the difference in neural activity between Hit and Miss trials reflects "information beyond the physical attributes of sound." The data presentation complicates asserting this assumption. Specifically, they average fluorescence transients of all Hit and all Miss trials in their detection task. Yet, Figure 3B shows that mice's d' depends on sound level, and since this is a detection task the smaller d' at low SPLs presumably reflects lower Hit rates (and thus higher Miss rates). As currently written, it is not clear if fluorescence traces for Hits arise from trials where the sound cue was played at a higher sound level than on Miss trials. Thus, the difference in neural activity on Hit and Miss trials could indeed reflect mice's behavior (licking or not licking). But in principle could also be explained by higher sound-evoked spike rates on Hit compared to Miss trials, simply due to louder click sounds. Indeed, the amplitude and decay tau of their indicator GCaMP6f is non-linearly dependent on the number and rate of spikes (Chen et al., 2013), so this isn't an unreasonable concern.

      (2) The authors' central claim effectively rests upon two analyses in Figures 5 and 6. The spectral clustering algorithm of Figure 5 identifies 10 separate activity patterns in IC neurons of control and lesioned mice; most of these clusters show distinct activity on averaged Hit and Miss trials. They conclude that although the proportions of neurons from control and lesioned mice in certain clusters deviate from an expected 50/50 split, neurons from lesioned mice are still represented in all clusters. A significant issue here is that in addition to averaging all Hit and Miss trials together, the data from control and lesioned mice are lumped for the clustering. There is no direct comparison of neural activity between the two groups, so the reader must rely on interpreting a row of pie charts to assess the conclusion. It's unclear how similar task relevant activity is between control and lesioned mice; we don't even have a ballpark estimate of how auditory cortex does or does not contribute to task relevant activity. Although ideally the authors would have approached this by repeatedly imaging the same IC neurons before and after lesioning auditory cortex, this within-subjects design may be unfeasible if lesions interfere with task retention. Nevertheless, they have recordings from hundreds to thousands of neurons across two groups, so even a small effect should be observable in a between-groups comparison.

      (3) In Figure 6, the authors show that logistic regression models predict whether the trial is a Hit or Miss from their fluorescence data. Classification accuracy peaks rapidly following sound presentation, implying substantial information regarding mice's actions. The authors further show that classification accuracy is reduced, but still above chance, in mice with auditory cortical lesions. The authors conclude from this analysis that task relevant activity persists in absence of auditory cortex. In principle I do not disagree with their conclusion.

      The weakness here is in the details. First, the reduction in classification accuracy of lesioned mice suggests that auditory cortex does nevertheless transmit some task relevant information, however minor it may be. I feel that as written, their narrative does not adequately highlight this finding. Rather one could argue that their results suggest redundant sources of task-relevant activity converging in the IC. Secondly, the authors conclude that decoding accuracy is impaired more in partially compared to fully lesioned mice. They admit that this conclusion is at face value counterintuitive, and provide compelling mechanistic arguments in the Discussion. However, aside from shaded 95% CIs, we have no estimate of variance in decoding accuracy across sessions or subjects for either control or lesioned mice. Thus we don't know if the small sample sizes of partial (n = 3) and full lesion (n = 4) groups adequately sample from the underlying population. Their result of Figure 6B may reflect spurious sampling from tail ends of the distributions, rather than a true non-monotonic effect of lesion size on task relevant activity in IC.

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. We have filled in key information, accidentally omitted from the original manuscript, about how our original analysis aimed to minimize any potential impact of differences in sound level distributions, namely that trials used for decoding were limited to a subset of sound levels. We have also carried out several additional analyses.

      We would like to highlight one of these because it supplements both the clustering and decoding analysis that we conducted to compare hit and miss trial activity, and directly addresses what the reviewer identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions) and the request for an analysis that operates at the level of single units rather than the population level. Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.

      Reviewer #2 (Public Review):

      Summary:

      This study takes a new approach to studying the role of corticofugal projections from auditory cortex to inferior colliculus. The authors performed two-photon imaging of cortico-recipient IC neurons during a click detection task in mice with and without lesions of auditory cortex. In both groups of animals, they observed similar task performance and relatively small differences in the encoding of task-response variables in the IC population. They conclude that non-cortical inputs to the IC can provide substantial task-related modulation, at least when AC is absent.

      Strengths:

      This study provides valuable new insight into big and challenging questions around top-down modulation of activity in the IC. The approach here is novel and appears to have been executed thoughtfully. Thus, it should be of interest to the community.

      Weaknesses:

      There are, however, substantial concerns about the interpretation of the findings and limitations to the current analysis. In particular, analysis of single unit activity is absent, making the population clusters and decoding results harder to interpret. These concerns should be addressed to make sure that the results can be interpreted clearly in an active field that already contains a number of confusing and possibly contradictory findings.

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. Several additional analyses have now been carried out including ones that operate at the level of single units rather than the population level, as requested by the reviewer. We would like to briefly highlight one here because it supplements both the clustering and decoding analysis that we conducted to compare hit and miss trial activity and directly addresses what the other reviewers identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions). Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.

      Reviewer #3 (Public Review):

      Summary:

      This study aims to demonstrate that cortical feedback is not necessary to signal behavioral outcome to shell neurons of the inferior colliculus during a sound detection task. The demonstration is achieved by the observation of the activity of cortico-recipient neurons in animals which have received lesions of the auditory cortex. The experiment shows that neither behavior performance nor neuronal responses are significantly impacted by cortical lesions except for the case of partial lesions which seem to have a disruptive effect on behavioral outcome signaling.

      Strengths:

      The experimental procedure is based on state-of-the-art methods. There is an in-depth discussion of the different effects of auditory cortical lesions on sound detection behavior.

      Weaknesses:

      The analysis is not documented enough to be correctly evaluated. Have the authors pooled together trials with different sound levels for the key hit vs miss decoding/clustering analysis? If so, the conclusions are not well supported, as there are more misses for low sound levels, which would completely bias the outcome of the analysis. It would be possible that the classification of hits versus misses actually only reflects a decoding of sound level based on sensory responses in the colliculus, and it would not be surprising then that in the presence or absence of cortical feedback, some neurons respond more to higher sound levels (hits) and less to lower sound levels (misses). It is important that the authors clarify and in any case perform an analysis in which the classification of hits vs misses is done only for the same sound levels. The description of feedback signals could be more detailed although it is difficult to achieve good temporal resolution with the calcium imaging technique necessary for targeting cortico-recipient neurons.

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. We have filled in key information, accidentally omitted from the original manuscript, about how our original analysis aimed to minimize any potential impact of differences in sound level distributions, namely that trials used for decoding were limited to a subset of sound levels. We have also carried out several additional analyses to directly address what the reviewer identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions). This includes an analysis in which we were able to demonstrate for one imaging session with a sufficiently large number of trials that limiting the trials entered into the decoding analysis to those from a single sound level did not meaningfully impact decoding accuracy. We would like to highlight another new analysis here because it supplements both the clustering and decoding analyses that we conducted to compare hit and miss trial activity and addresses the other reviewers’ request for an analysis that operates at the level of single units rather than the population level. Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.

      Reviewer #1 (Recommendations For The Authors):

      Thank you for the opportunity to read your paper. I think the conclusion is exciting. Indeed, you indicate that perhaps contrary to many of our (untested) assumptions, task-relevant activity in the IC may persist in absence of auditory cortex.

      As mentioned in my public review: Despite my interest in the work, I also think that there are several opportunities to significantly strengthen your conclusions. I feel this point is important because your work will likely guide the efforts of future students and post-docs working on this topic. The data can serve as a beacon to move the field away from the (somewhat naïve) idea that the evolved forebrain imparts behavioral relevance upon an otherwise uncivilized midbrain. This knowledge will inspire a search for alternative explanations. Indeed, although you don't highlight it in your narrative, your results dovetail nicely with several studies showing task-relevant activity in more ventral midbrain areas that project to the IC (e.g., pedunculopontine nuclei; see work from Hikosaka in monkeys, and more recently in mice from Karel Svoboda's lab).

      Thanks for the kind words.

      These studies, in particular the work by Inagaki et al. (2022) outlining how the transformation of an auditory go signal into movement could be mediated via a circuit involving the PPN/MRN (which might rely on the NLL for auditory input) and the motor thalamus, are indeed highly relevant.

      We made the following changes to the manuscript text.

      Line 472:”...or that the auditory midbrain, thalamus and cortex are bypassed entirely if simple acousticomotor transformations, such as licking a spout in response to a sound, are handled by circuits linking the auditory brainstem and motor thalamus via pedunculopontine and midbrain reticular nuclei (Inagaki et al., 2022).”

      The beauty of the eLife experiment is that you are free to incorporate or ignore these suggestions. After all, it's your paper, not mine. Nevertheless, I hope you find my comments useful.

      First, a few suggestions to address my three comments in the public review.

      Suggestion for public comment #1: An easy way to address this issue is to average the neural activity separately for each trial outcome at each sound level. That way you can measure if fluorescence amplitude (or integral) varies as a function of mice's action rather than sound level. This approach to data organization would also open the door to the additional analyses for addressing comment #2, such as directly comparing auditory and putatively non-auditory activity in neurons recorded from control and lesioned mice.

      We have carried out additional analyses for distinguishing between the two alternative explanations of the data put forward by the reviewer: That the difference in neural activity between hit and miss trials reflects a) behavior or b) sound level (more precisely: differences in response magnitude arising from a higher proportion of high-sound-level trials in the hit trial group than in the miss trial group). If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for each sound level. The new Figure 4 - figure supplement 1 indicates that this is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.
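
The per-level comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `(sound_level, outcome, response)` trial format and the use of a single scalar response per trial are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_response_by_level_and_outcome(trials):
    """Average the response separately for each (sound level, outcome) pair.

    `trials` is a list of (sound_level, outcome, response) tuples, where
    outcome is "hit" or "miss" (hypothetical format for illustration).
    """
    grouped = defaultdict(list)
    for level, outcome, response in trials:
        grouped[(level, outcome)].append(response)
    return {key: mean(values) for key, values in grouped.items()}


def hit_minus_miss_by_level(trials):
    """Hit minus miss mean response at each level that has both outcomes."""
    means = mean_response_by_level_and_outcome(trials)
    levels = sorted({level for level, _ in means})
    return {
        level: means[(level, "hit")] - means[(level, "miss")]
        for level in levels
        if (level, "hit") in means and (level, "miss") in means
    }
```

If sound level alone explained the hit/miss difference, the per-level differences returned by `hit_minus_miss_by_level` would be expected to sit near zero; differences that persist at every level point to trial outcome instead.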

      Changes to manuscript.

      Line 214: “While averaging across all neurons cannot capture the diversity of responses, the averaged response profiles suggest that it is mostly trial outcome rather than the acoustic stimulus and neuronal sensitivity to sound level that shapes those responses (Figure 4 – figure supplement 1).”

      Additionally, we assessed for each neuron separately whether there was a significant difference between hit and miss trial activity and therefore whether the activity of the neuron could be considered “task-modulated”. To achieve this, we used equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions and thus rule out any potential confound between sound level distributions and trial outcome. This analysis revealed that the proportion of task-modulated neurons was very high (close to 50%) and not significantly different between lesioned and non-lesioned mice (Figure 6 - figure supplement 3).
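
A sketch of how hit and miss trials might be balanced at each sound level before testing a single neuron is given below. The response does not name the statistical test, so the permutation test here is a stand-in chosen for the example, and all function names and the trial format are hypothetical.

```python
import random
from collections import defaultdict


def balance_trials(trials, rng):
    """Subsample so hits and misses are equally frequent at every sound level.

    `trials` is a list of (sound_level, outcome, response) tuples
    (hypothetical format). Returns (hit_responses, miss_responses).
    """
    by_level = defaultdict(lambda: {"hit": [], "miss": []})
    for level, outcome, response in trials:
        if outcome in ("hit", "miss"):
            by_level[level][outcome].append(response)
    hits, misses = [], []
    for level in sorted(by_level):
        groups = by_level[level]
        n = min(len(groups["hit"]), len(groups["miss"]))
        hits.extend(rng.sample(groups["hit"], n))
        misses.extend(rng.sample(groups["miss"], n))
    return hits, misses


def permutation_p_value(a, b, rng, n_perm=2000):
    """Two-sided permutation test on the difference of means (stand-in test)."""
    if not a or not b:
        raise ValueError("both groups must contain at least one trial")
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (count + 1) / (n_perm + 1)
```

Because the two groups contain the same number of trials at each level, their sound level distributions are identical by construction, so any remaining difference in response magnitude cannot be attributed to sound level.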

      Changes to the manuscript.

      Line 217: “Indeed, close to half (1272 / 2649) of all neurons showed a statistically significant difference in response magnitude between hit and miss trials…”

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Differences in the distributions of sound levels in the different trial types could also potentially confound the decoding into hit and miss trials. Our original analysis was actually designed to take this into account but, unfortunately, we failed to include sufficient details in the methods section.

      Changes to the manuscript.

      Line 710: “Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d’ of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis.“
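
The selection rule quoted above can be sketched in a few lines. This is an illustration under stated assumptions: the function name and the d’ values in the usage note are hypothetical, and truncating the window at the edges of the tested range is a choice made here, not something the manuscript text specifies.

```python
def select_decoding_levels(dprime_by_level, threshold=1.5, n_flank=2):
    """Pick the lowest level whose d' exceeds `threshold`, plus flanking levels.

    `dprime_by_level` maps sound level (dB SPL) to d'. Returns the selected
    levels in ascending order, or an empty list if no level exceeds the
    threshold. The window is truncated at the ends of the tested range
    (an assumption made for this sketch).
    """
    levels = sorted(dprime_by_level)
    above = [i for i, level in enumerate(levels)
             if dprime_by_level[level] > threshold]
    if not above:
        return []
    center = above[0]
    start = max(0, center - n_flank)
    stop = min(len(levels), center + n_flank + 1)
    return levels[start:stop]
```

With hypothetical d’ values in which 65 dB SPL is the first level above 1.5, the function returns the five levels 59, 62, 65, 68 and 71 dB SPL.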

      In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity predominantly occurs immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in Figure 4 – figure supplement 1), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.

      Finally, we carried out an additional decoding analysis for one imaging session in which we had a sufficient number of trials to perform the analysis not only over the five (59, 62, 65, 68, 71 dB SPL) original sound levels, but also over a reduced range of three (62, 65, 68 dB SPL) sound levels, as well as a single (65 dB SPL) sound level (Figure 6 - figure supplement 1). The mean sound level differences between the hit trial distributions and miss trial distributions for these three conditions were 3.08, 1.01 and 0 dB, respectively. This analysis suggests that decoding performance is not meaningfully impacted by changing the range of sound levels (and sound level distributions), other than that including fewer sound levels means fewer trials and thus noisier decoding.
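
The mean sound level difference reported above is simply the mean level of hit trials minus the mean level of miss trials within the chosen range. A minimal sketch follows; the function name and the `(sound_level, outcome)` trial format are assumptions, and the trial counts in the usage note are invented for illustration.

```python
from statistics import mean


def mean_level_difference(trials, levels):
    """Mean sound level of hit trials minus that of miss trials.

    `trials` is a list of (sound_level, outcome) tuples (hypothetical
    format); only trials whose level is in `levels` are considered.
    """
    hits = [lvl for lvl, outcome in trials if outcome == "hit" and lvl in levels]
    misses = [lvl for lvl, outcome in trials if outcome == "miss" and lvl in levels]
    return mean(hits) - mean(misses)
```

Restricting the analysis to a single level necessarily gives a difference of 0 dB, which is why the single-level condition removes the sound level confound entirely, at the cost of fewer trials.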

      Changes to manuscript.

      Line 287: ”...and was not meaningfully affected by differences in sound level distributions between hit and miss trials (Figure 6 – figure supplement 1).”

      Suggestion for public comment #2: Perhaps a solution would be to display example neuron activity in each cluster, recorded in control and lesioned mice. The reader could then visually compare example data from the two groups, and immediately grasp the conclusion that task relevant activity remains in absence of auditory cortex. Additionally, one possibility might be to calculate the difference in neural activity between Hit and Miss trials for each task-modulated neuron. Then, you could compare these values for neurons recorded in control and lesion mice. I feel like this information would greatly add to our understanding of cortico-collicular processing.

      I would also argue that it's perhaps more informative to show one (or a few) example recordings rather than averaging across all cells in a cluster. Example cells would give the reader a better handle on the quality of the imaging, and this approach is more standard in the field. Finally, it would be useful to show the y axis calibration for each example trace (e.g. Figure 5 supp 1). That is also pretty standard so we can immediately grasp the magnitude of the recorded signal.

      We agree that while the information we provided shows that neurons from lesioned and non-lesioned groups are roughly equally represented across the clusters, it does not allow the reader to appreciate how similar the activity profiles of neurons are from each of the two groups. However, picking examples can be highly subjective and thus potentially open to bias. We therefore opted instead to display, separately for lesioned and non-lesioned mice, the peristimulus time histograms of all neurons in each cluster, as well as the cluster averages of the response profiles (Figure 5 - figure supplement 3). This, we believe, convincingly illustrates the close correspondence between neural activity in lesioned and non-lesioned mice across different clusters. All our existing and new figures indicate the response magnitude either on the figures’ y-axis or via scale/color bars.

      Changes to manuscript.

      Line 254: “Furthermore, there was a close correspondence between the cluster averages of lesioned and non-lesioned mice (Figure 5 – figure supplement 3).”

      Furthermore, we’ve now included a video of the imaging data which, we believe, gives the reader a much better handle on the data quality than further example response profiles would.

      Changes to manuscript.

      Line 197: ”...using two-photon microscopy (Figure 4B, Video 1).”

      Suggestion for public comment #3: In absence of laborious and costly follow-up experiments to boost the sample size of partial and complete lesion groups, it may be more prudent to simply tone down the claims that lesion size differentially impacts decoding accuracy. The results of this analysis are not necessary for your main claims.

      Our new results on the proportions of ‘task-modulated’ neurons (Figure 6 - figure supplement 3) across different experimental groups show that there is no difference between non-lesioned and lesioned mice as a whole, but mice with partial lesions have a smaller proportion of task-modulated neurons than the other two groups. While this corroborates the results of the decoding analysis, we certainly agree that the small sample size is a caveat that needs to be acknowledged.

      Changes to manuscript.

      Line 477: ”Some differences were observed for mice with only partial lesions of the auditory cortex. Those mice had a lower proportion of neurons with distinct response magnitudes in hit and miss trials than mice with (near-)complete lesions. Furthermore, trial outcomes could be read out with lower accuracy from these mice. While this finding is somewhat counterintuitive and is based on only three mice with partial lesions, it has been observed before that smaller lesions…”

      A few more suggestions unrelated to public review:

      Figure 1: This is somewhat of an oddball in this manuscript, and its inclusion is not necessary for the main point. Indeed, the major conclusion of Fig 1 is that acute silencing of auditory cortex impairs task performance, and thus optogenetic methods are not suitable to test your hypothesis. However, this conclusion is also easily supported from decades of prior work, and thus citations might suffice.

      We do not agree that these data can easily be substituted with citations of prior published work. While previous studies (Talwar et al., 2001, Li et al., 2017) have demonstrated the impact of acute pharmacological silencing on sound detection in rodents, pharmacological and optogenetic silencing are not equivalent. Furthermore, we are aware of only one published study (Kato et al., 2015) that investigated the impact of optogenetically perturbing auditory cortex on sound detection (others have investigated its impact on discrimination tasks). Kato et al. (2015) examined the effect of acute optogenetic silencing of auditory cortex on the ability of mice to detect the offsets of very long (5-9 seconds) sounds, which is not easily comparable to the click detection task employed by us. Furthermore, when presenting our work at a recent meeting and leaving out the optogenetics results due to time constraints, audience members immediately enquired whether we had tried an optogenetic manipulation instead of lesions. Therefore, we believe that these data represent a valuable piece of information that will be appreciated by many readers and have decided not to remove them from the manuscript.

      A worst case scenario is that Figure 1 will detract from the reader's assessment of experimental rigor. The data of 1C are pooled from multiple sessions in three mice. It is not clear if the signed-rank test compares performance across n = 3 mice or n = 13 sessions. If the latter, a stats nitpicker could argue that the significance might not hold up with a nested analysis considering that some datapoints are not independent of one another. Finally, the experiment does not include a control group, gad2-cre mice injected with an EYFP virus. So as presented, the data are equally compatible with the pessimistic conclusion that shining light into the brain impairs mice's licking. My suggestion is to simply remove Figure 1 from the paper. Starting off with Figure 3 would be stronger, as the rest of the study hinges upon the knowledge that control and lesion mice's behavior is similar.

      Instead of reporting the results session-wise and doing stats on the d’ values, we now report results per mouse and perform stats on the proportions of hits and false alarms separately for each mouse. The results are statistically significant for each mouse and suggest that the differences in d’ are primarily caused by higher false alarm rates during the optogenetic perturbation than in the control condition.
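
To make the link between false alarm rate and d’ concrete, the standard signal detection formula d’ = z(hit rate) − z(false alarm rate) can be sketched as below. The log-linear correction for rates of exactly 0 or 1 is an assumption added for this example; the response does not state which correction, if any, was applied, and the function name is hypothetical.

```python
from statistics import NormalDist


def d_prime(n_hits, n_signal_trials, n_false_alarms, n_catch_trials):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    A log-linear correction (add 0.5 to counts, 1 to trial totals) keeps
    both rates strictly between 0 and 1; this correction is an assumption
    made for the sketch.
    """
    hit_rate = (n_hits + 0.5) / (n_signal_trials + 1)
    fa_rate = (n_false_alarms + 0.5) / (n_catch_trials + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

Holding the hit count fixed and raising the false alarm count lowers d’, which is the pattern the per-mouse analysis above attributes to the optogenetic perturbation.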

      Changes to manuscript.

      New Figure 1.

      We agree that including control mice not expressing ChR2 would be important for fully characterizing the optogenetic manipulation and that the lack of this control group should be acknowledged. However, in the context of this study, the outcome of performing this additional experiment would be inconsequential. We originally considered using an optogenetic approach to explore the contribution of cortical activity to IC responses, but found that this altered the animals’ sound detection behavior. Whether that change in behavior is due to activation of the opsin or simply due to light being shone on the brain has no bearing on the conclusion that this type of manipulation is unsuitable for determining whether auditory cortex is required for the choice-related activity that we recorded in the IC.

      Changes to manuscript.

      Line 106: ”Although a control group in which the auditory cortex was injected with an EYFP virus lacking ChR2 would be required to confirm that the altered behavior results from an opsin-dependent perturbation of cortical activity, this result shows that this manipulation is also unsuitable… ”

      Figure 2, comment #1: The micrograph of panel B shows the densest fluorescence in the central IC. You interpret this as evidence of retrograde labeling of central IC neurons that project to the shell IC. This is a nice finding, but perhaps a more relevant micrograph would be to show the actual injection site in the shell layers. The rest of Figure 2 documents the non-auditory cortical sources of forebrain feedback. Since non-auditory cortical neurons may or may not target distinct shell IC sub-circuits, it's important to know where the retrograde virus was injected. Stylistic comment: The flow of the panels is somewhat unorthodox. Panel A and B follow horizontally, then C and D follow vertically, followed by E-H in a separate column. Consider sequencing either horizontally or vertically to maximize the reader's experience.

      Figure 2, comment #2: It would also be useful to show more rostral sections from these mice, perhaps as a figure supplement, if you have the data. I think there is a lot of value here given a recent paper (Olthof et al., 2019 Jneuro) arguing that the IC receives corticofugal input from areas more rostral to the auditory cortex. So it would be beneficial for the field to know if these other cortical sources do or do not represent likely candidates for behavioral modulation in absence of auditory cortex.

      Figure 2, comment #3: You have a striking cluster of retrogradely labeled PPC neurons, and I'm not sure PPC has been consistently reported as targeting the IC. It would be good to confirm that this is a "true" IC projection as opposed to viral leakage into the SC. Indeed, Figure 2, supplement 2 also shows some visual cortex neurons that are retrogradely labeled. This has bearing on the interpretations, because choice-related activity is rampant in PPC, and thus could be a potential source of the task relevant activity that persists in your recordings. This could be addressed as the point above, by showing the SC sections from these same mice.

      All IC injections were made under visual guidance with the surface of the IC and adjacent brain areas fully exposed after removal of the imaging window. Targeting the IC and steering clear of surrounding structures, including the SC, was therefore relatively straightforward.

      We typically observed strong retrograde labeling in the central nucleus after viral injections into the dorsal IC and, given the moderate injection volume (~50 nL at each of up to three sites), it was also typical to see spatially fairly confined labeling at the injection sites. For the mouse shown in Figure 2, we do not have further images of the IC. This was one of the earliest mice to be included in the study and we did not have access to an automatic slide scanner at the time. We had to acquire confocal images in a ‘manual’ and very time-consuming manner and therefore did not take further IC images for this mouse. We have now included, however, a set of images spanning the whole IC and the adjacent SC sections for the mouse for which we already show sections in Figure 2 - figure supplement 2. These were added as Figure 2 - figure supplement 3A to the manuscript. These images show that the injections were located in the caudal half of the IC and that there was no spillover into the SC - close inspection of those sections did not reveal any labeled cell bodies in the SC. Furthermore, we include as Figure 2 - figure supplement 3B a dozen additional rostral cortical sections of the same mouse illustrating corticocollicular neurons in regions spanning visual, parietal, somatosensory and motor cortex. Given the inclusion of the IC micrographs in the new supplementary figure, we removed panel B from Figure 2. This should also make it easier for the reader to follow the sequencing of the remaining panels.

      Changes to manuscript.

      New Figure 2 - figure supplement 3.

      Line 159: “After the experiments, we injected a retrogradely-transported viral tracer (rAAV2-retro-tdTomato) into the right IC to determine whether any corticocollicular neurons remained after the auditory cortex lesions (Figure 2, Figure 2 – figure supplement 2, Figure 2 – figure supplement 3). The presence of retrogradely-labeled corticocollicular neurons in non-temporal cortical areas (Figure 2) was not the result of viral leakage from the dorsal IC injection sites into the superior colliculus (Figure 2 – figure supplement 3).”

      Line 495: “...projections to the IC, such as those originating from somatosensory cortical areas (Lohse et al., 2021; Lesicko et al., 2016) and parietal cortex may have contributed to the response profiles that we observed.”

      Figure 5 (see also public review point #2): I am not convinced that this unsupervised method yields particularly meaningful clusters; a grain of salt should be provided to the reader. For example, Clusters 2, 5, 6, and 7 contain neurons that pretty clearly respond with either short latency excitation or inhibition following the click sound on Hits. I would argue that neurons with such diametrically opposite responses should not be "classified" together. You can see the same issue in some of Namboodiri/Stuber's clustering (their Figure 1). It might be useful to make it clear to the reader that these clusters can reflect idiosyncrasies of the algorithm, the behavior task structure, or both.

      We agree.

      Changes to manuscript.

      Line 666: “While clustering is a useful approach for organizing and visualizing the activity of large and heterogeneous populations of neurons, we need to be mindful that, given continuous distributions of response properties, the locations of cluster boundaries can be somewhat arbitrary and/or reflect idiosyncrasies of the chosen method and thus vary from one algorithm to another. We employed an approach very similar to that described in Namboodiri et al. (2019) because it is thought to produce stable results in high-dimensional neural data (Hirokawa et al. 2019).”

      Methods:

      How was a "false alarm" defined? Is it any lick happening during the entire catch trial, or only during the time period corresponding to the response window on stimulus trials?

      The response window was identical for catch and stimulus trials and a false alarm was defined as licking during the response window of a catch trial.

      Changes to manuscript.

      Line 598: “During catch trials, neither licking (‘false alarm’) during the 1.5-second response window …”
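
The trial outcome definitions given above can be sketched as a small classifier. The 1.5-second window is taken from the quoted manuscript text; the function name, the "correct rejection" label, and the assumption that lick times are measured in seconds from the start of the response window are illustrative choices, not details confirmed by the source.

```python
RESPONSE_WINDOW_S = 1.5  # response window length quoted in the manuscript


def classify_trial(is_catch, lick_times, window=RESPONSE_WINDOW_S):
    """Label a trial from lick times (seconds from response window onset).

    Licks outside the window are ignored, matching the statement that a
    false alarm is a lick during the response window of a catch trial.
    """
    licked = any(0.0 <= t < window for t in lick_times)
    if is_catch:
        return "false alarm" if licked else "correct rejection"
    return "hit" if licked else "miss"
```

Using the same window for catch and stimulus trials means hit and false alarm rates are measured over identical time spans and are directly comparable.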

      L597 and so forth: What's the denominator in the conversion from the raw fluorescence traces into DF/F? Did you take the median or mode fluorescence across a chunk of time? Baseline subtract average fluorescence prior to click onset? Similarly, please provide some more clarification as to how neuropil subtraction was achieved. This information will help us understand how the classifier can decode trial outcome from data prior to sound onset.

      Signal processing did not involve the subtraction of a pre-stimulus period.

      Changes to manuscript.

      Line 629: “Neuropil extraction was performed using default suite2p parameters (https://suite2p.readthedocs.io/en/latest/settings.html), neuropil correction was done using a coefficient of 0.7, and calcium ΔF/F signals were obtained by using the median over the entire fluorescence trace as F0. To remove slow fluctuations in the signal, a baseline of each neuron’s entire trace was calculated by Gaussian filtering in addition to minimum and maximum filtering using default suite2p parameters. This baseline was then subtracted from the signal.”
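
      The ΔF/F step quoted above can be illustrated with a minimal sketch. It assumes that F0 is the median of the neuropil-corrected trace (the quoted text does not say whether the median is taken before or after correction), and it leaves out the Gaussian and minimum/maximum baseline filtering. The function name and the fluorescence values are hypothetical and are not taken from the authors' code.

```python
from statistics import median

NEUROPIL_COEFF = 0.7  # coefficient stated in the quoted methods text


def delta_f_over_f(roi_trace, neuropil_trace, coeff=NEUROPIL_COEFF):
    """Return dF/F for one ROI.

    Assumption: F0 is the median of the whole neuropil-corrected trace.
    """
    corrected = [f - coeff * n for f, n in zip(roi_trace, neuropil_trace)]
    f0 = median(corrected)
    if f0 == 0:
        raise ValueError("F0 is zero; dF/F is undefined for this trace")
    return [(f - f0) / f0 for f in corrected]


# Hypothetical raw fluorescence values, for illustration only.
roi = [170.0, 170.0, 270.0, 170.0, 170.0]
neuropil = [100.0, 100.0, 100.0, 100.0, 100.0]
dff = delta_f_over_f(roi, neuropil)
```

      Because F0 is computed over the whole trace rather than over a pre-stimulus window, pre-stimulus differences between trials are preserved, which is relevant to the reviewer's question about decoding before sound onset.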

      Was the experimenter blinded to the treatment group during the behavior experiments? If not, were there issues that precluded blinding (limited staffing owing to lab capacity restrictions during the pandemic)? This is important to clarify for the sake of rigor and reproducibility.

      Changes to manuscript.

      Line 574: “The experimenters were not blinded to the treatment group, i.e. lesioned or non-lesioned, but they were blind to the lesion size both during the behavior experiments and most of the data processing.”

      Minor:

      L127-128: "In order to test...lesioned the auditory cortex bilaterally in 7 out of 16 animals". I would clarify this by changing the word animals to "mice" and 7 out of 16 by stating n = 9 and n = 7 are control and lesion groups, respectively.

      Agreed.

      Changes to manuscript.

      Line 129: “...compared the performance of mice with bilateral lesions of the auditory cortex (n = 7) with non-lesioned controls (n = 9)”

      L225-226: You rule out self-generated sounds as a likely source of behavioral modulation by citing Nate Sawtell's paper in the DCN. However, Stephen David's lab suggested that in marmosets, post sound activity in central IC may in fact reflect self-generated sounds during licking. I suggest addressing this with a nod to SVD's work (Singla et al., 2017; but see Shaheen et al., 2021).

      Agreed.

      Changes to manuscript.

      Line 243: “(Singla et al., 2017; but see Shaheen et al., 2021)”

      Line 238 - 239: You state that proportions only deviate greater than 10% for one of the four statistically significant clusters. Something must be unclear here because I don't understand: The delta between the groups in the significant clusters of Fig 5C is (from left to right) 20%, 20%, 38%, and 12%. Please clarify.

      Our wording was meant to convey that a deviation “from a 50/50 split” of 10% means that each side deviates from 50 by 10% resulting in a 40/60 (or 60/40) split. We agree that that has the potential to confuse readers and is not as clear as it could be and have therefore dropped the ambiguous wording.

      Changes to manuscript.

      Line 253: “...the difference between the groups was greater than 20% for only one of them.”

      L445: I looked at the cited Allen experiment; I'd be cautious with the interpretation here. A monosynaptic IC->striatum projection is news to me. I think Allen Institute used an AAV1-EGFP virus for these experiments, no? As you know, AAV1 is quite transsynaptic. The labeled fibers in striatum of that experiment may reflect disynaptic labeling of MGB neurons (which do project to striatum).

      Agreed. We deleted the reference to this Allen experiment.

      L650: Please define "network activity". Is this the fluorescence value for each ROI on each frame of each trial? Averaged fluorescence of each ROI per frame? Total frame fluorescence including neuropil? Depending on who you ask, each of these measures provides some meaningful readout of network activity, so clarification would be useful.

      Changes to manuscript.

      Line 707: “Logistic regression models were trained on the network activity of each session, i.e., the ΔF/F values of all ROIs in each session, to classify hit vs miss trials. This was done on a frame-by-frame basis, meaning that each time point (frame) of each session was trained separately.”
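
      To make "frame-by-frame" concrete, the sketch below fits one independent logistic regression per frame on a toy dataset. It is an illustration under stated assumptions, not the authors' pipeline: the gradient-descent fit, the learning rate, the use of training accuracy instead of cross-validated accuracy, and all function names are hypothetical choices made to keep the example self-contained.

```python
import math


def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Fit weights (bias stored last) by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(n_iter):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return 1 (hit) if the linear score is positive, else 0 (miss)."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if z > 0 else 0


def decode_per_frame(dff, labels):
    """dff[trial][roi][frame] -> one training accuracy per frame."""
    n_frames = len(dff[0][0])
    scores = []
    for t in range(n_frames):
        # Each frame gets its own model, so frames cannot influence each other.
        X = [[roi[t] for roi in trial] for trial in dff]
        w = fit_logistic(X, labels)
        correct = sum(predict(w, x) == y for x, y in zip(X, labels))
        scores.append(correct / len(labels))
    return scores


# Toy data: 4 trials x 2 ROIs x 2 frames; only frame 1 separates hits from misses.
dff = [
    [[0.0, 1.0], [0.0, 0.0]],   # hit
    [[0.0, 1.0], [0.0, 0.0]],   # hit
    [[0.0, -1.0], [0.0, 0.0]],  # miss
    [[0.0, -1.0], [0.0, 0.0]],  # miss
]
labels = [1, 1, 0, 0]
scores = decode_per_frame(dff, labels)
```

      In this toy case the uninformative frame yields chance-level accuracy and the informative frame is decoded perfectly. A real analysis would score held-out trials.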

      Figure 3 narrative or legend: Listing the F values for the anova would be useful. There is pretty clearly a main effect of training session for hits, but what about for the false alarms? That information is important to solidify the result, and would help more specialized readers interpret the d-prime plot in this figure.

      Agreed. There were significant main effects of training day for both hit rates and false alarm rates (as well as d’).

      Changes to manuscript.

      Line 165: “The ability of the mice to learn and perform the click detection task was evident in increasing hit rates and decreasing false alarm rates across training days (Figure 3A, p < 0.01, mixed-design ANOVAs).”
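
      For readers less familiar with d’, the standard signal-detection definition is the difference between the z-transformed hit rate and the z-transformed false alarm rate. The sketch below assumes that definition and adds a half-trial clamp so that rates of 0 or 1 give finite values. The clamp and the function name are illustrative assumptions, since the response does not state how extreme rates were handled.

```python
from statistics import NormalDist


def d_prime(n_hits, n_stim_trials, n_false_alarms, n_catch_trials):
    """d' = z(hit rate) - z(false alarm rate)."""

    def clamped_rate(count, total):
        # Assumption: clamp by half a trial so the inverse normal CDF stays finite.
        return min(max(count / total, 0.5 / total), 1 - 0.5 / total)

    z = NormalDist().inv_cdf
    hit_rate = clamped_rate(n_hits, n_stim_trials)
    fa_rate = clamped_rate(n_false_alarms, n_catch_trials)
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 84% hits and 16% false alarms.
example = d_prime(84, 100, 16, 100)
```

      With these definitions, d’ can rise either because hit rates increase or because false alarm rates decrease, which is why reporting both rates separately is informative.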

      In summary, thank you for undertaking this work. Your conclusions are provocative, and thus will likely influence the field's direction for years to come.

      Thank you for those kind words and valuable and constructive feedback, which has certainly improved the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      MAJOR CONCERNS

      (1) (Fig. 5) What fraction of individual neurons actually encode task-related information in each animal group? How many neurons respond to sound? The clustering and decoding analyses are interesting, but they obscure these simple questions, which get more directly at the main questions of the study. Suggested approach: For a direct comparison of AC-lesioned and -non-lesioned animals, why not simply compare the mean difference between PSTH response for each neuron individually? To test for trial outcome effects, compare Hit and Miss trials (same stimulus, different behavior) and for sound response effects, compare Hit and False alarm trials (same behavior, different response). How do you align for time in the latter case when there's no stimulus? Align to the first lick event. The authors should include this analysis or explain why their approach of jumping right to analysis of clusters is justified.

      We have now calculated the fraction of neurons that encode trial outcome by comparing hit and miss trial activity. That fraction does not differ between non-lesioned animals and lesioned animals as a whole, but is significantly smaller in mice with partial lesions. The reviewer’s suggestion of comparing hit and false alarm trial activity to assess sound responsiveness is problematic because hit trials involve reward delivery and consumption. Consequently, they are behaviorally very different from false alarm trials (not least because hit trials tend to contain much more licking). Therefore, we calculated the fraction of neurons that respond to the acoustic stimulus by comparing activity before and after stimulus onset in miss trials. We found no significant difference between the non-lesioned and lesioned mice or between subgroups.

      We have addressed these points with the following changes to the manuscript:

      Line 217: “Indeed, close to half (1272 / 2649) of all neurons showed a statistically significant difference in response magnitude between hit and miss trials, while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Line 648: “Analysis of task-modulated and sound-driven neurons. To identify individual neurons that produced significantly different response magnitudes in hit and miss trials, we calculated the mean activity for each stimulus trial by taking the mean activity over the 5 seconds following stimulus presentation and subtracting the mean activity over the 2 seconds preceding the stimulus during that same trial. A Mann-Whitney U test was then performed to assess whether a neuron showed a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) in response magnitude between hit and miss trials. The analysis was performed using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions. If, for a given sound level, there were more hit than miss trials, we randomly selected a sample of hit trials (without substitution) to match the sample size for the miss trials and vice versa. Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation.”
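
      The per-trial response magnitude described in this passage (mean over the 5 s after the stimulus minus mean over the 2 s before it) might be computed as in the sketch below. The frame rate, the convention that the post-stimulus window starts at the stimulus frame, and the function name are assumptions for illustration. The actual imaging rate is not given in this response.

```python
def response_magnitude(trace, stim_frame, frame_rate_hz, pre_s=2.0, post_s=5.0):
    """Mean activity in the post-stimulus window minus mean in the pre-stimulus window."""
    n_pre = int(round(pre_s * frame_rate_hz))
    n_post = int(round(post_s * frame_rate_hz))
    if stim_frame - n_pre < 0 or stim_frame + n_post > len(trace):
        raise ValueError("trace does not cover the full pre/post windows")
    pre = trace[stim_frame - n_pre:stim_frame]
    # Assumption: the post window includes the stimulus frame itself.
    post = trace[stim_frame:stim_frame + n_post]
    return sum(post) / len(post) - sum(pre) / len(pre)


# Hypothetical single-trial dF/F trace sampled at 2 Hz: 2 s baseline, 5 s response.
trial_trace = [0.0] * 4 + [0.5] * 10
magnitude = response_magnitude(trial_trace, stim_frame=4, frame_rate_hz=2.0)
```

      The resulting per-trial values for hit and miss trials are what a rank-based test such as Mann-Whitney U would then compare for each neuron.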

      Some more specific concerns about focusing only on cluster-level and population decoding analysis are included below.

      (2) (L 234) "larger field of view". Do task-related or lesion-dependent effects depend on the subregion of IC imaged? Some anatomists would argue that the IC shell is not a uniform structure, and concomitantly, task-related effects may differ between fields. Did coverage of IC subregions differ between experimental groups? Is there any difference in task related effects between subregions of IC? Or maybe all this work was carried out only in the dorsal area? The differences between lesioned and non-lesioned animals are relatively small, so this may not have a huge impact, but a more nuanced discussion that accounts for observed or potential (if not tested) differences between regions of the IC would be helpful.

      The specific subregion coverage could also impact the decoding analysis (Fig 6), and if possible it might be worth considering an interaction between field of view and lesion size on decoding.

      Each day we chose a new imaging location to avoid recording the same neurons more than once and aimed to sample widely across the optically accessible surface of the IC. We typically stopped the experiment only when there were no more new areas to record from. In terms of the depth of the imaged neurons, we were limited by the fact that corticorecipient neurons become sparser with depth and that the signal available from the GCaMP6f labeling of the Ai95 mice becomes rapidly weaker with increasing distance from the surface. This meant that we recorded no deeper than 150 µm from the surface of the IC. Consequently, while there may have been some variability in the average rostrocaudal and mediolateral positioning of imaging locations from animal to animal due to differences between mice in how much of the IC surface was visible, cranial window positioning, and in neuronal labeling etc, our dataset is anatomically uniform in that all recorded neurons receive input from the auditory cortex and are located within 150 µm of the surface of the IC. Therefore, we think it highly unlikely that small sampling differences across animals could have a meaningful impact on the results.

      Given that there is no consensus as to where the border between the dorsal and external/lateral cortices of the IC is located and that it is typically difficult to find reliable anatomical reference points (the location of the borders between the IC and surrounding structures is not always obvious during imaging, i.e. a transition from a labeled area to a dark area near the edge of the cranial window could indicate a border with another structure, but also the IC surface sloping away from the window or simply an unlabeled area within the IC), we made no attempt to assign our recordings from corticorecipient neurons to specific subdivisions of the IC.

      Changes to manuscript.

      Line 195: “We then proceeded to record the activity of corticorecipient neurons within about 150 µm of the dorsal surface of the IC using two-photon microscopy (Figure 4B, Video 1).”

      Line 375: “We imaged across the optically accessible dorsal surface of the IC down to a depth of about 150 µm below the surface. Consequently, the neurons we recorded were located predominantly in the dorsal cortex. However, identifying the borders between different subdivisions of the IC is not straightforward and we cannot rule out the possibility that some were located in the lateral cortex.”

      (3) (L 482-483) "auditory cortex is not required for the task-related activity recording in IC neurons of mice performing a sound detection task". Most places in the text are clearer, but this statement is confusing. Yes, animals with lesions can have a "normal"-looking IC, but does that mean that AC does not strongly modulate IC during this behavior in normal animals? The authors have shown convincingly that subcortical areas can both shape behavior and modulate IC normally, but AC may still be required for IC modulation in non-lesioned animals. Given the complexity of this system, the authors should make sure they summarize their results consistently and clearly throughout the manuscript.

      The reviewer raises an important point. What we have shown is that corticorecipient dorsal IC neurons in mice without auditory cortex show neural activity during a sound detection task that is largely indistinguishable from the activity of mice with an intact auditory cortex. In lesioned mice, the auditory cortex is thus not required. Whether the IC activity of the non-lesioned group can be shaped by input from the auditory cortex in a meaningful way in other contexts, such as during learning, is a question that our data cannot answer.

      Changes to manuscript.

      Line 508: "While modulation of IC activity by this descending projection has been implicated in various functions, most notably in the plasticity of auditory processing, we have shown in mice performing a sound detection task that IC neurons show task-related activity in the absence of auditory cortical input."

      LESSER CONCERNS

      (L. 106-107) "Optogenetic suppression of cortical activity is thus also unsuitable..." It appears that behavior is not completely abolished by the suppression. One could also imagine using a lower dose of muscimol for partial inactivation of AC feedback. When some behavior persists, it does seem possible to measure task-related changes in the IC. This may not be necessary for the current study, but the authors should consider how these transient methods could be applied usefully in the Discussion. What about inactivation of cortical terminals in the IC? Is that feasible?

      Our argument is not that acute manipulations are unsuitable because they completely abolish the behavior, but because they significantly alter the behavior. Although it would not be trivial to precisely measure the extent of pharmacological cortical silencing in behaving mice that have been fitted with a midbrain window, it should be possible to titrate the size of a muscimol injection to achieve partial silencing of the auditory cortex that does not fully abolish the ability to detect sounds. However, such an outcome would likely render the data uninterpretable. If no effect on IC activity was observed, it would not be possible to conclude whether this was due to the fact that the auditory cortex was only partially silenced or that projections from the auditory cortex have no influence on the recorded IC activity. Similarly, if IC activity was altered, it would not be possible to say whether this was due to altered descending modulation resulting from the (partially) silenced auditory cortex or to the change in behavior, which would likely be reflected in the choice-related activity measured in the IC.

      Silencing of corticocollicular axons in the IC is potentially a more promising approach and we did devote a considerable amount of time and effort to establishing a method that would allow us to simultaneously image IC neurons while silencing corticocollicular axons, trying both eNpHR3.0 and Jaws with different viral labeling approaches and mouse lines. However, we ultimately abandoned those attempts because we were not convinced that we had achieved sufficient silencing or that we would be able to convincingly verify this. Furthermore, axonal silencing comes with its own pitfalls and the interpretation of its consequences is not straightforward. Given that our discussion already contains a section (line 421) on axonal silencing, we do not feel there would be any benefit in adding to that.

      (Figure 1). Can the authors break down the performance for FA and HR, as they do in Fig. 3? It would be helpful to know what aspect of behavior is impaired by the transient inactivation.

      Good point. Figure 1 has been updated to show the results separately for hit rates, false alarms and d’. The new figure indicates that the change in d’ is primarily a consequence of altered false alarm rates. Please also see our response to a related comment by reviewer #1.

      Changes to manuscript.

      New figure 1.

      (Figure 4 legend). Minor: Please clarify, what is time 0 in panel C? Time of click presentation?

      Yes, that is correct.

      Changes to manuscript.

      Line 209: “Vertical line at time 0 s indicates time of click presentation.”

      (L. 228-229). There has been a report of lick and other motor related activity in the IC - e.g., see Shaheen, Slee et al. (J Neurosci 2021), the timing of which suggests that some of it may be acoustically driven.

      Thanks for pointing this out. Shaheen et al., 2021 should certainly have been cited by us in this context as well as in other parts of the manuscript.

      Changes to manuscript.

      Line 243: “(Singla et al., 2017; but see Shaheen et al., 2021)”

      Also, have the authors considered measuring a peri-lick response? The difference between hit and miss trials could be perceptual or it could reflect differences in motor activity. This may be hard to tease apart, but, for example, one can test whether activity is stronger on trials with many licks vs. few licks?

      (L. 261) "Behavior can be decoded..." similar or alternative to the previous question of evoked activity, can you decode lick events from the population activity?

      The difference between hit and miss trial activity almost certainly partially reflects motor activity associated with licking. This was stated in the Discussion, but to make that point more explicitly, we now include a plot of average false alarm trial activity, i.e. trials without sound (catch trials) in which animals licked (but did not receive a reward).

      Given a sufficient number of catch trials, it should be possible to decode false alarm and correct rejection trials. However, our experiment was not designed with that in mind and contains a much smaller number of catch trials than stimulus trials (approximately one tenth the number of stimulus trials), so we have not attempted this.

      Changes to manuscript.

      New Figure 4 - figure supplement 1.

      (L. 315) "Pre-stimulus activity..." Given reports of changes in activity related to pupil-indexed arousal in the auditory system, do the authors by any chance have information about pupil size in these datasets?

      Given that all recordings were performed in the dark, fluctuations in pupil diameter were relatively small. Therefore, we have not made any attempt to relate pupil diameter to any of the variables assessed in this manuscript.

      (L. 412) "abolishes sound detection". While not exactly the same task, the authors might comment on Gimenez et al (J Neurophys 2015) which argued that temporary or permanent lesioning of AC did not impair tone discrimination. More generally, there seems to be some disagreement about what effects AC lesions have on auditory behavior.

      Thank you for this suggestion. Gimenez et al. (2015) investigated the ability of freely moving rats to discriminate sounds (and, in addition, how they adapt to changes in the discrimination boundary). Broadly consistent with later reports by Ceballo et al. (2019) (mild impairment) and O’Sullivan et al. (2019) (no impairment), Gimenez et al. (2015) reported that discrimination performance is mildly impaired after lesioning auditory cortex. Where the results of Gimenez et al. (2015) stand out is in the comparatively mild impairments that were seen in their task when they used muscimol injections, which contrast with the (much) larger impairments reported by others (e.g. Talwar et al., 2001; Li et al., 2017; Jaramillo and Zador, 2014).

      Changes to manuscript.

      Line 433: “However, transient pharmacological silencing of the auditory cortex in freely moving rats (Talwar et al., 2001), as well as head-fixed mice (Li et al., 2017), completely abolishes sound detection (but see Gimenez et al., 2015).”

      (L. 649) "... were generally separable" Is the claim here that the clusters are really distinct from each other? This is unexpected, and it might be helpful if the authors could show this result in a figure.

      The half-sentence that this comment refers to has been removed from the methods section. Please also see a related comment by reviewer #1 which prompted us to add the following to the methods section.

      Changes to manuscript.

      Line 666: “While clustering is a useful approach for organizing and visualizing the activity of large and heterogeneous populations of neurons, we need to be mindful that, given continuous distributions of response properties, the locations of cluster boundaries can be somewhat arbitrary and/or reflect idiosyncrasies of the chosen method and thus vary from one algorithm to another. We employed an approach very similar to that described in Namboodiri et al. (2019) because it is thought to produce stable results in high-dimensional neural data (Hirokawa et al. 2019).”

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors must absolutely clarify if the hit versus misses decoding and clustering analysis is done for a single sound level or for multiple sound levels (what is the fraction of trials for each sound level?). If the authors did it for multiple sound levels they should redo all analyses sound-level by sound-level, or for a single sound level if there is one that dominates. No doubt that there is information about the trial outcome in IC, but it should not be over-estimated by a confound with stimulus information.

      This is an important point. The original clustering analysis was carried out across different sound levels. We have now carried out additional analysis to distinguish between two alternative explanations of the data, which were also raised by reviewer #1: that the difference in neural activity between hit and miss trials could reflect a) the animals’ behavior or b) relatively more hit trials at higher sound levels, which would be expected to produce stronger responses. If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for different sound levels. The new Figure 4 - figure supplement 1 indicates that this is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      We made the following changes to manuscript.

      Line 214: “While averaging across all neurons cannot capture the diversity of responses, the averaged response profiles suggest that it is mostly trial outcome rather than the acoustic stimulus and neuronal sensitivity to sound level that shapes those responses (Figure 4 – figure supplement 1).”

      Differences in the distributions of sound levels in the different trial types could also potentially confound the decoding into hit and miss trials. Our analysis actually aimed to take this into account but, unfortunately, we failed to include sufficient details in the methods section.

      Changes to manuscript.

      Line 710: “Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d’ of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis.”
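
      The trial-selection rule quoted above can be sketched as follows: find the lowest sound level whose d’ exceeds 1.5, then keep that level plus the two levels on either side. The function name, the handling of edge cases (fewer than two levels available below the criterion level) and the d’ values in the example are hypothetical.

```python
def select_decoding_levels(levels_db, d_primes, criterion=1.5, n_flank=2):
    """Return the criterion level plus up to n_flank levels below and above it."""
    pairs = sorted(zip(levels_db, d_primes))
    sorted_levels = [level for level, _ in pairs]
    for i, (_, dp) in enumerate(pairs):
        if dp > criterion:
            # Assumption: if fewer than n_flank levels exist below, keep what is there.
            lower = max(0, i - n_flank)
            return sorted_levels[lower:i + n_flank + 1]
    return []  # no level exceeded the criterion


# Hypothetical per-level d' values for one session (levels in dB SPL).
levels = [53, 56, 59, 62, 65, 68, 71, 74]
dprimes = [0.2, 0.5, 0.9, 1.3, 1.7, 2.2, 2.6, 3.0]
selected = select_decoding_levels(levels, dprimes)
```

      With these example values the criterion level is 65 dB SPL, so the selected range is 59 to 71 dB SPL.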

      In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity predominantly occurs immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in figure 4 - figure supplement 1), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.

      Furthermore, we carried out an additional decoding analysis for one imaging session in which we had a sufficient number of trials to perform the analysis not only over the five (59, 62, 65, 68, 71 dB SPL) original sound levels, but also over a reduced range of three (62, 65, 68 dB SPL) sound levels, as well as a single (65 dB SPL) sound level (Figure 6 - figure supplement 1). The mean sound level differences between the hit trial distributions and miss trial distributions for these three conditions were 3.08, 1.01 and 0 dB, respectively. This analysis suggests that decoding performance is not meaningfully impacted by changing the range of sound levels (and sound level distributions), except that including fewer sound levels means fewer trials and thus noisier decoding.

      Changes to manuscript.

      Line 287: “...and was not meaningfully affected by differences in sound level distributions between hit and miss trials (Figure 6 – figure supplement 1).”

      Finally, in order to supplement the decoding analysis, we determined for each individual neuron whether there was a significant difference between the average hit and average miss trial activity. Note that this was done using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions and to rule out any potential confound of sound level. This revealed that the proportion of neurons containing “information about trial outcome” was generally very high, close to 50% on average, and not significantly different between lesioned and non-lesioned mice.

      Changes to manuscript.

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Line 648: “Analysis of task-modulated and sound-driven neurons. To identify individual neurons that produced significantly different response magnitudes in hit and miss trials, we calculated the mean activity for each stimulus trial by taking the mean activity over the 5 seconds following stimulus presentation and subtracting the mean activity over the 2 seconds preceding the stimulus during that same trial. A Mann-Whitney U test was then performed to assess whether a neuron showed a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) in response magnitude between hit and miss trials. The analysis was performed using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions. If, for a given sound level, there were more hit than miss trials, we randomly selected a sample of hit trials (without substitution) to match the sample size for the miss trials and vice versa.”
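
      The balancing step described here, equal numbers of hit and miss trials at every sound level, can be sketched as below. The trial representation, the fixed random seed and the function name are illustrative assumptions. Sampling is done without replacement, and a sound level that has only hits or only misses contributes no trials.

```python
import random
from collections import defaultdict


def balance_hit_miss(trials, seed=0):
    """trials: list of (sound_level, outcome) with outcome 'hit' or 'miss'.

    Returns sorted indices of the trials kept after per-level balancing.
    """
    rng = random.Random(seed)
    by_level = defaultdict(lambda: {"hit": [], "miss": []})
    for idx, (level, outcome) in enumerate(trials):
        if outcome in ("hit", "miss"):
            by_level[level][outcome].append(idx)
    keep = []
    for level in sorted(by_level):
        hits = by_level[level]["hit"]
        misses = by_level[level]["miss"]
        n = min(len(hits), len(misses))
        # random.sample draws without replacement.
        keep.extend(rng.sample(hits, n))
        keep.extend(rng.sample(misses, n))
    return sorted(keep)


# Hypothetical session: unequal hit/miss counts at three sound levels (dB SPL).
session = ([(59, "hit")] * 5 + [(59, "miss")] * 2
           + [(62, "hit")] * 1 + [(62, "miss")] * 4
           + [(65, "hit")] * 3)
kept = balance_hit_miss(session)
```

      After balancing, the hit and miss sound level distributions are identical, so any remaining difference in response magnitude cannot be attributed to sound level.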

      (2) I have the feeling that the authors do not fully exploit the functional data recorded with two-photon imaging. They identify several clusters but do not describe their functional differences. For example, cluster 3 is obviously mainly sensory driven as it is not modulated by outcome. This could be mentioned. This could also be used to rule out that trial outcome is the result of insufficient sensory input. Could this cluster be used to predict trial outcome at the onset response? Could it be used to predict the presence of the sound, and with what accuracy? The authors discuss the different cluster types a bit, but in a very elusive manner. I recognize that one should be careful with the use of signal analysis methods in calcium imaging, but a simple linear deconvolution of the calcium dynamics would help to illustrate the conclusions that the authors propose based on peak responses. It would also be very interesting to align the clusters' responses (deconvolved) to the timing of licking and reward events to check whether some clusters do not fire when mice lick before the sound comes. It would help clarify whether the behavioral signals described here require both the presence of the sound and the behavioral action or are just the reflection of the motor command. As noted by the authors, some clusters have late peak responses (2 and 5). However, 2 and 5 are not equivalent and a deconvolution would show that much better. Cluster 2 has late onset firing. Cluster 5 has early onset but prolonged firing.

      We agree with the reviewer’s statement that “cluster 3 is obviously mainly sensory driven”. In the Discussion we refer to cluster 3 as having a “largely behaviorally invariant response profile to the auditory stimulus” (line X), which is consistent with the statement of the reviewer. With regard to the reviewer’s suggestion to describe the “functional differences” between the clusters, we would like to refer to the subsequent three sentences of the same paragraph in which we speculate on the cognitive and behavioral variables that may underlie the response profiles of different clusters. Given the limitations imposed by the task structure, we do not think it is justified to expand on this.

      We have added an additional analysis in order to explicitly address the question of which neurons are sound responsive (please also see response to point 3 below and to point 1 of reviewer #2). That trial outcome could be predicted on the basis of only the sound-responsive neurons’ activity during the initial period of the trial (“predict trial outcome at the onset response”) is unlikely given their small number (only 97 of 2649 neurons show a statistically significant sound-evoked response) and given that only a minority (42/98) of those sound-driven neurons are also modulated by trial outcome within that initial trial period (i.e. 0-1s after stimulus onset; data not shown).

      Changes to manuscript.

      Line 219: “..., while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 658: “Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation. This analysis was performed using miss trials with click intensities from 53 dB SPL to 65 dB SPL (many sessions contained very few or no miss trials at higher sound levels).”
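
      The selection procedure quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes each neuron is represented by two lists of per-trial mean activity (pre-stimulus and post-stimulus), uses the normal approximation to the Mann-Whitney U statistic rather than an exact test, and all function names (`mann_whitney_u`, `benjamini_hochberg`, `sound_driven`) are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    # Continuity-corrected z score; two-sided p-value from the normal tail.
    z = max(abs(u - mean_u) - 0.5, 0.0) / math.sqrt(var_u)
    return u, min(math.erfc(z / math.sqrt(2)), 1.0)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def sound_driven(neurons, alpha=0.05):
    """neurons: list of (pre_trial_means, post_trial_means), one per neuron.

    Assumption: one mean value per miss trial and window for each neuron.
    """
    pvals = [mann_whitney_u(pre, post)[1] for pre, post in neurons]
    return [p_adj < alpha for p_adj in benjamini_hochberg(pvals)]
```

      In practice, a library routine such as `scipy.stats.mannwhitneyu` would normally replace the hand-written test; the pure-Python version is used here only to keep the sketch self-contained.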

      While calcium traces represent an indirect measure of neural activity, deconvolution does not necessarily provide an accurate picture of the spiking underlying those traces and has the potential to introduce additional problems. For instance, deconvolution algorithms tend to perform poorly at inferring the spiking of inhibited neurons (Vanwalleghem et al., 2021). Given that suppression is such a prominent feature of IC activity and is evident both in our calcium data and in the electrophysiology data of others (Franceschi and Barkat, 2021), we decided against using deconvolved spikes in our analyses. See also the side-by-side comparison below of the hit and miss trial activity of one example neuron based on either the calcium trace (left) or deconvolved spikes (right), extracted using the OASIS algorithm (Friedrich et al., 2017) incorporated into suite2p (Pachitariu et al., 2016).

      Author response image 1.
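
      The concern about suppressed activity can be illustrated with a toy example. The sketch below is not OASIS and not the analysis shown in the image; it assumes a first-order autoregressive calcium model, c[t] = decay * c[t-1] + s[t], with a hypothetical decay value, and shows that once the inferred events are constrained to be non-negative (as spike-inference methods typically require), a dip below baseline can no longer be represented.

```python
def simulate_trace(events, decay):
    """Forward model: c[t] = decay * c[t-1] + events[t], starting from zero."""
    trace, previous = [], 0.0
    for s in events:
        previous = decay * previous + s
        trace.append(previous)
    return trace


def linear_deconvolve(trace, decay):
    """Invert the AR(1) model: s[t] = c[t] - decay * c[t-1]."""
    events = [trace[0]]
    for t in range(1, len(trace)):
        events.append(trace[t] - decay * trace[t - 1])
    return events


def clip_non_negative(events):
    """Mimic a non-negativity constraint by discarding negative events."""
    return [max(s, 0.0) for s in events]


decay = 0.9  # hypothetical per-frame decay factor
# A suppression below baseline (negative event) followed by an excitation.
true_events = [0.0, 0.0, -0.5, 0.0, 0.0, 1.0, 0.0]
trace = simulate_trace(true_events, decay)

recovered = linear_deconvolve(trace, decay)          # unconstrained: keeps the dip
constrained = clip_non_negative(recovered)           # constrained: dip is lost
reconstructed = simulate_trace(constrained, decay)   # never goes below baseline
```

      The unconstrained linear inverse recovers the suppression exactly in this noise-free case, but it is also the version most sensitive to noise, which is one reason constrained methods are preferred in practice.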

      (3) Along the same line, the very small proportion of really sensory driven neurons (cluster 3) is not discussed. Is it what one would expect in typical shell or core IC neurons?

      As requested by reviewer #2 and mentioned in response to the previous point, we have now quantified the number of neurons in the dataset that produced significant responses to sound (97 / 2649). For a given imaging area, the fraction of neurons that show a statistically significant change in neural activity following presentation of a click of between 53 dB SPL and 65 dB SPL rarely exceeded ten percent. While that number is low, it is not necessarily surprising given the moderate intensity and very short duration of the stimuli. For comparison: Using the same transgenics, labeling approach and imaging setup and presenting 200-ms long pure tones at 60 dB SPL with frequencies between 2 kHz and 64 kHz, we typically find that between a quarter and a third of neurons in a given imaging area exhibit a statistically significant response (data not shown).

      Changes to manuscript.

      Line 219: “..., while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 658: “Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation. This analysis was performed using miss trials with click intensities from 53 dB SPL to 65 dB SPL (many sessions contained very few or no miss trials at higher sound levels).”

      Line 220: “While the number of sound-responsive neurons is low, it is not necessarily surprising given the moderate intensity and very short duration of the stimuli. For comparison: Using the same transgenics, labeling approach and imaging setup and presenting 200-ms long pure tones at 60 dB SPL with frequencies between 2 kHz and 64 kHz, we typically find that between a quarter and a third of neurons in a given imaging area exhibit a statistically significant response (data not shown).”

      (4) In the discussion, the interpretation of different transient and permanent cortical inactivation experiment is very interesting and well balanced given the complexity of the issue. There is nevertheless a comment that is difficult to follow. The authors state:

      If cortical lesioning results in a greater weight being placed on the activity in spared subcortical circuits for perceptual judgements, we would expect the accuracy with which trial-by-trial outcomes could be read out from IC neurons to be greater in mice without auditory cortex. However, that was not the case.

      However, there is no indication that the activity they observe in shell IC is causal to the behavioral decision and likely it is not. There is also no indication that the behavioral signals seen by the authors reflect the weight put on the subcortical pathway for behavior. I find this argument handwavy and would remove it.

      While we are happy to amend this section, we would not wish to remove it because a) we believe that the point we are trying to make here is an important and reasonable one and b) because it is consistent with the reviewer’s comment. Hopefully, the following will make this clearer: In order for the mouse to make a perceptual judgment and act upon it - in the context of our task, hearing a sound and then licking a spout - auditory information needs to be read out and converted into a motor command. If the auditory cortex normally plays a key role in such perceptual judgments, cortical lesions would require the animal to base its decisions on the information available from the remaining auditory structures, potentially including the auditory midbrain. This might result in a greater correspondence between the mouse’s behavior and the neural activity in those structures. That we did not observe this outcome for the IC could mean that the auditory cortex did not contribute to the relevant perceptual judgments (sound detection) in the first place. Therefore, no reweighting of signals from the other structures is necessary. Alternatively, greater weight might be placed exclusively on structures other than the auditory midbrain, e.g. the thalamus. The latter would imply that the contribution of the IC remains the same. This includes the possibility that the IC shell does not play a causal role in the behavioral decision – in either control mice or mice with cortical lesions – as suggested by the reviewer.

      Changes to manuscript.

      Line 471: “This could imply that, following cortical lesions, greater weight is placed on structures other than the IC, with the thalamus being the most likely candidate, ..”

      (5) In Fig. 5 the two colors used in B and C are the same although they describe different categories.

      The dark green and ‘deep orange’ we used to distinguish between non-lesioned and lesioned in Figure 5C are slightly lighter than the colors used to distinguish between these two categories in other figures and therefore might be more easily confused with the blue and red in Figure 5B. This has been changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate your comments and suggestions on our manuscript.

      In particular, we have measured the affinity between the middle tail domain of myosin-5a (Myo5a-MTD) and the actin-binding domain of melanophilin (Mlph-ABD) using microscale thermophoresis and obtained a Kd of ~0.56 uM, which is similar to the Kd of the globular tail domain of myosin-5a (Myo5a-GTD) for the GTD-binding motif of melanophilin (Mlph-GTBM). Moreover, we have performed Western blots of the lysates of transfected cells, showing that the proteins of the dominant negative construct and the negative control were expressed at similar levels without noticeable degradation.

      We appreciate the editors’ and reviewers’ comments on how melanophilin might be regulated in binding to the exon-G of myosin-5 and to actin filaments. Phosphorylation of melanophilin by protein kinase A is one possible mechanism. We will investigate this issue in our future studies.

      We also took this opportunity to correct several minor errors in the manuscript. Textual alterations can be viewed in the “tracked change” version of the manuscript. Below are the comments from the editors and the two reviewers together with our point-by-point responses.

      eLife assessment

      This study represents a useful description of a third interaction site between melanophilin and myosin-5a which is important in regulating the distribution of pigment granules in melanocytes. While much of the data forms a solid case for this interaction, the inclusion of important controls for the cellular studies and measurement of interaction affinities would have been helpful.

      Public Reviews:

      Reviewer #1 (Public Review):

      Interactions known to be important for melanosome transport include exon F and the globular tail domain (GTD) of MyoVa with Mlph. Motivated by a discrepancy between in vitro and cell culture results regarding necessary interactions for MyoVa to be recruited to the melanosome, the authors used a series of pull-down and pelleting assays experiments to identify an additional interaction that occurs between exon G of MyoVa and Mlph. This interaction is independent of and synergistic with the interaction of Mlph with exon F. However, the interaction of the actin-binding domain of Mlph can occur either with exon G or with the actin filament, but not both simultaneously. These data lead to a modified recruitment model where both exon F and exon G enhance the binding of Mlph to auto-inhibited MyoVa, and then via an unidentified switch (PKA?) the actin-binding domain of Mlph dissociates from MyoVa and interacts with the actin filament to enhance MyoVa processivity.

      The only weakness noted is that the authors could have had a more complete story if they pursued whether PKA phosphorylation/dephosphorylation of Mlph is indeed the switch for the actin-binding domain of Mlph to interact with exon G versus the actin filament.

      We thank Reviewer #1 for the careful reading of the manuscript and appreciation of the study. We agree with the Reviewer that it is important to understand how the actin-binding domain of Mlph switches its interaction between the exon-G of Myo5a and the actin filament. We would like to pursue this direction in our future research.

      Reviewer #2 (Public Review):

      The authors identify a third component in the interaction between myosin Va and melanophilin- an interaction between a 32-residue sequence encoded by exon-g in myosin Va and melanophilin's actin-binding domain. This interaction has implications for how melanosome motility may be regulated.

      While this work is largely well done and certainly publishable following needed revisions (e.g. some affinity measurements, necessary controls for the dominant negative experiments), I believe that additional work would be required to make a more compelling case. First, the study provides just one more piece to a well-developed story (the role of exon-F and the GTD in myosin Va: melanophilin (Mlph) interaction), much of which was published 20 years ago by several labs. Second, the study does not demonstrate a physiological significance for their findings other than that exon-G plays an auxiliary role in the binding of myosin Va to Mlph. For example, what dictates the choice between Mlph's actin binding domain (ABD) binding to actin or to exon-G? Is it a PTM or local actin concentration? It is unlikely to be alternative splicing as exon-G is present in all spliced isoforms of myosin Va. And what changes re melanosome dynamics in cells between these two alternatives? Similarly, the paper does not provide any in vitro evidence that binding to exon-G instead of actin affects the processivity of a Rab27a/Myosin Va/Mlph transport complex. For example, if the ABD sticks to exon-G instead of actin, does that block Mlph's ability to promote processivity through its interaction with the actin filament during transport? In summary, given that the authors did not directly test their model either in vitro or in cells, I do not think this story represents a significant conceptual advance.

      We thank Reviewer #2 for the careful reading of the manuscript and the suggestions for improving it. As suggested by the reviewer, we have measured the affinity between the middle tail domain of Myo5a (Myo5a-MTD) and Mlph-ABD (Kd ~0.562 uM), which is similar to that between the globular tail domain of Myo5a (Myo5a-GTD) and the GTBM of Mlph. In addition, we have performed additional experiments showing the integrity and the expression level of the dominant negative constructs in the transfected cells.

      We believe more extensive experiments are required to address other questions raised by the reviewer. For example, what dictates the choice between Mlph's actin binding domain (ABD) binding to actin or to exon-G is an open question. As we proposed, phosphorylation by protein kinase A is only one possible mechanism. We would like to pursue them in our future research.

      Recommendations for the authors:

      The reviewing editor feels strongly that addressing some of the points raised by the reviewers would make this a more compelling manuscript. In particular, a measurement of the affinity of the relevant fragments from melanophilin and myosin-5a would indicate that the interaction might be physiologically relevant. Concerning the dominant negative experiments, the lack of effect of an expressed fragment could be that the expressed fragments were simply degraded or expressed at too low of a level to be competing. The reviewer gives guidelines on how to address this. Reviewer #2 made a point that it would be compelling if the effect of phosphorylation as suggested in the model was tested, but we all agree that this could well be the subject of a later study. In addition, the authors make a very interesting proposal for how protein kinase A could be involved in this regulation as has been suggested previously. Perhaps the use of phosphomimetic mutations could give some insight into this. Such experiments, if consistent with the proposed model, would certainly raise the impact of this study. Finally, a very clear periodicity in hydrophobic amino acids is apparent in the interacting sequences of both Myo5 (yrisLykrMidLmeqLekqdktVrkLkkqLkvFakkIgeLevgqmen) and Mlph (tdeeLseMedrVamtAseVqqAeseIsdIesrIaaLra). This strongly suggests a leucine-zipper-like coiled coil, rather than an interaction mediated solely by charge. Recent, easily accessible software such as AlphaFold multimer might yield important structural insight into the binding configuration and might help rationalize the effect of the mutations herein.
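
      The periodicity noted by the editor can be checked directly from the two sequences as written above, where the capitalized letters mark the hydrophobic positions being highlighted. The short sketch below (the function name `hydrophobic_spacing` is hypothetical) lists the spacing between consecutive capitalized residues; a coiled-coil heptad repeat is expected to show alternating gaps of 3 and 4 residues.

```python
def hydrophobic_spacing(sequence):
    """Return (positions, gaps) for the capitalized residues in a sequence.

    Positions are 0-based indices within the quoted peptide, not residue
    numbers in the full-length protein.
    """
    positions = [i for i, residue in enumerate(sequence) if residue.isupper()]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return positions, gaps


# Sequences copied from the editor's comment.
myo5 = "yrisLykrMidLmeqLekqdktVrkLkkqLkvFakkIgeLevgqmen"
mlph = "tdeeLseMedrVamtAseVqqAeseIsdIesrIaaLra"

for name, seq in (("Myo5", myo5), ("Mlph", mlph)):
    positions, gaps = hydrophobic_spacing(seq)
    print(name, "positions:", positions, "gaps:", gaps)
```

      For both peptides the gaps are 3 or 4 residues throughout, with a single gap of 7 (one full heptad) in the Myo5 sequence, which is the pattern the editor describes.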

      We thank the editors and the reviewers for their suggestions for improving the manuscript. We have performed several essential experiments to address the concerns raised by the reviewers.

      (1) Regarding the affinity of the relevant fragments from melanophilin and myosin-5a. We have measured the affinity between Mlph-ABD and Myo5a-MTD using MST (Kd ~562 nM) (see revised Figure 3A).

      (2) Regarding the concerns on the dominant negative experiments. We have examined the molecular sizes and expression levels of Mlph or Myo5a constructs by Western blots. First, we show that all constructs have the correct molecular size in transfected cells (see revised Figure 6C and 7D), indicating that the inability of Myo5a or Mlph truncations to generate dilute-like phenotypes was not due to the intracellular degradation of the EGFP fusion protein. Second, by correcting for the percentage of transfected cells, we show that the overall expression levels of the wild-type construct and the mutants are roughly equal. Third, we categorized the expression levels into high and low, and calculated the percentage of cells with the DN phenotype at each level. The results are consistent with the percentage of the DN phenotype among all cells expressing the EGFP fusion protein.

      (3) Regarding the suggestion to investigate the effect of phosphorylation by protein kinase A on Mlph-ABD’s interaction with Myo5a and actin filament. We understand that it is important to elucidate the mechanism by which the actin-binding domain of Mlph switches its interaction between the exon-G of Myo5a and the actin filament. However, as we proposed, phosphorylation by protein kinase A is one possible mechanism, and more extensive experiments are required to address this question. Therefore, we would like to pursue it in our future research.

      (4) Regarding the suggestion to predict the interaction between the exon-G of myosin-5a and Mlph-ABD using AlphaFold. We have used AlphaFold multimer to predict the Myo5a-MTD/Mlph-ABD interaction. Remarkably, AlphaFold predicted that the binding of Myo5a-MTD to Mlph-ABD is mediated by an antiparallel coiled-coil formed by Myo5a (1430-1467) and Mlph (450-481), just as predicted by the editors. This prediction is also consistent with our finding that the exon-G of Myo5a interacts with Mlph-ABD. However, the predicted model cannot explain our mutagenesis results. We will pursue this point in future research. Nevertheless, we are grateful to the editors for bringing this idea to our attention, because it will help us to design experiments to investigate the nature of the Myo5a-exon-G/Mlph-ABD interaction.

      Reviewer #1 (Recommendations For The Authors):

      Specific minor comments

      Q1: In figs 6-7 an overlay between DAPI and EGFP would be helpful for the reader to see perinuclear distribution.

      As suggested, we have added the merged images of DAPI and EGFP in the revised Figure 6 and 7.

      Q2: The delta symbol in the pdf text was corrupted.

      The corrupted delta symbol has been fixed in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Q1: Please explain in detail early in the text what exon-G is - length, position in the tail, and evidence that it is a coiled coil (CC). Of note, is it only long enough for about 4 heptad repeats? Has it been shown biochemically to form a CC? Is the CC irreversible? What would be the consequence of removing the exon-G CC on the ability of surrounding regions to bind Mlph (exon-F and the GTD)?

      We thank the reviewer for this suggestion. In the revision, we added a new paragraph (the first paragraph in the results section) and revised Figure 1A to introduce the middle tail domain and alternatively spliced exons of Myo5a.

      Exon-G is 32 amino acids in length, located at the C-terminal region of the middle tail domain, immediately before the globular tail domain. Exon-G region was predicted to form a short coiled-coil by using on-line tools (such as paircoil), and this prediction has not been tested biochemically. Moreover, we do not know whether the exon-G coiled-coil is reversible or not.

      We have not examined the effect of removing the whole exon-G on the interaction between the GTD and Mlph-GTBM. The exon-G (residues 1436-1467) and the GTD core (residues 1498-1877) are separated by a long loop of 31 residues. We therefore expect that removing the exon-G will not affect the GTD/Mlph-GTBM interaction.

      Physically, exon-F is immediately followed by exon-G, and those two regions might interfere with each other. In our preliminary study, we found that removing the whole exon-G abolished the interaction between exon-F and Mlph-EFBD. On the other hand, removing the C-terminal half (residues 1454-1467) of exon-G had little effect on the interaction between exon-F and Mlph-EFBD (see Figure 2C). In this work, we intentionally selected the latter construct for functional analysis of the exon-G/Mlph-ABD interaction, because removing the C-terminal half of exon-G abolishes the interaction with Mlph-ABD, but does not affect the exon-F/Mlph-EFBD interaction.

      Q2: Figures 1-3. While the pulldown experiments demonstrating an interaction between Mlph-ABD residues 446-571 and Myo5a-MTD are a good start, one would like to see affinity measurements to gauge the likelihood that this interaction is physiologically relevant. The same goes for the pulldown experiments demonstrating an interaction between (i) the C-terminal half of exon-G (residues 1453-1467) and the Mlph-ABD, (ii) between residues 1411-1467 (a short peptide containing exon-F and exon-G) and the Mlph-ABD, and (iii) between residues 1436-1467 (a short peptide containing exon-G) and the Mlph-ABD. This would also apply to the pulldowns in 3C-3E where versions of the proteins with charge residue changes were tested.

      We agree with the reviewer that determining the affinities between Mlph-ABD and Myo5a-MTD and their variants would be helpful in understanding the physiological relevance of the Exon-G/Mlph-ABD interaction. However, the extensive experiments suggested by the reviewer require many high-quality purified proteins, which are not trivial to obtain.

      Nevertheless, we think it is important to know the affinity between Myo5a-MTD and Mlph-ABD (both wild-type), as this parameter can be used for the comparison of the three interactions between Myo5a and Mlph. Therefore, we have obtained the affinity between Myo5a-MTD and Mlph-ABD using microscale thermophoresis (MST). The dissociation constant (Kd) of Myo5a-MTD to Mlph-ABD is 0.562±0.169 uM, which is similar to that between Myo5a-GTD and Mlph-GTBM (~1 uM) (Geething & Spudich (2007) JBC 282:21518). Consistent with GST pulldown results, MST shows that deletion of C-terminal half of exon-G (1453-1467) greatly decreases the MST signals (see revised Figure 3A).
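
      To give a feel for what a Kd of about 0.56 uM means in a titration, the sketch below evaluates the standard 1:1 binding isotherm (the exact quadratic solution that accounts for ligand depletion). It is an illustration only: the concentration of the labeled partner and the titration points are hypothetical values, not those used in the MST experiment, and the function name `fraction_bound` is an assumption.

```python
import math


def fraction_bound(ligand_total, target_total, kd):
    """Fraction of target in complex for 1:1 binding (all values in uM).

    Solves [C]^2 - (L + T + Kd)[C] + L*T = 0 for the complex concentration
    and returns [C] / T.
    """
    b = ligand_total + target_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * ligand_total * target_total)) / 2.0
    return complex_conc / target_total


kd = 0.562            # uM, value reported in the response
target_total = 0.02   # uM, hypothetical concentration of the labeled partner

# Hypothetical titration series of the unlabeled partner (uM).
for ligand_total in (0.01, 0.1, 0.562, 1.0, 10.0):
    occupancy = fraction_bound(ligand_total, target_total, kd)
    print(f"{ligand_total:7.3f} uM ligand -> {occupancy:.2f} bound")
```

      When the labeled partner is present well below the Kd, half-maximal binding occurs at a ligand concentration close to the Kd itself, which is why the two affinities quoted above (0.56 uM and about 1 uM) imply broadly similar occupancy over the same concentration range.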

      Q3: While the dominant negative (DN) approach to testing functional significance is OK, rescuing dilute/myosin Va null melanocytes with full-length myosin Va containing the various deletions would have been more convincing. Also, the authors must show (i) that the DN constructs are the correct size in transfected cells (i.e. are not degraded), and (ii) that they are expressed at roughly equal levels (either by doing Westerns and correcting for the percent of transfected cells, or by measuring total cellular fluorescence in transfected cells). Without this information, it remains possible that constructs not exhibiting a DN effect are simply degraded or poorly expressed. This applies to all the DN data in Figures 6 and 7.

      We agree with the reviewer that Myo5a null melanocytes are ideal for investigating exon-G function. Unfortunately, we do not have Myo5a null melanocytes derived from dilute mice.

      To confirm the integrity of the overexpressed proteins in the transfected cells, we performed Western blots of those proteins, including EGFP-Mlph-RBD (wild-type and two mutants) and Myo5a-Tail (wild-type and G mutant), in the lysate of the transfected cells. Western blots show that all those proteins have the correct molecular masses, indicating no degradation of the overexpressed proteins (see revised Figure 6C and 7C). Moreover, by correcting for the percentage of transfected cells, we show that the overall expression levels per transfected cell of the wild-type construct and the mutants are roughly equal. This information is included in the revised manuscript (Line 222-225; 237-241).

      Q4: The authors scored the DN phenotype as yes/no but it mostly likely varies depending on the degree of over-expression. Showing that the degree of melanosome centralization scales with the degree of overexpression, and that the correlation between expression level and phenotype varies depending on the construct would strengthen the results.

      We agree with the reviewer’s prediction that the degree of the DN phenotype should depend on the over-expression level. We analyzed the EGFP signals of transfected cells and found very few cells with a medium expression level. Therefore, we simply categorized the expression levels into high and low, and calculated the percentage of cells with the DN phenotype in each category, as shown in the table below. These results are consistent with the expectation that the degree of the DN phenotype depends on the over-expression level of the transfected constructs.

      Author response table 1.

      Percentage of the EGFP-expressing cells with perinuclear aggregation of melanosomes

      Q5: The conclusion from the data in Figure 8A- "the presence of both exon-F and exon-G is insufficient for binding to the Mlph occupied by Myo5a, but sufficient for binding to the unoccupied Mlph"- should be verified by also doing the experiment in myosin Va knockdown cells.

      We agree. Unfortunately, our RNAi knockdown of Myo5a in melanocytes is not ideal and we do not have Myo5a knockout melanocytes. We will pursue this point in the future.

      Q6: Line 213 "three Mlph-binding regions, i.e., exon-F, exon-F, and GTD (Figure 7A)" has a typo.

      This typo has been corrected.

      Q7: The authors should provide high mag insets for the images in Figure 8.

      As suggested, we have revised Figure 8 by including high mag insets for the images.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development time course. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.

      I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3, and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.

      We believe this study is an enhancement on our previous work for two reasons, which have been alluded to in new text within the introduction. Firstly, our previous work used experimental and bioinformatic analysis to identify microRNAs with significant regulatory roles during chondrogenesis. This new manuscript additionally uses a systems biology approach to identify novel miRNA-mRNA interactions and capture these within an in silico model. Secondly, this work was initiated by the analysis of our previously generated data, using a novel tool we developed for this type of data (Bioconductor - TimiRGeN).

      I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?

      We agree with the reviewer that some additional data were needed to demonstrate the effective regulation of miR-199-5p.  Hence, Supplementary Figure 1 is now included which provides validation of the effects of miR-199a-5p overexpression (Supplementary Figure 1A) and inhibition of miR-199a/b-5p (Supplementary Figure 1B). Within the main manuscript, Figure 2B has been amended to include the consequences of inhibition of miR-199a-5p, with 2C showing the consequences of miR-199b-5p inhibition. Further, we include new data with regards to miR-199a/b-5p inhibition on CAV1 (Figure 4A). 

      I had a number of issues with the way in which some of the data was presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels.

      We agree with all points made here and have amended these within the manuscript. Figure 1A is now pathway enrichment plots from the TimiRGeN R Bioconductor package, and the table which previously showed the pathways enriched at each time point is now in the supplementary materials (supp. Table 1). Figure 2 and 4 now have color instead of shades of grey. Figure 3C has now been moved to supplementary materials (Supplementary Figure 2) and is referenced in the text. 

      Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.

      Reviewer #2 (Public review):

      Summary:

      This study represents an ambitious endeavor to comprehensively analyze the role of miR199a/b-5p and its networks in cartilage formation. By conducting experiments that go beyond in vitro MSC differentiation models, more robust conclusions can be achieved.

      Strengths:

      This research investigates the role of miR-199a/b-5p during chondrogenesis using bioinformatics and in vitro experimental systems. The significance of miRNAs in chondrogenesis and OA is crucial, warranting further research, and this study contributes novel insights.

      Weaknesses:

      While miR-140 and miR-455 are used as controls, these miRNAs have been demonstrated to be more relevant to Cartilage Homeostasis than chondrogenesis itself. Their deficiency has been genetically proven to induce Osteoarthritis in mice. Therefore, the results of this study should be considered in comparison with these existing findings.

      We agree with the reviewer’s comments. miR-455-null mice develop normally but miR-140-null (or mutated) mice and humans do have skeletal abnormalities (e.g. Nat Med. 2019 Apr;25(4):583-590. doi: 10.1038/s41591-019-0353-2), indicating a role in chondrogenesis. We have made an addition in the description to point towards the need to assess the roles miR-199a/b-5p may play during skeletogenesis and OA. We anticipate miR-199a/b-5p to be relevant in OA and have ongoing additional work for this, but this is beyond the scope of this manuscript.

      Recommendations to Authors:

      Reviewer #1 (Recommendations to authors):

      Beyond the issues raised in the public review, I had a few minor recommendations that are largely designed to help improve the understanding of the manuscript as it is currently written.

      (1) Please provide the statistical tests used to obtain p-values in the Figure 2 and 4 legends.

      We have now added statistical test information to the figure legends of figures 2 and 4.

      (2) It is stated on p. 9 that both miRNAs may share a functional repertoire because 25 and 341 genes are intersected between their inhibition experiments. Please provide statistical support that this overlap is an enrichment over the null background in this experiment. Total DE genes – chi squared. Expected / Observed.

      A chi-squared test is now presented in the manuscript, which shows that the number of significant genes found in common between the miR-199a-5p knockdown and the miR-199b-5p knockdown was significantly greater than expected for day 0 or day 1 of the experiments.
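
      As an illustration only, the kind of overlap test requested in comment (2) can be sketched as a 2x2 Pearson chi-squared test on the counts of genes that are differentially expressed in both knockdowns, in only one of them, or in neither. This is a minimal sketch, not the analysis used in the manuscript: the function name `overlap_chi_squared`, the list sizes (100 and 200) and the background of 10,000 tested genes are hypothetical placeholders, and only the overlap of 25 echoes a figure quoted in the comment. No continuity correction is applied.

```python
import math


def overlap_chi_squared(n_a, n_b, n_overlap, n_background):
    """Pearson chi-squared test (1 d.f., no continuity correction) for the
    overlap between two differentially expressed gene lists.

    n_a, n_b      -- sizes of the two gene lists (hypothetical inputs)
    n_overlap     -- number of genes present in both lists
    n_background  -- number of genes tested in total (the null background)
    Returns (chi2 statistic, p-value, expected overlap under independence).
    """
    neither = n_background - n_a - n_b + n_overlap
    if min(n_overlap, n_a - n_overlap, n_b - n_overlap, neither) < 0:
        raise ValueError("counts are inconsistent with the background size")

    # Rows: in list A (yes / no); columns: in list B (yes / no).
    observed = [[n_overlap, n_a - n_overlap],
                [n_b - n_overlap, neither]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    # Expected counts if membership of the two lists were independent.
    expected = [[row_totals[i] * col_totals[j] / n_background
                 for j in range(2)] for i in range(2)]

    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))

    # For one degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected[0][0]


if __name__ == "__main__":
    # Hypothetical counts: 100 and 200 DE genes, 25 shared, 10,000 tested.
    chi2, p, exp_overlap = overlap_chi_squared(100, 200, 25, 10_000)
    print(f"expected overlap: {exp_overlap:.1f}, observed: 25")
    print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
```

      With these placeholder counts the expected overlap under independence is 2 genes, so an observed overlap of 25 would count as an enrichment. The choice of background (all tested genes rather than the whole genome) strongly affects the result and would need to match the actual experiment.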

      (3) The final sentence on p. 12 (beginning 'Size of the points reflect...') seemed out of place - is it part of a legend?

      Thank you for pointing out this mistake - it was part of figure 3C and now is in the supplementary materials.

      (4) A sentence on p. 14 reads that 'FZD6 and ITGA3 levels increased significantly' but this should read decreased, rather than increased. Quite an important typo!

      Thank you for pointing this error out. It has been corrected.

      (5) Theoretical transcripts are mentioned in the legend of Figure 5A but these were not present in the figure. Please include these or remove them from the legend.

      This error has been removed from Figure 5A.

      (6) On p 20, the references 22 and 27 should I think be moved to earlier in the sentence (after 'miR-199a-5p-FZD6 has been predicted previously'). Currently, it reads as if these references support your luciferase assays which you claim are the first evidence for this target relationship.

      We agree with this change and have corrected the manuscript.

      (7) The reference to Figure 5D on p. 20 should be a reference to Figure 5C.

      Thank you for pointing this error out – this has been corrected.

      Reviewer #2 (Recommendations to authors):

      (1) The paper is based on the importance of miR-140 and miR-455 as miRNAs in chondrogenesis, citing only Barter, M. J. et al. Stem Cells 33, (2015). Considering the scope and results of this study, this citation is insufficient.

      We agree with the reviewer's comments. For many years miR-140 and miR-455 have been studied experimentally, and their importance in OA research has become apparent. We have included additional references in the introduction to address this.

      (2) Analyzing chondrogenesis solely through differentiation experiments from MSCs is inadequate. It is essential to perform experiments involving the network within normal cartilage tissue and/or the generation of knockout mice to understand the precise role of miR199a/b-5p in chondrogenesis.

      We have added an additional paragraph in the discussion to state this, and do believe it is highly important that miR-199a/b-5p be tested in OA samples – however this would be beyond the intended scope of this article.

      (3) In light of the above points, it is imperative to investigate the role of miR-199a/b-5p beyond the in vitro differentiation model from MSCs, encompassing mouse OA models or human disease samples.

      In line with our previous response, we agree with the premise and believe additional experiments should be performed to gain more insight into the mechanism by which miR-199a/b-5p regulates OA. However, development of a new mouse line to investigate this is not within the scope of this manuscript.

    1. The argument/ideology that pins down Barthes’ deconstruction of the Eiffel Tower is very Nietzschean. Much like Nietzsche’s popular argument that art is the only truth because it allows one to live in a personal abstraction and intuition, the tower being art means it surpasses our rationalization, deconstruction, and assimilation of it into one side of binary schemas. It exists to emphasize its inability to be known by us and to serve almost “mythical” purposes that transcend rational rules of the world. In other words, “Barthes’ phenomenological approach brings us to the focus of our investigation: an architectural structure’s capacity to simultaneously be understood as agent and object, a capacity we regard as a peculiar oscillation between function and symbol in the case of the Eiffel Tower” (Steiner).

      There is a lot to unpack with the contradictory qualities of the “utterly useless monument”, which we actually learn is pretty useful (Barthes 5). The point that stands is that, physically, the tower is an uncontainable object that we try to domesticate. One way we do this is through “the installation of a restaurant [...or other] means of leisure” in the tower itself (Barthes 16). The fact that the tower is an open construction makes us uncomfortable when we are used to typical tourist hotspots (like museums, for instance) being enclosed for us to feel like we entered, experienced, and “owned” some of it. The tower doesn’t do that for us. So, we have to create a mini world surrounding the tower in order to make it feel normal. In our conception of the order of the world, the Eiffel Tower is unique to us because it is simultaneously a representation of the inside and of the outside world. This quality, that the tower is somehow both sides of an opposite binary, is too far outside of the social contract, and Nietzsche would say (and Barthes points to it) that we often try to tackle this discomfort by trying to reduce the tower. We do this by turning the tower into a site of projection. It becomes a symbol of industrialism, of travel, of art, of Paris itself, whatever one may choose. But it is in this choice that we strip the tower of the other symbols it projects equally as strongly. And this is where the problem lies. We must look at the tower as the embodiment of all the opposites it may be: inside/outside, industry/art, ugly/beautiful, all at the same time.

      Barthes asks us to consider why the tower makes us so uncomfortable in this binary presentation. Perhaps it is because this makes the tower oddly more powerful than us. The tower can be a spectacle and an object, useless and useful, inside and outside. We cannot be those things. If we are looking at the tower, we can't be in it, for example; but the tower can be both an empty base space outside and an indoor restaurant as well. None of our relations to the tower can come together at the same time, while the tower can be opposites at the same time. We can only perceive the tower as one of its opposite meanings at a time, and we have to kind of deal with the impossibility of bringing together two things that are true and simultaneous but also cannot co-occur logically. I think one way we do this is by glossing over it all and pretending everything can occur at the same time, a comforting thought facilitated by the constructed surrounding environment.

      However, by doing this, what simultaneously happens is that the tower becomes a signifier of basically an infinite site of projection. It is reduced to a symbol of Paris, of travel, of industrialism, of some kind of focal point in France. The tower being a signifier for everything really just makes it nothing. And when we come face-to-face with this (structural and symbolic) emptiness, we rush to find ways to create more perceived “somethingness” (we add restaurants, shops, carts of food, and other community experiences all around the tower) to fit into our schemas and orders.

      Barthes, Roland. The Eiffel Tower - Roland Barthes - LANTB, lantb.net/uebersicht/wp-pdf/eiffelTower.pdf. Accessed 13 May 2024. Steiner, Henriette, and Kristin Veel. “Towering invisibilities: A cultural-theoretical reading of the Eiffel Tower and the One World Trade Center.” Qualitative Inquiry, vol. 25, no. 4, 5 Aug. 2018, pp. 407–416, https://doi.org/10.1177/1077800418790297.

    2. Pentadic criticism can be used to analyze the Eiffel Tower as well. It requires of us that we identify 5 items of the pentad:

      * Agent (who is performing the act)
      * Act (what is happening)
      * Scene (where and when the artifact was produced)
      * Purpose (why)
      * Agency (how/what means does the agent use)

      Pentadic criticism allows us to assign various different characteristics or details to each “item”, resulting in various interpretations of the same artifact. (Foss 356) Here is one possible pentad and interpretation of the Eiffel Tower:

      * Agent: Gustave Eiffel
      * Act: constructing the Eiffel Tower
      * Scene: Paris during 1889
      * Purpose: to introduce the value of engineers as creative artisans and mathematical intellectuals in a climate heavily dominated by artists only
      * Agency: by submitting the design for the Eiffel Tower to the World Fair contest, Eiffel found a means through which he could gain extreme popularity for his cause. In construction of the tower itself, the use of metal and open structures contribute to the “engineer” aspect of the monument.

      Next is to analyze the artifact using some combination of these five characteristics. For instance, one might argue that agency and scene are the most important qualities of this artifact: By using the World Fair contest as a way to bring attention to his industrial artifact, Gustave Eiffel also shattered a perception of Paris as a classy and elegant city. He took attention away from the belle epoque and forced people to think of how structures could also serve useful purposes. The message behind the Eiffel Tower may not have come across this way had Eiffel, an engineer, not submitted his work to an art competition like this one.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Dear editor and reviewers,

      we thank you very much for your constructive comments, criticisms and suggestions for improvement of our manuscript. We have addressed all points raised by you and have added our point-by-point response to your comments below.

      With best regards on behalf of all authors,

      Andreas Wodarz

      1. Point-by-point description of the revisions

      Reviewer #1

      Evidence, reproducibility and clarity

      Baz/Par3 is an important conserved protein acting as a master regulator of cell polarity in a wide range of cell types. This study focuses on re-assessing the subcellular localisation of Baz/Par3 in a range of Drosophila tissues. This is an important study with respect to our understanding of Baz/Par3, as there have been conflicting reports on the localisation of Par complex members - while the majority show localisation to cell cortex and intercellular junctions, several reports have claimed that Par complex members localise at additional subcellular sites including the nucleus, nuclear envelope and neuromuscular junction. In this study the authors re-assess this issue for Baz/Par3 in a comprehensive and thorough manner.

      We thank the reviewer for this overall positive assessment of our work.

      *1. They used a variety of antibodies raised in different host animals against different epitopes of Baz
      2. They tested the specificity of these antisera using mosaic analysis with null mutant baz alleles and tissue-specific RNAi against baz
      3. They used a GFP-tagged Baz under control of its endogenous promoter in a baz null mutant background to compare the subcellular localisation of the respective GFP-Baz fusion proteins with the staining results obtained with anti-Baz antisera

      The data from each of these experiments are very clear and convincing. Comprehensive methods are included which means that each of the experiments with specific anti-sera/RNAi lines/GFP-tagged conditions could be reproduced. There are a couple of experiments which were performed in support of the conclusions (extra RNAi lines and stronger expression of Gal4) listed as (data not shown). I would strongly suggest including these data as extra supplemental figures. Together, their results clearly show that Baz/Par3 localises to the cortex and intercellular junctions, but that anti-sera staining at the NMJs and nuclear envelope appear to be a staining artifact, likely due to staining with an unidentified epitope.

      Minor comments

      1. Many of the figures have overlays of red and green which will be indistinguishable from each other to colour-blind readers. Please alter to make colour-blind friendly (eg magenta-green)*

      We have changed all figures in the following way: All single channel images have been converted to inverted grayscale to improve the visibility of weak fluorescence signals. In all multicolor overlay images, red has been omitted and instead green, magenta, blue and grayscale have been used to improve the visibility for color-blind readers.

      2. In Fig 2D please indicate where the epidermis and neuroblasts are

      We assume that the reviewer refers to Fig. S2D. In the revised version of the manuscript, this figure is now Fig. S2A. We have marked epidermal cells and neuroblasts by different symbols.

      *3. In the following two places there are experiments described where the data is listed as not shown. Please show the data as additional supplemental data. They are P8 - This result was confirmed using the CY2::Gal4 driver line expressed in the follicular epithelium and with three different RNAi lines against baz (data not shown). *

      We have deleted this sentence because expression of CY2::Gal4 in our hands was weaker and thus the RNAi effects less reproducible than with tj::Gal4.

      P11 - We also did not see any downregulation of Baz or α-spectrin upon baz-RNAi in M12 at 29°C, when the UAS-Gal4 system is maximally active (data not shown).

      We now show these results in the new Fig. S8.

      4. Figure 3 - this would be easier to interpret with a few arrows/arrowheads indicating the NMJs

      We have added arrows pointing to NMJs and arrowheads pointing to nuclei.


      Significance

      It will be important to publish these results as it means that findings for a function of Baz/Par3 at the NMJ and the nuclear envelope should be regarded with caution, and it may save researchers chasing for functions for Baz/Par3 in places where it is simply not expressed. As much of our fundamental understanding of how Par3 works in vertebrates has its roots in studies in Drosophila, this is likely to be of wide relevance.


      Reviewer #2

      Evidence, reproducibility and clarity

      *Evidence, reproducibility and clarity

      1.1 Summary

      This reviewer acknowledges the expertise and contributions of Prof. Wodarz and his research group in the field of development, cell polarity regulation and Drosophila genetics.


      Manuscript summary:

      Kim S. et al. explored the localisation of Bazooka, the Drosophila homolog of the polarity protein Par-3, at two non-canonical positions for a cell polarity factor: the nuclear envelope in epithelial tissues and the postsynaptic membrane of the neuromuscular junction (NMJ). Previous work has shown the detection of Par-3/Baz at the nuclear envelope and the NMJ using antibodies against Par-3/Baz. Here, the authors used a combination of genetic perturbations (baz RNAi and generation of genetic mosaics for baz) and GFP-labelled Bazooka lines to test if the antibody-mediated detection of Baz at the nuclear envelope and NMJ is artifactual. The data provided by the authors strongly suggest both the nuclear envelope and NMJ detection of Baz using antibodies is non-specific.

      1.2 Major comments

      The manuscript is written in a clear manner, easy to be followed by readers. However, there are some important experimental details that should be provided as the authors advance over previous work regarding Baz localization (points 1.2.1 and 1.2.2). Furthermore, if possible, this reviewer considers that performing the experiment in 1.2.3 would strengthen the main message of the authors' manuscript.

      1.2.1 Methodology information is missing, and would be necessary to be included for: image acquisition (Objectives, Airyscan mode), image processing (projections, details on linear -e.g. brightness, contrast- or non-linear adjustments of signal -e.g. gamma-). For image processing information, please include it within each figure legend. *

      We have added the information regarding objectives and imaging modes to the Materials and Methods section. There it now reads: "Tissues were imaged on a Zeiss LSM880 Airyscan confocal microscope using 25x LCI Plan Neofluar NA 0.8 and 63x Plan Apochromat NA 1.4 oil immersion objectives. If not stated otherwise in the figure legend, all confocal images are single optical sections taken at a pinhole setting of 1 Airy unit. Images were processed with Zen black software (Zeiss) without contrast enhancement. Figures were assembled with Inkscape 1.2 (Inkscape.org) and Powerpoint (Microsoft)."

      RNAi experiments lines, temperature for each target and tissue (a table would be helpful) and number of heat shocks performed for FRT/FLP clones.

      We have added a table in the Supplementary information giving the precise genotypes for each figure. We have furthermore added the following sentences to the Materials and Methods section: "Crossings for RNAi experiments were set up at 25°C if not indicated otherwise. For generating follicle cell clones in ovaries by Flipase-mediated mitotic recombination of the FRT sites flies were heat shocked for 1h at 37°C 5-7 days prior to preparation of the ovaries. For generation of germ line clones by Flipase-mediated mitotic recombination of the FRT sites flies were heat shocked twice for 2 h at 37°C on two consecutive days in late 2nd, early 3rd instar larval stages."

      1.2.2 For each experiment it is unclear the number of specimens (experimental units) and independent experiments that were analysed. It is unclear if the Baz localisation phenotypes are fully penetrant or not as judged by the data provided.

      We have added the following section to the Materials and Methods: "Images were analyzed for the presence or absence of a fluorescence signal at the nuclear envelope or the NMJ compared to negative or positive controls, either in the same tissue (mutant clones in the follicular epithelium, RNAi in a specific body wall muscle, junctional versus nuclear signal, anti-Baz staining versus Baz-GFP signal) or in samples processed in parallel (ovaries with follicle cell and germ line clones). Fluorescence intensities were not quantified because the results were obvious and fully penetrant. Therefore, no statistical analysis of the results was required."

      1.2.3 This reviewer agrees the data provided strongly suggests the detection of Baz along the nuclear envelope and NMJ is artifactual in the Drosophila tissues that have been studied. However, the nature of the bazEH747 mutant allele is not a deletion of the Baz gene, but instead a nonsense mutation, which, as the authors describe, could potentially generate a small product of 51 amino acids, corresponding to the N-terminal part of Baz, which is also the target of Baz rabbit antibody ('rb Baz 1-297'). Thus:

        • Would it be possible to complement the FRT/FLP analyses in the FE using a deficiency that uncovers the baz locus? A persistent detection of Baz signal at the nuclear compartment after complete removal of baz gene products would be an ideal experiment, if feasible.

      We agree with the reviewer that the use of a clean deletion allele of the whole baz locus would be the ideal tool for the clonal analysis. However, such an allele does not exist according to our knowledge.

        • Would the authors comment on the possibility that the rb Baz antibody 1-297 detects a 51 amino acid peptide?

      We consider this possibility very unlikely for two reasons: 1) RNAi affects the baz mRNA and thus should knock down all epitopes to the same degree. However, we see a complete loss of junctional Baz signal but no reduction of the signal at the nuclear envelope or the NMJ upon RNAi targeting baz. 2) The GFP-Baz fusion proteins do not show any signal at the NMJ or the nuclear envelope upon imaging of the native GFP fluorescence or upon antibody staining with an anti GFP antibody, although both the Baz-GFP BAC line and the GFP-Baz protein trap line express full-length Baz including the N-terminal epitope that is potentially still expressed in the bazEH747 allele. We have added a passage summarizing these considerations to the Discussion section.

      *1.3 Minor comments

      This manuscript is largely based on imaging data. Therefore, it would be beneficial for the ease of comprehension of figure panels:

      1.3.1 More general use of insets to show with larger magnification and clarity the data indicated with arrows and arrowheads.*

      We have added arrowheads, arrows and additional symbols to point to features of interest in all figure panels where this is helpful.

      1.3.2 Using negative grayscale either for insets or single channel data.

      We have changed all single channel image panels to negative (inverted) grayscale.

      1.3.3 For coloured-overlays please bear in mind using colors that would be suitable for colour-blinded readers.

      In all multicolor overlay images, red has been omitted and instead green, magenta, blue and grayscale have been used to improve the visibility for color-blind readers.

      1.3.4 Figures showcasing the clonal analyses (both MARCM and FRT/FLP): might be worth indicating the boundaries of clones in single channel data with a dotted line.

      We have marked the clone boundaries of the MARCM clones by dashed lines in Fig. 2D, E and have added a high magnification inset to show the clone boundaries (Fig. 2D', E').

      Significance

      *2 Significance

      The findings provided by this manuscript will be of importance for researchers in the field of cell polarity, conducting research on Bazooka/Par-3 and associated proteins, both within the Drosophila field and other model organisms. The present study presents an advance towards a specific and most likely artifactual observation of Par-3/Bazooka. It will help to re-think the tools used for detecting Par-3/Bazooka in different animal models, and in this regard, will be helpful for the community.*

      We thank the reviewer for appreciating the importance of this work.

      *This work does not focus on Par-3/Bazooka biology, nor provides new insights into Par-3/Bazooka function, however, it is clear for this reviewer the latter is not the aim of this manuscript.

      Reviewer expertise:

      • Drosophila genetics
      • Developmental cell biology and morphogenesis
      • Cytoskeleton, cell cell adhesion and cell polarity*

      Reviewer #3 *(Evidence, reproducibility and clarity (Required)):

      Kim et al. address a common but frequently neglected problem in molecular and cellular biology: sophisticated tests for the specificity of antibodies. The protein Bazooka (Baz) is a member of the Par complex that usually resides in apicocortical regions of epithelial cells. Several publications, however, report expression in other subcellular compartments or cell types, such as the nuclear lamina or neuromuscular junction (NMJ). The authors have used a panel of polyclonal antibodies, genetic constructs and mutant alleles to show that staining of Baz in the nuclear envelope or NMJ is likely unspecific due to an unknown cross-reactivity. Specifically, four antisera, raised against different GST-Baz fusion proteins in different species, recognized Baz at cortical membranes, around nuclei and at NMJs. Nuclear and NMJ staining, however, persisted in baz-RNAi experiments or baz mutant clones. If the endogenous locus is tagged with GFP, Baz-GFP localized to cortical membranes in imaginal disc epithelial cells but was not detectable in nuclear envelopes or NMJs in muscles. The authors conclude that they could not find evidence for either nuclear or NMJ localization of Baz and any results derived from these antibodies should be regarded with caution.

      The manuscript reports a careful and thorough evaluation of anti-Baz antibodies used in the scientific community. Since it might impact previous findings, any remaining uncertainties should be clarified before publication. I have therefore a number of suggestions to improve the manuscript.

      Major comments:

      1) Any truncation or addition of amino acids might affect the subcellular localization of proteins. Important molecular information on the baz alleles and GFP-fusion proteins are therefore missing in the manuscript. Specifically, what is the underlying molecular nature of the baz alleles used in the study, e.g. bazEH747 (nonsense? position?)? At which amino acid position and in which protein domain is GFP fused to Baz in Baz-GFP (Bac) and Baz-GFP (Trap)? Would these fusions affect subcellular localization and/or functionality? While the authors positively tested Baz-GFP (Bac) in a baz mutant background, this cannot easily be done for Baz-GFP (Trap). The authors should therefore clarify, e.g. by RT-PCR, which of the four Baz isoforms are fused to GFP in Baz-GFP (Trap) and if this might affect functionality and/or location? This information should be depicted or listed together with the epitopes of the antibodies in a figure or table, respectively, in the main manuscript for better orientation of the reader. *

      bazEH747 is a strong loss-of-function allele with a point mutation changing the codon for Q51 to Stop in all four isoforms (numbering is according to isoform A) (Krahn et al., 2010; Shahab et al., 2015). In the Results section, we have changed the wording as follows to make this clear: "For clonal analysis the strong loss-of-function allele bazEH747 was used, where a point mutation in exon 4 results in a premature stop close to the N-terminus of all four isoforms (the codon for amino acid residue Q51 is mutated to a stop in isoform A) (Krahn et al., 2010)."

      We have added two additional supplemental figures to precisely show the insertion site of GFP in the GFP-Baz trap line (Fig. S5) and the Baz-GFP BAC line (Fig. S6). We have changed the Results section to precisely explain the nature of the two Baz-GFP lines as follows: "While strong nuclear envelope immunostaining was observed using several independently raised anti Baz antibodies (Fig. 1; Fig. S1), no nuclear envelope localization was detected in follicular epithelial cells and in larval body wall muscles using a Baz-GFP BAC line (Besson et al., 2015) (Fig. S3C-D', S4A, A') nor in a GFP-Baz protein-trap line (Buszczak et al., 2007)(Fig. S3E-F', S4C, C'). In the GFP-Baz protein-trap line an engineered exon encoding for GFP is inserted into the second untranslated exon (Fig. S5). This exon encoding for GFP is predicted to be spliced in frame into the mRNAs RA and RC encoding for isoforms PA and PC whose translation starts in exon 1 (Fig. S5), resulting in insertion of GFP between amino acid residues K40 and P41 of isoforms PA and PC. The transcripts RB and RD encoding Baz isoforms PB and PD have their translation start within exon 3 and thus cannot form fusion proteins with GFP inserted in exon 2 (Fig. S5). However, GFP-Baz protein trap flies are homozygous viable and are phenotypically indistinguishable from wild type flies, indicating that the corresponding GFP fusion protein is fully functional and faithfully reflects the expression pattern and subcellular localization of Baz isoforms PA and PC. The BAC line integrates the GFP within exon 10 between amino acid residues L1424 and Q1425 of isoform PA, giving rise to GFP fusion proteins for all four isoforms (Fig. S6) (Besson et al., 2015). Like the protein-trap GFP-Baz fusion protein, the Baz-GFP fusion protein in the BAC line is fully functional as it completely rescued lethality and fertility of the bazEH747 allele (Fig. S7D-D') and the baz815-8 allele (Besson et al., 2015)."

      *2) Figure 3D-G: The images for Baz-GFP nicely show that GFP is expressed in imaginal discs but not at NMJs. However, when brightness of Fig. 3D' and 3F' is increased nuclear envelopes, tracheal branches and some synaptic boutons are clearly visible in the Baz-GFP channels. These are likely background signals due to the staining procedure, but to avoid any confusion, images showing unstained (native) GFP fluorescence should be included to prove that there are no residual signals. GFP fluorescence survives formaldehyde fixation and many GFP exon traps are clearly visible even in the absence of immunofluorescent stainings. Furthermore, Fig. 3G appears vastly different compared to Fig. 3E and Baz localization at cell-cell junctions cannot be recognized by people unfamiliar with imaginal discs. The images in Fig. 3G are therefore not suitable and should be replaced. *

      We have added the new Fig. S4 showing the GFP signal without antibody staining of somatic body wall muscles and wing imaginal discs of larvae expressing the Baz-GFP BAC and GFP-Baz trap transgenes. We have also replaced Fig. 3G with images that can easily be compared with the images in Fig. 3E. The following paragraph was added to the Results section: "These findings were confirmed by analysis of fixed larval tissues that were imaged for GFP fluorescence without anti GFP antibody staining (Fig. S4). Neither in the Baz-GFP BAC line (Fig. S4A, A'), nor in the GFP-Baz trap line (Fig. S4C, C') any nuclear envelope or NMJ signal was detectable in somatic muscles, whereas junctional signal in wing imaginal discs was readily detectable in both lines (Fig. S4B, D)."

      *3) The argument that baz4 and baz815-8 carry second site mutations is not fully convincing (page 10, 13). Why should two independent baz alleles carry an additional hit that affect Spectrin levels? Other explanations might be possible. While downregulation of Baz in muscles by RNAi is a good approach to tackle the question of Spectrin localization and expression levels, RNAi itself has its own uncertainties. Why not showing the effect on Spectrin levels or the lack of Baz at the NMJ (or the nuclear envelopes) in "clean" baz null embryos or larvae (e.g. bazEH747/Df)? NMJs can be stained in late stage embryos or compound heterozygous null mutants quite frequently survive until larval stages. *

      We do not have a good explanation for the published reduction of Baz and α-Spectrin signal at the NMJ in larvae heterozygous for the baz alleles baz4 and baz815-8 (Ruiz-Canada et al., 2004; Ramachandran et al., 2009), as our analysis shows that Baz is not expressed there, rendering the reported phenotypes very difficult to explain. It is beyond the scope of our paper to prove that the data published by Ruiz-Canada et al. (2004) and Ramachandran et al. (2009) are indeed reproducible. Our speculation that second site hits on these two mutant chromosomes may have caused the published effects is just based on our own published observation that commonly used chromosomes with these two mutant baz alleles have stronger phenotypes than a clean baz loss-of-function allele (Shahab et al., 2015). We have changed the wording of the corresponding paragraph as follows: "It has been published that heterozygous baz4 mutant larvae show a significant decrease in immunofluorescence signal of Baz and also of Spectrin at the NMJ (Ruiz-Canada et al., 2004). Another publication showed a significant decrease in Baz and Spectrin immunostaining at the NMJ of larvae heterozygous for the baz815-8 allele (Ramachandran et al., 2009). We did not attempt to reproduce these findings. However, in our hands mitotic clones generated with FRT chromosomes carrying these latter two baz alleles showed polarity phenotypes in the follicular epithelium, whereas clones of the clean bazEH747 null allele did not show any polarity defect (Shahab et al., 2015), raising the possibility that the NMJ phenotypes observed by Ruiz-Canada et al. (2004) and Ramachandran et al. (2009) were caused by second site mutations on these chromosomes rather than by reduced Baz activity."

      bazEH747 hemizygous mutant embryos are so abnormal and malformed at late embryonic stages that we did not attempt to stain these for Baz immunoreactivity at NMJs.

      4) It is not really made clear in the manuscript, why the additional reactivity of the anti-Baz antibodies has not been noticed earlier. The paper should therefore include a summarizing paragraph that describes how the specificities of the antibodies have been tested in the past in the laboratories that used them. Have they never been tested in null mutant animals? In null mutants it should be obvious to determine, if some staining patterns do not disappear.

      The vast majority of publications on Baz, including those from our own laboratory, focused on the functions of Baz at junctions and in the control of cell polarity. For these functions the cortical localization of Baz is relevant, and it has been shown to be specific in many independent studies using null alleles and RNAi. Only a few publications, in particular those from the laboratory of Vivian Budnik, have focused on potential functions of Baz at the NMJ and the nuclear envelope. Why these studies did not provide convincing proof of the specificity of the signal at those "unconventional" locations is beyond our knowledge.

      5) Figure 4 is very difficult to comprehend and should be better labeled (e.g. anterior-posterior, dorsal-ventral, muscle fibers, unspecific signals). It is standard in the field to show ventral muscles 12, 13 or 6, 7 in the center of the image and in a similar orientation (anterior left, dorsal up). Better images should be shown.

      We understand that for researchers interested in the function of specific muscles it is important to adhere to conventions regarding the orientation of muscles in figures. However, in our case it is just relevant whether a muscle expresses RNAi against a gene of interest (GFP+) or not (GFP-) in order to compare the signal intensity for Baz and Spectrin in these two situations. Thus, although we appreciate the validity of this comment, we decided to leave the original images unchanged. However, to help the reader in identifying relevant structures more easily, we have added color-coded arrows and arrowheads to mark NMJs and nuclear envelopes in GFP+ and GFP- muscles.

      *Reviewer #3 (Significance (Required)):

      The authors provide a critical assessment on the specificity of antibodies and highlight the necessity to carefully test antibodies and the conclusions drawn from the resulting stainings, especially when antibodies are bought from companies or have previously been published as specific. This is extremely important for the interpretation of experiments in all fields of molecular and cellular biology. *

      We thank the reviewer for appreciating the importance of this work.

    1. There is no doubt that humans are an artistic species. We make music, television shows, and movies, plus we paint, draw, and sculpt. All of these things are art. Humans are able to think in the abstract. We imagine and create things that do not exist, such as unicorns, monsters, and superheroes. We also build upon the achievements of earlier periods to make art that is grounded in history but is also new.

      Human beings are naturally creative. We sing and use instruments, act out scenes, draw, paint, and express ourselves in unique ways that we may not even recognize as art. We use our imagination to portray fictional places and people, combine images to create a whole new composite image. We also admire or engage in some way with the art of others, past and present.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We would first like to thank the Editor and the three reviewers for their enthusiasm and for conducting another careful evaluation of our manuscript. We appreciate their thoughtful and constructive comments and suggestions. Some concerns regarding experimental design, data analysis, and over-interpretation of our findings still remain unresolved after the initial revision. Here we have endeavored to address these remaining concerns by further refining our writing and by including these concerns in the discussion section. We hope our response better explains the rationale of our experimental design and data interpretation. In addition, we acknowledge the limitations of our present study, so that it will benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review):

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      I acknowledge the authors' efforts to address the comments received. However, my concerns persist:

      Thanks very much again for the re-evaluation and comments. Please find our revision plans to each comment below.

      (1) The authors contend that shorter reaction times correlated with increased distances between individuals in social space imply that participants construct and utilize two-dimensional representations. This method is adapted from a previous study by Park et al. Yet, there is a fundamental distinction between the two studies. In the prior work, participants learned relationships between adjacent individuals, receiving feedback on their decisions, akin to learning spatial locations during navigation. This setup leads to two different predictions: If participants rely on memory to infer relationships, recalling more pairs would be necessary for distant individuals than for closer ones. Conversely, if participants can directly gauge distances using a cognitive map, they would estimate distances between far individuals as quickly as for closer ones. Consequently, as the authors suggest, reaction times ought to decrease with increasing decision value, which, in this context, corresponds to distances. However, the current study allowed participants to compare all possible pairs without restricting learning experiences, rendering the application of the same methodology for testing two-dimensional representations inappropriate. In this study, the results could be interpreted as participants not forming and utilizing two-dimensional representations.

      We apologize for not being clear enough about our task design. We have made relevant changes in the methodology section of the manuscript to make it clearer. The reviewer’s concern is that participants learned about all the pairs in the comparison task, which would make the distance effect invalid. We would like to clarify that during all the memory test tasks (the comparison task, the collect task, and the recall task outside and inside the scanner), participants never received feedback on whether their responses were correct. Therefore, the comparison task in our study is similar to that in the previous study by Park et al. (2021). Participants did not have access to the correct responses for all possible pairs of comparisons prior to or during this task, so they needed to make inferences based on memory retrieval.

      (2) The confounding of visual features with the value of social decision-making complicates the interpretation of this study's results. It remains unclear whether the observed grid-like effects are due to visual features or are genuinely indicative of value-based decision-making, as argued by the authors. Contrary to the authors' argument, this issue was not present in the previous study (Constantinescu et al.). In that study, participants associated specific stimuli with the identities of hidden items, but these stimuli were not linked to decision-making values (i.e., no image was considered superior to another). The current study's paradigm is more akin to that of Bao et al., which the authors mention in the context of RSA analysis. Indeed, Bao et al. controlled the length of the bars specifically to address the problem highlighted here. Regrettably, in the current paradigm, this conflation remains inseparable.

      We’d like to thank the reviewer for facilitating the discussion on the question of ‘social space’ vs. ‘sensory space’. The task in the scanner did not require value-based decision making. It is akin to both the Bao et al. (2019) study and the Constantinescu et al. (2016) study in the sense that all three tasks ask participants to imagine moving along a trajectory in an abstract, non-physical space, and the trajectory is grounded in a sensory cue. Participants were trained to associate the sensory cue with abstract (social/nonsocial) concepts. We think that the paradigm is a relatively faithful replication of the study by Constantinescu et al. Nonetheless, we agree that a design similar to Bao et al. (2019), which controls for sensory confounds, or a value-based decision-making task in the scanner similar to that of Park et al. (2021), would be better suited to address this concern, and we have included this limitation in the discussion section.

      (3) While the authors have responded to comments in the public review, my concerns noted in the Recommendation section remain unaddressed. As indicated in my recommendations, there are aspects of the authors' methodology and results that I find difficult to comprehend. Resolving these issues is imperative to facilitate an appropriate review in subsequent stages.

      Considering that the issues raised in the previous comments remain unresolved, I have retained my earlier comments below for review.

      We apologize for not addressing the recommendations properly. Please find our detailed responses and plans for revision below.

      I have some comments. I hope that these can help.

      (1) While the explanation of Fig.4A-C is lacking in both the main text and figure legend, I am not sure if I understand this finding correctly. Did the authors find the effects of hexagonal modulation in the medial temporal gyrus and lingual gyrus correlate with the individual differences in the extent to which their reaction times were associated with the distances between faces when choosing a better collaborator? If so, I am not sure what argument the authors try to draw from these findings. Do the authors argue that these brain areas show hexagonal modulation, which was not supported in the previous analysis (Fig.3)? What is the level of correlation between these behavioral measures and the grid consistency effects in the vmPFC and EC, where the authors found actual grid-like activity? How do the authors interpret this finding? More importantly, how does this finding associate with other findings and the argument of the study?

      We apologize for not being clear enough in the manuscript and we will improve the clarity in our revision. The exploratory analysis reported in Figure 4 uses whole-brain analysis to examine: 1) whether there is any correlation between the strength of the grid-like representation of the social value map and behavioral indicators of map-like representation; and 2) whether there is any correlation between the strength of the grid-like representation of this social value map and participants’ social traits.

      To be more specific, for the behavioral indicator, we used the distance effect in the reaction time of the comparison task outside the scanner. We interpreted a stronger distance effect as a behavioral index of a better internal map-like representation. We interpreted a stronger grid consistency effect as a neural index of a better representation of the 2D social space. Therefore, we wanted to see whether behavioral and neural indices of map-like representation are correlated.

      To achieve this goal, behavioral indicators were entered as covariates in the second-level analysis of the GLM testing the grid consistency effect (GLM2). Figure 3 shows results from GLM2 without the covariates. Figure 4 shows clusters whose neural indices of map-like representation covaried with the behavioral indices and survived multiple-comparison correction. Indeed, in these regions, the grid consistency effect was not significant at the group level (and so is not shown in Figure 3). We tried to interpret this finding in our discussion (line 374-289 for temporal lobe correlation, line 395-404 for precuneus correlation).

      Finally, we would like to point out that including the covariates in GLM2 did not change the results in Figure 3: the clusters in Figure 3 still survive correction. Meanwhile, these clusters in Figure 3 did not show correlation with behavioral indicators of map-like representation.

      Author response image 1.

      (2) There are no behavioral results provided. How accurately did participants perform each of the tasks? How are the effects of grid consistency associated with the level of accuracy in the map test?

      Why did participants perform the recall task again outside the scanner?

      We will endeavor to improve the signposting of the corresponding figures in the main text. For the behavioral results, we reported the statistics in the section “Participants construct social value map after associative learning of avatars and corresponding characteristics” in the main text, and the plots are shown in Figure 1. In particular, Figure 1F shows the accuracy of the tasks in training, as well as of the recall task in the scanner. For the correlation, we did not find a significant correlation between behavioral accuracy and the grid consistency effect. We will make this clearer in the Results section.

      (3) The methods did not explain how the grid orientation was estimated and what the regressors were in GLM2. I don't think equations 2 and 3 are quite right.

      For the grid orientation estimation method, we provided a detailed description in Supplementary Methods 2.2.2. We will add links to this section in the main text.

      Equations 2 and 3 describe how the parametric regressors entered into GLM2 were formed and provide the prerequisites for calculating grid orientations. Equation 2 is the result of directly applying the angle addition and subtraction theorems, so it should be correct. We will try to make the rationale clearer in the supplementary text.
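
      The role of the angle addition theorem can be illustrated with a minimal sketch. It does not reproduce the authors' Equations 2 and 3, which are not shown in this response. It assumes the approach that is common in the grid-code fMRI literature: the alignment term cos(6(θ − φ)) is expanded into cos(6θ) and sin(6θ) regressors, and the orientation φ is recovered from their fitted weights. All function names, the simulated trajectory angles, and the chosen orientation are hypothetical.

```python
import numpy as np

def hexagonal_regressors(theta):
    """Return the cos(6*theta) and sin(6*theta) regressors (theta in radians)."""
    theta = np.asarray(theta, dtype=float)
    return np.cos(6 * theta), np.sin(6 * theta)

def estimate_grid_orientation(beta_cos, beta_sin):
    """Recover an orientation in [0, pi/3) from the two fitted weights."""
    return (np.arctan2(beta_sin, beta_cos) / 6) % (np.pi / 3)

def alignment_regressor(theta, phi):
    """Six-fold modulation of trajectories relative to orientation phi."""
    return np.cos(6 * (np.asarray(theta, dtype=float) - phi))

# Simulated, noise-free example (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)   # trajectory directions
phi_true = np.deg2rad(17.0)              # assumed grid orientation
signal = 2.0 * alignment_regressor(theta, phi_true)

# Angle addition: cos(6(t - p)) = cos(6t)cos(6p) + sin(6t)sin(6p),
# so a linear fit on the two regressors yields weights proportional
# to cos(6p) and sin(6p).
cos6, sin6 = hexagonal_regressors(theta)
X = np.column_stack([cos6, sin6])
beta_cos, beta_sin = np.linalg.lstsq(X, signal, rcond=None)[0]
phi_hat = estimate_grid_orientation(beta_cos, beta_sin)
```

      In an actual analysis the orientation would be estimated on one part of the data and the alignment regressor tested on an independent part; the sketch omits that split.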

      (4) With the increase in navigation distances, more grid cells would activate. Therefore, in theory, the activity in the entorhinal cortex should increase with the Euclidean distances, which has not been found here. I wonder if there was enough variability in the Euclidean distances that can be captured by neural correlates. This would require including the distributions of Euclidean distances according to their trajectory angles. Regarding how Fig.1E is generated, I don't understand what this heat map indicates. Additionally, it needs to be confirmed if the grid effects remain while controlling for the Euclidean distances of navigation trajectories.

      We did not specifically control for trajectory length; we only ensured that the distribution of trajectory directions was uniform. We have included a figure of the distribution of Euclidean distances in Figure S9 and the distribution of trajectory directions in Figure S8.

      Author response image 2.

      As for Figure 1E, we aimed to reproduce the findings from Figure 1F in Constantinescu et al. (2016), where they showed that participants progressively refined the locations of the outcomes through training. We divided the space into 15×15 subregions, computed the amount of time spent in each subregion, and plotted the result as Figure 1E. Brighter color in Figure 1E indicates a greater amount of time spent in the corresponding subregion. Note that all these timing indices were computed as a percentage of the total time spent in the explore task in a given session. If participants were well-acquainted with the space and avatars, they would spend more time at the avatars (brighter color at avatar locations) in the review session compared to the learning session.
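
      The occupancy computation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the coordinate range of the space (assumed here to be the unit square), the sampled positions, and the dwell times are all hypothetical.

```python
import numpy as np

def occupancy_percent(x, y, dwell, n_bins=15, extent=(0.0, 1.0)):
    """Percentage of total time spent in each subregion of a square space.

    x, y  : sampled positions in the two-dimensional space
    dwell : time spent at each sampled position
    """
    hist, _, _ = np.histogram2d(
        x, y, bins=n_bins, range=[extent, extent], weights=dwell
    )
    total = hist.sum()
    if total == 0:
        return np.zeros_like(hist)
    return 100.0 * hist / total

# Toy samples (hypothetical): two visits to one corner, one visit to the
# opposite corner, and one visit to the center, each lasting one time unit.
x = np.array([0.05, 0.05, 0.95, 0.50])
y = np.array([0.05, 0.05, 0.95, 0.50])
dwell = np.array([1.0, 1.0, 1.0, 1.0])
heat = occupancy_percent(x, y, dwell)
```

      Because each map is normalized by the total time in its own session, maps from the learning and review sessions can be compared directly even if the sessions differ in length.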

      As for the effect of distance on the grid-like representation, we did not include distance as a parametric modulator in the grid consistency effect GLM (GLM2) because there were too few trials in each bin (6-8 trials). However, there is indirect evidence that could potentially rule out this confound. In the distance representation analysis, we did not find distance representation in any of the clusters that have significant grid-like representation (regions in Figure 2).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid. From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Weaknesses:

      In the revised manuscript, the authors soften their claims about finding a grid code in the entorhinal cortex and provide additional caveats about limitations in their findings. It seems that the authors and reviewers are in agreement about the following weaknesses, which were part of my original review: Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      In the authors' response to reviews, they provide additional clarification about their exploratory analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. My guess is that readers would find it useful if some of this language were included in the main text, especially with regard to an explanation regarding the rationale for these exploratory studies.

      Thank you very much again for your careful re-evaluation and suggestions. We have tried to improve our writing and incorporate the suggestions in the new revision.

      Reviewer #3 (Public Review):

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distant-based and grid-like codes, and is relatively well powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably Park et al., 2021, Nature Neuroscience.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      This data does specifically not add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that, when the space is not a uniform, square/circular space, and 2-dimensional then there is no reason the code will be perfectly grid like, i.e., show six-fold symmetry. In real world scenarios of social space (as well as navigation, semantic concepts), it must be higher dimensional - or at least more than two dimensional. It is unclear if this generalizes to larger spaces where not all part of the space is relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) have shown/argued this to be the case. In experimental work, like in mazes from the Mosers' labs (e.g., Derdikman et al., 2009), or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), there are distortions in mEC cells, and would not pass as grid cells in terms of the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raise the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thanks very much again for your careful re-evaluation and comments. We have tried to incorporate some of the suggested papers into our discussion. In summary, we agree that codes other than a six-fold symmetric code may be used to represent “conceptual space”. We think that the next step toward a stronger claim would be to find the representation of more spontaneous non-spatial maps.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      The study was designed as a 6-month follow-up, with repeated behavioral and EEG measurements through disease development, providing valuable and interesting findings on AD progression and the effect of early-life choline supplementation. Moreover, the behavioral data that suggest an adverse effect of low choline in WT mice are interesting and important beyond the context of AD.

      Thank you for identifying several strengths.

      Weaknesses:

      (1) The multiple headings and subheadings, focusing on the experimental method rather than the narrative, reduce the readability.

      We have reduced the number of headings.

      (2) Quantification of NeuN and FosB in WT littermates is needed to demonstrate rescue of neuronal death and hyperexcitability by high choline supplementation and also to gain further insights into the adverse effect of low choline on the performance of WT mice in the behavioral test.

      We agree and have added WT data for the NeuN and ΔFosB analyses. These data are included in the text and figures. For NeuN, the Figure is Figure 6. For ΔFosB it is Figure 7. In brief, the high choline diet restored NeuN and ΔFosB to the levels of WT mice.

      Below is Figure 6 and its legend to show the revised presentation of data for NeuN. Afterwards is the revised figure showing data for ΔFosB. After that are the sections of the Results that have been revised.

      Author response image 1.

      Choline supplementation improved NeuN immunoreactivity (ir) in hilar cells in Tg2576 animals. A. Representative images of NeuN-ir staining in the anterior DG of Tg2576 animals. (1) A section from a Tg2576 mouse fed the low choline diet. The area surrounded by a box is expanded below. Red arrows point to NeuN-ir hilar cells. Mol=molecular layer, GCL=granule cell layer, HIL=hilus. Calibration for the top row, 100 µm; for the bottom row, 50 µm. (2) A section from a Tg2576 mouse fed the intermediate diet. Same calibrations as for 1. (3) A section from a Tg2576 mouse fed the high choline diet. Same calibrations as for 1. B. Quantification methods. Representative images demonstrate the thresholding criteria used to quantify NeuN-ir. (1) A NeuN-stained section. The area surrounded by the white box is expanded in the inset (arrow) to show 3 hilar cells. The 2 NeuN-ir cells above threshold are marked by blue arrows. The 1 NeuN-ir cell below threshold is marked by a green arrow. (2) After converting the image to grayscale, the cells above threshold were designated as red. The inset shows that the two cells that were marked by blue arrows are red while the cell below threshold is not. (3) An example of the threshold menu from ImageJ showing the way the threshold was set. Sliders (red circles) were used to move the threshold to the left or right of the histogram of intensity values. The final position of the slider (red arrow) was positioned at the onset of the steep rise of the histogram. C. NeuN-ir in Tg2576 and WT mice. Tg2576 mice had either the low, intermediate, or high choline diet in early life. WT mice were fed the standard diet (intermediate choline). (1) Tg2576 mice treated with the high choline diet had significantly more hilar NeuN-ir cells in the anterior DG compared to Tg2576 mice that had been fed the low choline or intermediate diet. The values for Tg2576 mice that received the high choline diet were not significantly different from WT mice, suggesting that the high choline diet restored NeuN-ir. (2) There was no effect of diet or genotype in the posterior DG, probably because the low choline and intermediate diet did not appear to lower hilar NeuN-ir.

      Author response image 2.

      Choline supplementation reduced ∆FosB expression in dorsal GCs of Tg2576 mice. A. Representative images of ∆FosB staining in GCL of Tg2576 animals from each treatment group. (1) A section from a low choline-treated mouse shows robust ∆FosB-ir in the GCL. Calibration, 100 µm. Sections from intermediate (2) and high choline (3)-treated mice. Same calibration as 1. B. Quantification methods. Representative images demonstrating the thresholding criteria established to quantify ∆FosB. (1) A ∆FosB-stained section shows strongly-stained cells (white arrows). (2) A strict thresholding criterion was used to make only the darkest stained cells red. C. Use of the strict threshold to quantify ∆FosB-ir. (1) Anterior DG. Tg2576 mice treated with the choline supplemented diet had significantly less ∆FosB-ir compared to the Tg2576 mice fed the low or intermediate diets. Tg2576 mice fed the high choline diet were not significantly different from WT mice, suggesting a rescue of ∆FosB-ir. (2) There were no significant differences in ∆FosB-ir in posterior sections. D. Methods are shown using a threshold that was less strict. (1) Some of the stained cells that were included are not as dark as those used for the strict threshold (white arrows). (2) All cells above the less conservative threshold are shown in red. E. Use of the less strict threshold to quantify ∆FosB-ir. (1) Anterior DG. Tg2576 mice that were fed the high choline diet had fewer ΔFosB-ir pixels than the mice that were fed the other diets. There were no differences from WT mice, suggesting restoration of ∆FosB-ir by choline enrichment in early life. (2) Posterior DG. There were no significant differences between Tg2576 mice fed the 3 diets or WT mice.

      Results, Section C1, starting on Line 691:

      “To ask if the improvement in NeuN after MCS in Tg2576 restored NeuN to WT levels we used WT mice. For this analysis we used a one-way ANOVA with 4 groups: Low choline Tg2576, Intermediate Tg2576, High choline Tg2576, and Intermediate WT (Figure 5C). Tukey-Kramer multiple comparisons tests were used as the post hoc tests. The WT mice were fed the intermediate diet because it is the standard mouse chow, and this group was intended to reflect normal mice. The results showed a significant group difference for anterior DG (F(3,25)=9.20; p=0.0003; Figure 5C1) but not posterior DG (F(3,28)=0.867; p=0.450; Figure 5C2). Regarding the anterior DG, there were more NeuN-ir cells in high choline-treated mice than both low choline (p=0.046) and intermediate choline-treated Tg2576 mice (p=0.003). WT mice had more NeuN-ir cells than Tg2576 mice fed the low (p=0.011) or intermediate diet (p=0.003). Tg2576 mice that were fed the high choline diet were not significantly different from WT (p=0.827).”

      Results, Section C2, starting on Line 722:

      “There was strong expression of ∆FosB in Tg2576 GCs in mice fed the low choline diet (Figure 7A1). The high choline diet and intermediate diet appeared to show less GCL ΔFosB-ir (Figure 7A2-3). A two-way ANOVA was conducted with the experimental group (Tg2576 low choline diet, Tg2576 intermediate choline diet, Tg2576 high choline diet, WT intermediate choline diet) and location (anterior or posterior) as main factors. There was a significant effect of group (F(3,32)=13.80, p<0.0001) and location (F(1,32)=8.69, p=0.006). Tukey-Kramer post-hoc tests showed that Tg2576 mice fed the low choline diet had significantly greater ΔFosB-ir than Tg2576 mice fed the high choline diet (p=0.0005) and WT mice (p=0.0007). Tg2576 mice fed the low and intermediate diets were not significantly different (p=0.275). Tg2576 mice fed the high choline diet were not significantly different from WT (p>0.999). There were no differences between groups for the posterior DG (all p>0.05).”

      “∆FosB quantification was repeated with a lower threshold to define ∆FosB-ir GCs (see Methods) and results were the same (Figure 7D). Two-way ANOVA showed a significant effect of group (F(3,32)=14.28, p<0.0001) and location (F(1,32)=7.07, p=0.0122) for anterior DG but not posterior DG (Figure 7D). For anterior sections, Tukey-Kramer post hoc tests showed that low choline mice had greater ΔFosB-ir than high choline mice (p=0.0024) and WT mice (p=0.005) but not Tg2576 mice fed the intermediate diet (p=0.275; Figure 7D1). Mice fed the high choline diet were not significantly different from WT (p=0.993; Figure 7D1). These data suggest that high choline in the diet early in life can reduce neuronal activity of GCs in offspring later in life. In addition, low choline has an opposite effect, suggesting low choline in early life has adverse effects.”

      (3) Quantification of the discrimination ratio of the novel object and novel location tests can facilitate the comparison between the different genotypes and diets.

      We have added the discrimination index for novel object location to the paper. The data are in a new figure: Figure 3. In brief, the results for discrimination index are the same as the results done originally, based on the analysis of percent of time exploring the novel object.

      Below is the new Figure and legend, followed by the new text in the Results.

      Author response image 3.

      Novel object location results based on the discrimination index. A. Results are shown for the 3 months-old WT and Tg2576 mice based on the discrimination index. (1) Mice fed the low choline diet showed object location memory only in WT. (2) Mice fed the intermediate diet showed object location memory only in WT. (3) Mice fed the high choline diet showed memory both for WT and Tg2576 mice. Therefore, the high choline diet improved memory in Tg2576 mice. B. The results for the 6 months-old mice are shown. (1-2) There was no significant memory demonstrated by mice that were fed either the low or intermediate choline diet. (3) Mice fed a diet enriched in choline showed memory whether they were WT or Tg2576 mice. Therefore, choline enrichment improved memory in all mice.

      Results, Section B1, starting on line 536:

      “The discrimination indices are shown in Figure 3 and results led to the same conclusions as the analyses in Figure 2. For the 3 months-old mice (Figure 3A), the low choline group did not show the ability to perform the task for WT or Tg2576 mice. Thus, a two-way ANOVA showed no effect of genotype (F(1,74)=0.027, p=0.870) or task phase (F(1,74)=1.41, p=0.239). For the intermediate diet-treated mice, there was no effect of genotype (F(1,50)=3.52, p=0.067) but there was an effect of task phase (F(1,50)=8.33, p=0.006). WT mice showed a greater discrimination index during testing relative to training (p=0.019) but Tg2576 mice did not (p=0.664). Therefore, Tg2576 mice fed the intermediate diet were impaired. In contrast, high choline-treated mice performed well. There was a main effect of task phase (F(1,68)=39.61, p<0.001) with WT (p<0.0001) and Tg2576 mice (p=0.0002) showing preference for the moved object in the test phase. Interestingly, there was a main effect of genotype (F(1,68)=4.50, p=0.038) because the discrimination index for WT training was significantly different from Tg2576 testing (p<0.0001) and Tg2576 training was significantly different from WT testing (p=0.0003).”

      “The discrimination indices of 6 months-old mice led to the same conclusions as the results in Figure 2. There was no evidence of discrimination in low choline-treated mice by two-way ANOVA (no effect of genotype, F(1,42)=3.25, p=0.079; no effect of task phase, F(1,42)=0.278, p=0.601). The same was true of mice fed the intermediate diet (genotype, F(1,12)=1.44, p=0.253; task phase, F(1,12)=2.64, p=0.130). However, both WT and Tg2576 mice performed well after being fed the high choline diet (effect of task phase, F(1,52)=58.75, p=0.0001, but not genotype, F(1,52)=1.197, p=0.279). Tukey-Kramer post-hoc tests showed that both WT (p<0.0001) and Tg2576 mice that had received the high choline diet (p=0.0005) had elevated discrimination indices for the test session.”
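For readers unfamiliar with the measure, a discrimination index can be sketched as below. The response does not state the formula that was used; the version here, (moved - unmoved) / (moved + unmoved), is one common definition and is an assumption, as are the function name and the example exploration times.

```python
def discrimination_index(time_moved: float, time_unmoved: float) -> float:
    """Return (moved - unmoved) / (moved + unmoved).

    Assumed definition, not taken from the manuscript.
    The value ranges from -1 to 1; 0 means no preference.
    """
    total = time_moved + time_unmoved
    if total <= 0:
        raise ValueError("total exploration time must be positive")
    return (time_moved - time_unmoved) / total


# Hypothetical exploration times in seconds for one mouse.
training_index = discrimination_index(10.0, 10.0)  # equal exploration
testing_index = discrimination_index(15.0, 5.0)    # preference for the moved object
print(training_index, testing_index)
```

Under this assumed definition the index is a linear rescaling of the percent of time spent with the moved object (index = 2 x fraction - 1), which would be consistent with the two analyses leading to the same conclusions.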

      (4) The longitudinal analyses enable the performance of multi-level correlations between the discrimination ratio in NOR and NOL, NeuN and Fos levels, multiple EEG parameters, and premature death. Such analysis can potentially identify biomarkers associated with AD progression. These can be interesting in different choline supplementation, but also in the standard choline diet.

      We agree and added correlations to the paper in a new figure (Figure 9). Below is Figure 9 and its legend. Afterwards is the new Results section.

      Author response image 4.

      Correlations between IIS, Behavior, and hilar NeuN-ir. A. IIS frequency over 24 hrs is plotted against the preference for the novel object in the test phase of NOL. A greater preference is reflected by a greater percentage of time exploring the novel object. (1) The mice fed the high choline diet (red) showed greater preference for the novel object when IIS were low. These data suggest IIS impaired object location memory in the high choline-treated mice. The low choline-treated mice had very weak preference and very few IIS, potentially explaining the lack of correlation in these mice. (2) There were no significant correlations for IIS and NOR. However, there were only 4 mice for the high choline group, which is a limitation. B. IIS frequency over 24 hrs is plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. (1) Hilar NeuN-ir is plotted against the preference for the novel object in the test phase of NOL. There were no significant correlations. (2) Hilar NeuN-ir was greater for mice that had better performance in NOR, both for the low choline (blue) and high choline (red) groups. These data support the idea that hilar cells contribute to object recognition (Kesner et al. 2015; Botterill et al. 2021; GoodSmith et al. 2022).

      Results, Section F, starting on Line 801:

      “F. Correlations between IIS and other measurements

      As shown in Figure 9A, IIS were correlated to behavioral performance in some conditions. For these correlations, only mice that were fed the low and high choline diets were included because mice that were fed the intermediate diet did not have sufficient EEG recordings in the same mouse where behavior was studied. IIS frequency over 24 hrs was plotted against the preference for the novel object in the test phase (Figure 9A). For NOL, IIS were significantly less frequent when behavior was the best, but only for the high choline-treated mice (Pearson’s r, p=0.022). In the low choline group, behavioral performance was poor regardless of IIS frequency (Pearson’s r, p=0.933; Figure 9A1). For NOR, there were no significant correlations (low choline, p=0.202; high choline, p=0.680) but few mice were tested in the high choline-treated group (Figure 9B2).

      We also tested whether there were correlations between dorsal hilar NeuN-ir cell numbers and IIS frequency. In Figure 9B, IIS frequency over 24 hrs was plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. For NOL, there was no significant correlation (low choline, p=0.273; high choline, p=0.159; Figure 9B1). However, for NOR, there were more NeuN-ir hilar cells when the behavioral performance was strongest (low choline, p=0.024; high choline, p=0.016; Figure 9B2). These data support prior studies showing that hilar cells, especially mossy cells (the majority of hilar neurons), contribute to object recognition (Botterill et al. 2021; GoodSmith et al. 2022).”
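As an illustration of the per-mouse correlations described above, the sketch below computes Pearson's r and the matching t statistic (n - 2 degrees of freedom) using only the Python standard library. The data values and function names are hypothetical; the software the authors used for these correlations is not stated in the response.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero; compare with Student's t, n - 2 df."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values: IIS per 24 hrs and percent time with the novel object.
iis_per_day = [1, 2, 3, 4, 5]
novel_preference = [5, 3, 4, 1, 2]
r = pearson_r(iis_per_day, novel_preference)
print(r, t_statistic(r, len(iis_per_day)))
```

A negative r in data of this shape would correspond to the reported pattern for NOL in the high choline group, where fewer IIS accompanied better performance.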

      We also noted that not all mice could be included, because some died or were excluded for other reasons, such as loss of the headset (Results, Section A, Lines 463-464): Some mice were not possible to include in all assays either because they died before reaching 6 months or for other reasons.

      Reviewer #2 (Public Review):

      Strengths:

      The strength of the group was the ability to monitor the incidence of interictal spikes (IIS) over the course of 1.2-6 months in the Tg2576 Alzheimer's disease model, combined with meaningful behavioral and histological measures. The authors were able to demonstrate MCS had protective effects in Tg2576 mice, which was particularly convincing in the hippocampal novel object location task.

      We thank the Reviewer for identifying several strengths.

      Weaknesses:

      Although choline deficiency was associated with impaired learning and elevated FosB expression, consistent with increased hyperexcitability, IIS was reduced with both low and high choline diets. Although not necessarily a weakness, it complicates the interpretation and requires further evaluation.

      We agree and we revised the paper to address the evaluations that were suggested.

      Reviewer #1 (Recommendations For The Authors):

      (1) A reference directing to genotyping of Tg2576 mice is missing.

      We apologize for the oversight and added that the mice were genotyped by the New York University Mouse Genotyping core facility.

      Methods, Section A, Lines 210-211: “Genotypes were determined by the New York University Mouse Genotyping Core facility using a protocol to detect APP695.”

      (2) Which software was used to track the mice in the behavioral tests?

      We manually reviewed videos. This has been clarified in the revised manuscript. Methods, Section B4, Lines 268-270: Videos of the training and testing sessions were analyzed manually. A subset of data was analyzed by two independent blinded investigators and they were in agreement.

      (3) Unexpectedly, a low choline diet in AD mice was associated with reduced frequency of interictal spikes yet increased mortality and spontaneous seizures. The authors attribute this to postictal suppression.

      We did not intend to suggest that postictal depression was the only cause. It was one of many potential explanations for why seizures would influence IIS frequency: postictal depression could transiently reduce IIS. We have clarified the text (Discussion, starting on Line 960):

      If mice were unhealthy, IIS might have been reduced due to impaired excitatory synaptic function. Another reason for reduced IIS is that the mice that had the low choline diet had seizures which interrupted REM sleep. Thus, seizures in Tg2576 mice typically started in sleep. Less REM sleep would reduce IIS because IIS occur primarily in REM. Also, seizures in the Tg2576 mice were followed by a depression of the EEG (postictal depression; Supplemental Figure 3) that would transiently reduce IIS. A different, radical explanation is that the intermediate diet promoted IIS rather than low choline reducing IIS. Instead of choline, a constituent of the intermediate diet may have promoted IIS.

      However, reduced spike frequency is already evident at 5 weeks of age, a time point with a low occurrence of premature death. A more comprehensive analysis of EEG background activity may provide additional information if the epileptic activity is indeed reduced at this age.

      We did not intend to suggest that premature death caused reduced spike frequency. We have clarified the paper accordingly. We agree that a more in-depth EEG analysis would be useful but is beyond the scope of the study.

      (4) Supplementary Fig. 3 depicts far more spikes / 24 h compared to Fig. 7B (at least 100 spikes/24h in Supplementary Fig. 3 and less than 10 spikes/24h in Fig. 7B).

      We would like to clarify that before and after a seizure the spike frequency is unusually high. Therefore, there are far more spikes than prior figures.

      We clarified this issue by adding to the Supplemental Figure more data. The additional data are from mice without a seizure, showing their spikes are low in frequency.

      All recordings lasted several days. We included the data from mice with a seizure on one of the days and mice without any seizures. For mice with a seizure, we graphed IIS frequency for the day before, the day of the seizure, and the day after. For mice without a seizure, IIS frequency is plotted for 3 consecutive days. When there was a seizure, the day before and after showed high numbers of spikes. When there was no seizure on any of the 3 days, spikes were infrequent on all days.

      The revised figure and legend are shown below. It is Supplemental Figure 4 in the revised submission.

      Author response image 5.

      IIS frequency before and after seizures. A. Representative EEG traces recorded from electrodes implanted in the skull over the left frontal cortex, right occipital cortex, left hippocampus (Hippo) and right hippocampus during a spontaneous seizure in a 5 months-old Tg2576 mouse. Arrows point to the start (green arrow) and end of the seizure (red arrow), and postictal depression (blue arrow). B. IIS frequency was quantified from continuous video-EEG for mice that had a spontaneous seizure during the recording period and mice that did not. IIS frequency is plotted for 3 consecutive days, starting with the day before the seizure (designated as day 1), and ending with the day after the seizure (day 3). A two-way RMANOVA was conducted with the day and group (mice with or without a seizure) as main factors. There was a significant effect of day (F(2,4)=46.95, p=0.002) and group (seizure vs no seizure; F(1,2)=46.01, p=0.021) and an interaction of factors (F(2,4)=46.68, p=0.002). Tukey-Kramer post-hoc tests showed that mice with a seizure had significantly greater IIS frequencies than mice without a seizure for every day (day 1, p=0.0005; day 2, p=0.0001; day 3, p=0.0014). For mice with a seizure, IIS frequency was higher on the day of the seizure than the day before (p=0.037) or after (p=0.010). For mice without a seizure, there were no significant differences in IIS frequency for day 1, 2, or 3. These data are similar to prior work showing that from one day to the next mice without seizures have similar IIS frequencies (Kam et al., 2016).

      In the text, the revised section is in the Results, Section C, starting on Line 772:

      “At 5-6 months, IIS frequencies were not significantly different in the mice fed the different diets (all p>0.05), probably because IIS frequency becomes increasingly variable with age (Kam et al. 2016). One source of variability is seizures, because there was a sharp increase in IIS during the day before and after a seizure (Supplemental Figure 4). Another reason that the diets failed to show differences was that the IIS frequency generally declined at 5-6 months. This can be appreciated in Figure 8B and Supplemental Figure 6B. These data are consistent with prior studies of Tg2576 mice where IIS increased from 1 to 3 months but then waxed and waned afterwards (Kam et al., 2016).”

      (5) The data indicating the protective effect of high choline supplementation are valuable, yet some of the claims are not completely supported by the data, mainly as the analysis of littermate WT mice is not complete.

      We added WT data to show that the high choline diet restored cell loss and ΔFosB expression to WT levels. These data strengthen the argument that the high choline diet was valuable. See the response to Reviewer #1, Public Review Point #2.

      • Line 591: "The results suggest that choline enrichment protected hilar neurons from NeuN loss in Tg2576 mice." A comparison to NeuN expression in WT mice is needed to make this statement.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      • Line 623: "These data suggest that high choline in the diet early in life can reduce hyperexcitability of GCs in offspring later in life. In addition, low choline has an opposite effect, again suggesting this maternal diet has adverse effects." Also here, FosB quantification in WT mice is needed.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      (7) Was the effect of choline associated with reduced tauopathy or Aβ levels?

      The mice have no detectable hyperphosphorylated tau. The mice do have intracellular Aβ before 6 months. This is especially the case in hilar neurons, but GCs have little (Criscuolo et al., eNeuro, 2023). However, in neurons that have reduced NeuN, we found previously that antibodies generally do not work well. We think it is because the neurons become pyknotic (Duffy et al., 2015), a condition associated with oxidative stress which causes antigens like NeuN to change conformation due to phosphorylation. Therefore, we did not conduct a comparison of hilar neurons across the different diets.

      (8) Since the mice were tested at 3 months and 6 months, it would be interesting to see the behavioral difference per mouse and the correlation with EEG recording and immunohistological analyses.

      We agree that would be valuable and this has been added to the paper. Please see response to Reviewer #1, Public Review Point #4.

      Reviewer #2 (Recommendations For The Authors):

      There were several areas that could be further improved, particularly in the areas of data analysis (particularly with images and supplemental figures), figure presentation, and mechanistic speculation.

      Major points:

      (1) It is understandable that, for the sake of labor and expense, WT mice were not implanted with EEG electrodes, particularly since previous work showed that WT mice have no IIS (Kam et al. 2016). However, from a standpoint of full factorial experimental design, there are several flaws - purists would argue are fatal flaws. First, the lack of WT groups creates underpowered and imbalanced groups, constraining statistical comparisons and likely reducing the significance of the results. Also, it is an assumption that diet does not influence IIS in WT mice. Secondly, with a within-subject experimental design (as described in Fig. 1A), 6-month-old mice are not naïve if they have previously been tested at 3 months. Such an experimental design may reduce effect size compared to non-naïve mice. These caveats should be included in the Discussion. It is likely that these caveats reduce effect size and that the actual statistical significance, were the experimental design perfect, would be higher overall.

      We agree and have added these points to the Limitations section of the Discussion. Starting on Line 1050: In addition, groups were not exactly matched. Although WT mice do not have IIS, a WT group for each of the Tg2576 groups would have been useful. Instead, we included WT mice for the behavioral tasks and some of the anatomical assays. Related to this point is that several mice died during the long-term EEG monitoring of IIS.

      (2) Since behavior, EEG, NeuN and FosB experiments seem to be done on every Tg2576 animal, it seems that there are missed opportunities to correlate behavior/EEG and histology on a per-mouse basis. For example, rather than speculate in the discussion, why not (for example) directly examine relationships between IIS/24 hours and FosB expression?

      We addressed this point above in responding to Reviewer #1, Public Review Point #4.

      (3) Methods of image quantification should be improved. Background subtraction should be considered in the analysis workflow (see Fig. 5C and Fig. 6C background). It would be helpful to have a Methods figure illustrating intermediate processing steps for both NeuN and FosB expression.

      We added more information to improve the methods of quantification. We did use a background subtraction approach where ImageJ provides a histogram of intensity values, and it determines when there is a sharp rise in staining relative to background. That point is where we set threshold. We think it is a procedure that has the least subjectivity.

      We added these methods to the Methods section and expanded the first figure about image quantification, Figure 6B. That figure and legend are shown above in response to Reviewer #1, Point #2.

      This is the revised section of the Methods, Section C3, starting on Line 345:

      “Photomicrographs were acquired using ImagePro Plus V7.0 (Media Cybernetics) and a digital camera (Model RET 2000R-F-CLR-12, Q-Imaging). NeuN and ∆FosB staining were quantified from micrographs using ImageJ (V1.44, National Institutes of Health). All images were first converted to grayscale and in each section, the hilus was traced, defined by zone 4 of Amaral (1978). A threshold was then calculated to identify the NeuN-stained cell bodies but not background. Then NeuN-stained cell bodies in the hilus were quantified manually. Note that the threshold was defined in ImageJ using the distribution of intensities in the micrograph. A threshold was then set using a slider in the histogram provided by Image J. The slider was pushed from the low level of staining (similar to background) to the location where staining intensity made a sharp rise, reflecting stained cells. Cells with labeling that was above threshold were counted.”
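The thresholding step above was done manually with the ImageJ histogram slider. As a rough automated analogue only, the sketch below walks up an intensity histogram from a background level and stops at the first sharp rise in pixel count, then counts cells at or above that level. The function names, the `min_jump` criterion, the convention that higher values mean more staining, and the example numbers are all assumptions made for illustration; this is not the authors' procedure or an ImageJ API.

```python
def sharp_rise_threshold(histogram, start_level, min_jump):
    """Return the first level above start_level where the pixel count
    rises by at least min_jump over the previous level, or None.

    histogram[i] is the number of pixels at staining intensity i
    (assumed convention: higher index = stronger staining).
    """
    for level in range(start_level + 1, len(histogram)):
        if histogram[level] - histogram[level - 1] >= min_jump:
            return level
    return None


def count_cells_above(cell_intensities, threshold):
    """Count cells whose staining intensity is at or above the threshold."""
    return sum(1 for value in cell_intensities if value >= threshold)


# Hypothetical 8-level histogram: background at levels 0-1, stained cells from level 5.
histogram = [50, 40, 5, 4, 5, 30, 45, 20]
threshold = sharp_rise_threshold(histogram, start_level=2, min_jump=20)
cells = count_cells_above([1, 6, 5, 3, 7], threshold)
print(threshold, cells)
```

A fixed criterion of this kind would make the threshold reproducible across sections, but the choice of `min_jump` is itself a judgment call, much like the slider position.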

      (4) This reviewer is surprised that the authors do not speculate more about ACh-related mechanisms. For example, choline deficiency would likely reduce ACh release, which could have the same effect on IIS as muscarinic antagonism (Kam et al. 2016), and could potentially explain the paradoxical effects of a low choline diet on reducing IIS. Some additional mechanistic speculation would be helpful in the Discussion.

      We thank the Reviewer for noting this so we could add it to the Discussion. We had not because we were concerned about space limitations.

      The Discussion has a new section starting on Line 1009:

      “Choline and cholinergic neurons

      There are many suggestions for the mechanisms that allow MCS to improve health of the offspring. One hypothesis that we are interested in is that MCS improves outcomes by reducing IIS. Reducing IIS would potentially reduce hyperactivity, which is significant because hyperactivity can increase release of Aβ. IIS would also be likely to disrupt sleep since it represents aberrant synchronous activity over widespread brain regions. The disruption to sleep could impair memory consolidation, since it is a notable function of sleep (Graves et al. 2001; Poe et al. 2010). Sleep disruption also has other negative consequences such as impairing normal clearance of Aβ (Nedergaard and Goldman 2020). In patients, IIS and similar events, IEDs, are correlated with memory impairment (Vossel et al. 2016).

      How would choline supplementation in early life reduce IIS of the offspring? It may do so by making BFCNs more resilient. That is significant because BFCN abnormalities appear to cause IIS. Thus, the cholinergic antagonist atropine reduced IIS in vivo in Tg2576 mice. Selective silencing of BFCNs reduced IIS also. Atropine also reduced elevated synaptic activity of GCs in young Tg2576 mice in vitro. These studies are consistent with the idea that early in AD there is elevated cholinergic activity (DeKosky et al. 2002; Ikonomovic et al. 2003; Kelley et al. 2014; Mufson et al. 2015; Kelley et al. 2016), while later in life there is degeneration. Indeed, the chronic overactivity could cause the degeneration.

      Why would MCS make BFCNs resilient? There are several possibilities that have been explored, based on genes upregulated by MCS. One attractive hypothesis is that neurotrophic support for BFCNs is retained after MCS but in aging and AD it declines (Gautier et al. 2023). The neurotrophins, notably nerve growth factor (NGF) and brain-derived neurotrophic factor (BDNF) support the health of BFCNs (Mufson et al. 2003; Niewiadomska et al. 2011).”

      Minor points:

      (1) The vendor is Dyets Inc., not Dyets.

      Thank you. This correction has been made.

      (2) Anesthesia chamber not specified (make, model, company).

      We have added this information to the Methods, Section D1, starting on Line 375: The animals were anesthetized by isoflurane inhalation (3% isoflurane, 2% oxygen for induction) in a rectangular transparent plexiglas chamber (18 cm long x 10 cm wide x 8 cm high) made in-house.

      (3) It is not clear whether software was used for the detection of behavior. Was position tracking software used or did blind observers individually score metrics?

      We have added the information to the paper. Please see the response to Reviewer #1, Recommendations for Authors, Point #2.

      (4) It is not clear why rat cages and not a true Open Field Maze were used for NOL and NOR.

      We used mouse cages because in our experience that is what is ideal to detect impairments in Tg2576 mice at young ages. We think it is why we have been so successful in identifying NOL impairments in young mice. Before our work, most investigators thought behavior only became impaired later. We would like to add that, in our experience, an Open Field Maze is not the most common cage that is used.

      (5) Figure 1A is not mentioned.

      It had been mentioned in the Introduction. Figure 1B-D was the first Figure mentioned in the Results so that is why it might have been missed. We now have added it to the first section of the Results, Line 457, so it is easier to find.

      (6) Although Fig 7 results are somewhat complicated compared to Fig. 5 and 6 results, EEG comes chronologically earlier than NeuN and FosB expression experiments.

      We have kept the order as is because as the Reviewer said, the EEG is complex. For readability, we have kept the EEG results last.

      (7) Though the statistical analysis involved parametric and nonparametric tests, it is not clear which normality tests were used.

      We have added the name of the normality test in the Methods, Section E, Line 443: Tests for normality (Shapiro-Wilk) and homogeneity of variance (Bartlett’s test) were used to determine if parametric statistics could be used. We also added a clarification after this sentence: When data were not normal, non-parametric statistics were used. When there was significant heteroscedasticity of variance, data were log transformed. If log transformation did not resolve the heteroscedasticity, non-parametric statistics were used. Because we added correlations and analysis of survival curves, we also added the following (starting on Line 451): For correlations, Pearson’s r was calculated. To compare survival curves, a Log rank (Mantel-Cox) test was performed.
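The decision sequence described in this Methods text can be sketched as a small function. It takes p-values that would come from the Shapiro-Wilk and Bartlett tests (for example from SciPy's `scipy.stats.shapiro` and `scipy.stats.bartlett`, although the authors' software is not stated) and returns which family of statistics to use. The function name, the return labels, and the 0.05 cutoff are assumptions for illustration.

```python
def choose_statistics(normality_p, bartlett_p, bartlett_p_after_log=None, alpha=0.05):
    """Pick a test family following the stated order of checks.

    normality_p:           Shapiro-Wilk p-value for the data
    bartlett_p:            Bartlett p-value for homogeneity of variance
    bartlett_p_after_log:  Bartlett p-value recomputed on log-transformed data
    alpha:                 assumed significance cutoff (not stated in the text)
    """
    if normality_p < alpha:
        # Data were not normal.
        return "non-parametric"
    if bartlett_p >= alpha:
        # Normal data with homogeneous variance.
        return "parametric"
    if bartlett_p_after_log is None:
        raise ValueError("variance is heterogeneous; supply the post-log Bartlett p-value")
    if bartlett_p_after_log >= alpha:
        # Log transformation resolved the heteroscedasticity.
        return "parametric on log-transformed data"
    return "non-parametric"


# Hypothetical p-values for three datasets.
print(choose_statistics(0.40, 0.30))
print(choose_statistics(0.40, 0.01, bartlett_p_after_log=0.20))
print(choose_statistics(0.01, 0.30))
```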

      Figures:

      (1) In Fig. 1A, Anatomy should be placed above the line.

      We changed the figure so that the word “Anatomy” is now aligned, and the arrow that was angled is no longer needed.

      In Fig. 1C and 1D, the objects seem to be moved into the cage, not the mice. This schematic does not accurately reflect the Fig. 1C and 1D figure legend text.

      Thank you for the excellent point. The figure has been revised. We also updated it to show the objects more accurately.

      Please correct the punctuation in the Fig. 1D legend.

      Thank you for mentioning the errors. We corrected the legend.

      For ease of understanding, Fig. 1C and 1D should have training and testing labeled in the figure.

      Thank you for the suggestion. We have revised the figure as suggested.

      Author response image 6.

      (2) In Figure 2, error bars for population stats (bar graphs) are not obvious or missing. Same for Figure 3.

      We added two supplemental figures to show error bars, because adding the error bars to the existing figures made the symbols, colors, connecting lines and error bars hard to distinguish. For novel object location (Fig. 2) the error bars are shown in Supp. Fig. 2. For novel object recognition, the error bars are shown in Supplemental Fig. 3.

      (3) The authors should consider a Methods figure for quantification of NeuN and deltaFOSB (expansions of Fig. 5C and Fig. 6C).

      Please see Reviewer #1, Public Review Point #2.

      (4) In Figure 5, A should be omitted and mentioned in the Methods/figure legend. B should be enlarged. C should be inset, zoomed-in images of the hilus, with an accompanying analysis image showing a clear reduction in NeuN intensity in low choline conditions compared to intermediate and high choline conditions. In D, X axes could delineate conditions (figure legend and color unnecessary). Figure 5C should be moved to a Methods figure.

      We thank the Reviewer for the excellent suggestions. We removed A as suggested. We expanded B and included insets. We used different images to show a more obvious reduction of cells for the low choline group. We expanded the Methods schematics. The revised figure is Figure 6 and shown above in response to Reviewer 1, Public Review Point #2.

      (5) In Figure 6, A should be eliminated and mentioned in the Methods/figure legend. B should be greatly expanded with higher and lower thresholds shown on subsequent panels (3x3 design).

      We removed A as suggested. We expanded B as suggested. The higher and lower thresholds are shown in C. The revised figure is Figure 7 and shown above in response to Reviewer 1, Public Review Point #2.

      (6) In Figure 7, A2 should be expanded vertically. A3 should be expanded both vertically and horizontally. B 1 and 2 should be increased, particularly B1 where it is difficult to see symbols. Perhaps colored symbols offset/staggered per group so that the spread per group is clearer.

      We added a panel (A4) to show an expansion of A2 and A3. However, we did not see that a vertical expansion would add information so we opted not to add that. We expanded B1 as suggested but opted not to expand B2 because we did not think it would enhance clarity. The revised figure is below.

      Author response image 7.

      (7) Supplemental Figure 1 could possibly be combined with Figure 1 (use rounded corner rat cage schematic for continuity).

      We opted not to combine figures because it would make one extremely large figure. As a result, the parts of the figure would be small and difficult to see.

      (8) Supplemental Figure 2 - there does not seem to be any statistical analysis associated with A mentioned in the Results text.

      We added the statistical information. It is now Supplemental Figure 5:

      Author response image 8.

      Mortality was high in mice treated with the low choline diet. A. Survival curves are shown for mice fed the low choline diet and mice fed the high choline diet. The mice fed the high choline diet had a significantly less severe survival curve. B. Left: A photo of a mouse after sudden unexplained death. The mouse was found in a posture consistent with death during a convulsive seizure. The area surrounded by the red box is expanded below to show the outstretched hindlimb (red arrow). Right: A photo of a mouse that did not die suddenly. The area surrounded by the box is expanded below to show that the hindlimb is not outstretched.

      The revised text is in the Results, Section E, starting on Line 793:

      “The reason that low choline-treated mice appeared to die in a seizure was that they were found in a specific posture in their cage which occurs when a severe seizure leads to death (Supplemental Figure 5). They were found in a prone posture with extended, rigid limbs (Supplemental Figure 5). Regardless of how the mice died, there was greater mortality in the low choline group compared to mice that had been fed the high choline diet (Log-rank (Mantel-Cox) test, Chi square 5.36, df 1, p=0.021; Supplemental Figure 5A).”
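To make the survival comparison concrete, the sketch below implements a standard two-group log-rank statistic and converts a chi-square value with 1 degree of freedom to a p-value, using only the Python standard library. The survival times in the example are hypothetical and the function names are illustrative; the authors' raw data and software are not given in the response.

```python
import math


def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square (1 degree of freedom).

    times_*:  follow-up time for each animal
    events_*: True if death was observed, False if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = sum(1 for s, _, _ in data if s >= t)
        at_risk_a = sum(1 for s, _, g in data if s >= t and g == 0)
        deaths = sum(1 for s, e, _ in data if s == t and e)
        deaths_a = sum(1 for s, e, g in data if s == t and e and g == 0)
        observed_a += deaths_a
        expected_a += deaths * at_risk_a / at_risk
        if at_risk > 1:
            frac_a = at_risk_a / at_risk
            variance += deaths * frac_a * (1 - frac_a) * (at_risk - deaths) / (at_risk - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    return (observed_a - expected_a) ** 2 / variance


def chi2_df1_p(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical survival times in weeks; every death observed.
chi2 = logrank_chi2([1, 2], [True, True], [3, 4], [True, True])
print(chi2, chi2_df1_p(chi2))

# The reported statistic (Chi square 5.36, df 1) converted to a p-value.
print(round(chi2_df1_p(5.36), 3))
```

The last line is a consistency check on the reported numbers: a chi-square of 5.36 with 1 degree of freedom corresponds to p of about 0.021, matching the value quoted above.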

      Also, why isn't intermediate choline also shown?

      We do not have the data from the animals. Records of death were not kept, regrettably.

      Perhaps labeling of male/female could also be done as part of this graph.

      We agree this would be very interesting but do not have all sex information.

      B is not very convincing, though it is understandable once one reads about posture.

      We have clarified the text and figure, as well as the legend. They are above.

      Are there additional animals that were seen to be in a specific posture?

      There are many examples, and we added them to hopefully make it more convincing.

      We also added posture in WT mice when there is a death to show how different it is.

      Is there any relationship between seizures detected via EEG, as shown in Supplemental Figure 3, and death?

      Several mice died during a convulsive seizure, which is the type of seizure that is shown in the Supplemental Figure.

      (9) Supplemental Figure 3 seems to display an isolated case in which EEG-detected seizures correlate with increased IIEs. It is not clear whether there are additional documented cases of seizures that could be assembled into a meaningful population graph. If this data does not exist or is too much work to include in this manuscript, perhaps it can be saved for a future paper.

      We have added other cases and revised the graph. This is now Supplemental Figure 4 and is shown above in response to Reviewer #1, Recommendation for Authors Point #4.

      Frontal is misspelled.

      We checked and our copy is not showing a misspelling. However, we are very grateful to the Reviewer for catching many errors and reading the manuscript carefully.

      (10) Supplemental Figure 4 seems incomplete in that it does not include EEG data from months 4, 5, and 6 (see Fig. 7B).

      We have added data for these ages to the Supplemental Figure (currently Supplemental Figure 6) as part B. In part A, which had been the original figure, only 1.2, 2, and 3 months-old mice were shown because there were insufficient numbers of each sex at other ages. However, by pooling 1.2 and 2 months (Supplemental Figure 6B1), 3 and 4 months (B2) and 5 and 6 months (B3) we could do the analysis of sex. The results are the same – we detected no sex differences.

      Author response image 9.

      IIS frequency was similar for each sex. A. IIS frequency was compared for females and males at 1.2 months (1), 2 months (2), and 3 months (3). Two-way ANOVA was used to analyze the effects of sex and diet. Female and male Tg2576 mice were not significantly different. B. Mice were pooled at 1.2 and 2 months (1), 3 and 4 months (2) and 5 and 6 months (3). Two-way ANOVA analyzed the effects of sex and diet. There were significant effects of diet for (1) and (2) but not (3). There were no effects of sex at any age.

      (1) There were significant effects of diet (F(2,47)=46.21, p<0.0001) but not sex (F(1,47)=0.106, p=0.746). Female and male mice fed the low choline diet or high choline diet were significantly different from female and male mice fed the intermediate diet (all p<0.05, asterisk).

      (2) There were significant effects of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Both female and male mice of the low choline group were significantly different from male mice fed the intermediate diet (both p<0.05, asterisk) but no other pairwise comparisons were significant.

      (3) There were no significant differences (diet, F(2,23)=1.21, p=0.317; sex, F(1,23)=0.844, p=0.368).

      The data are discussed in the Results, Section G, starting on Line 843:

      In Supplemental Figure 6B we grouped mice at 1-2 months, 3-4 months and 5-6 months so that there were sufficient females and males to compare each diet. A two-way ANOVA with diet and sex as factors showed a significant effect of diet (F(2,47)=46.21; p<0.0001) at 1-2 months of age, but not sex (F(1,47)=0.11, p=0.758). Post-hoc comparisons showed that the low choline group had fewer IIS than the intermediate group, and the same was true for the high choline-treated mice. Thus, female mice fed the low choline diet differed from the females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Male mice that had received the low choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Female mice fed the high choline diet differed from females (p=0.002) and males (p<0.0001) fed the intermediate diet, and males fed the high choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet.

      For the 3-4 months-old mice there was also a significant effect of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Post-hoc tests showed that low choline females were different from males fed the intermediate diet (p=0.007), and low choline males were also significantly different from males that had received the intermediate diet (p=0.006). There were no significant effects of diet (F(2,23)=1.21, p=0.317) or sex (F(1,23)=0.84, p=0.368) at 5-6 months of age.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their constructive comments. Here is a summary of the main changes we made from the previous manuscript version, based on the reviewers’ comments:

      (1) Introduction of a new model, based on a Markov chain, capturing within-trial evolution in search strategy.

      (2) Addition of a new figure investigating inter-animal variations in search strategy.

      (3) Measurement of model fit consistency across 10 simulation repetitions, to prevent the risk of model overfitting.

      (4) Several clarifications have been made in the main text (Results, Discussion, Methods) and figure legends.

      (5) We now provide processed data and code for the analyses and models in a GitHub repository.

      (6) Simplification of the previous modeling. We realized that the first two models in the previous manuscript version were simply special cases of the third model. Therefore, we retained only the third model, which has been renamed the ‘mixture model’.

      (7) Modification (or creation) of Figures 4-6 and Supplementary Figures 7-8 to reflect the aforementioned changes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors design an automated 24-well Barnes maze with 2 orienting cues inside the maze, then model what strategies the mice use to reach the goal location across multiple days of learning. They consider a set of models and conclude that one of these models, a combined strategy model, best explains the experimental data.

      This study is written concisely and the results presented concisely. The best fit model is reasonably simple and fits the experimental data well (at least the summary measures of the data that were presented).

      Major points:

      (1) One combined strategy (once the goal location is learned) that might seem to be reasonable would be that the animal knows roughly where the goal is, but not exactly where, so it first uses a spatial strategy just to get to the first vestibule, then switches to a serial strategy until it reaches the correct vestibule. How well would such a strategy explain the data for the later sessions? The best combined model presented in the manuscript is one in which the animal starts with a roughly 50-50 chance of a serial (or spatial strategy) from the start vestibule (i.e. by the last session before the reversal the serial and spatial strategies are at ~50-50 in Fig. 5d). Is it the case that even after 15 days of training the animal starts with a serial strategy from its starting point approximately half of the time? The broader point is whether additional examination of the choices made by the animal, combined with consideration of a larger range of possible models, would be able to provide additional insight into the learning and strategies the animal uses.

      Our analysis focused on the evolution of navigation strategies across days and trials. The reviewer raises the interesting possibility that navigation strategy might evolve in a specific manner within each trial, especially on the later days once the environment is learned. To address this possibility, we first examined how some of the statistical distributions, previously analyzed across days, evolved within trials. Consistent with the reviewer’s intuition, the statistical distributions changed within trials, suggesting a specific strategy evolution within trials. Second, we developed a new model, where strategies are represented as nodes of a Markov chain. This model allows potential strategy changes after each vestibule visit, according to a specific set of transition probabilities. Vestibules are chosen based on the same stochastic processes as in the previous model. This new model could be fitted to the experimental distributions and captured both the within-trial evolution and the global distributions. Interestingly, the trials were mostly initiated in the random strategy (~67% chance) and to a lesser extent in the spatial strategy (~25% chance), but rarely in the serial strategy (~8% chance). This new model is presented in Figure 6.
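
The structure of such a model can be illustrated with a minimal sketch. It is not the authors' code (which is in their repository) and it rests on several assumptions: the transition probabilities in `TRANSITIONS` are hypothetical placeholders rather than the fitted values of Figure 6, the vestibule-selection rules in `next_vestibule` are simplified stand-ins for the stochastic processes of the manuscript, and the start vestibule (`start=12`) is arbitrary. Only the three strategy names, the 24 vestibules, the goal at vestibule 0, and the approximate starting probabilities come from the text above.

```python
import random

N_VESTIBULES = 24  # vestibules labelled relative to the goal, which is vestibule 0

# Approximate starting probabilities quoted in the response.
INITIAL = {"random": 0.67, "spatial": 0.25, "serial": 0.08}

# HYPOTHETICAL transition probabilities, chosen only for illustration.
TRANSITIONS = {
    "random": {"random": 0.6, "spatial": 0.3, "serial": 0.1},
    "spatial": {"random": 0.2, "spatial": 0.6, "serial": 0.2},
    "serial": {"random": 0.1, "spatial": 0.2, "serial": 0.7},
}


def pick(rng, probs):
    """Draw one key of `probs` with probability proportional to its value."""
    return rng.choices(list(probs), weights=list(probs.values()))[0]


def next_vestibule(rng, strategy, current, direction):
    """Choose the next vestibule under the given strategy (assumed rules)."""
    if strategy == "random":
        return rng.randrange(N_VESTIBULES)
    if strategy == "spatial":
        # Assumed rule: the goal or one of its two neighbours,
        # with the goal twice as likely as each neighbour.
        return rng.choice((-1, 0, 0, 1)) % N_VESTIBULES
    # Serial: move to the adjacent vestibule in a fixed direction.
    return (current + direction) % N_VESTIBULES


def simulate_trial(rng, start=12, max_visits=1000):
    """Simulate one trial; the strategy may change after each vestibule visit."""
    strategy = pick(rng, INITIAL)
    direction = rng.choice((-1, 1))
    current, visits, strategies = start, [], []
    while len(visits) < max_visits:
        current = next_vestibule(rng, strategy, current, direction)
        visits.append(current)
        strategies.append(strategy)
        if current == 0:  # the trial ends when the goal vestibule is selected
            break
        strategy = pick(rng, TRANSITIONS[strategy])
    return visits, strategies


rng = random.Random(1)
visits, strategies = simulate_trial(rng)
print(list(zip(visits, strategies)))
```

Fitting would then amount to adjusting `INITIAL` and `TRANSITIONS` until the simulated distributions match the experimental ones; that step is omitted here.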

      (2) To clarify, in the Fig. 4 simulations, is the "last" vestibule visit of each trial, which is by definition 0, not counted in the plots of Fig. 4b? Otherwise, I would expect that vestibule 0 is overrepresented because a trial always ends with Vi = 0.

      The last vestibule visit (vestibule 0 by definition) is counted in the plots of Fig.4b. We initially shared the same concern as the reviewer. However, upon further consideration, we arrived at the following explanation: A factor that might lead to an overrepresentation of vestibule 0 is the fact that, unlike other vestibules, it has to be contained in each trial, as trials terminated upon the selection of vestibule 0. Conversely, a factor that might contribute to an underrepresentation of vestibule 0 is that, unlike other vestibules, it cannot be counted more than once per trial. Somehow these two factors seem to counterbalance each other, resulting in no discernible overrepresentation or underrepresentation of vestibule 0 in the random process. 
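
The counterbalancing can be checked with a small simulation. The sketch below assumes the simplest possible random process, in which every visit is drawn uniformly and independently from the 24 vestibules and the trial ends when vestibule 0 is drawn; the random process in the manuscript may differ in its details. Under this assumption a trial lasts 24 visits on average, of which exactly one is vestibule 0 and, on average, one falls on each of the other 23 vestibules, so the expected share of vestibule 0 is 1/24.

```python
import random
from collections import Counter


def simulate_random_trials(n_trials=20000, n_vestibules=24, seed=0):
    """Count vestibule visits over many trials of a uniform random process.

    Each trial draws vestibules uniformly at random and stops as soon as
    vestibule 0 (the goal) is drawn, so vestibule 0 is visited exactly
    once per trial.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        while True:
            v = rng.randrange(n_vestibules)
            counts[v] += 1
            if v == 0:
                break
    return counts


counts = simulate_random_trials()
total = sum(counts.values())
print("share of visits to vestibule 0:", counts[0] / total)
print("uniform expectation (1/24):    ", 1 / 24)
```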

      Reviewer #2 (Public Review):

      This paper uses a novel maze design to explore mouse navigation behaviour in an automated analogue of the Barnes maze. Overall I find the work to be solid, with the cleverly designed maze/protocol to be its major strength - however there are some issues that I believe should be addressed and clarified.

      (1) Whilst I'm generally a fan of the experimental protocol, the design means that internal odor cues on the maze change from trial to trial, along with cues external to the maze such as the sounds and visual features of the recording room, ultimately making it hard for the mice to use a completely allocentric spatial 'place' strategy to navigate. I do not think there is a way to control for these conflicts between reference frames in the statistical modelling, but I do think these issues should be addressed in the discussion.

      It should be pointed out that all cues on the maze (visual, tactile, odorant) remained unchanged across trials, since the maze was rotated together with goal and guiding cues. Furthermore, the maze was equipped with an opaque cover to prevent mice from seeing the surrounding room (the imaging of mouse trajectories was achieved using infrared light and camera). It is however possible that some other cues, such as room sounds and odors, could be perceived and might have interfered somewhat with the sensory cues provided inside the maze. We have now mentioned this possibility in the discussion.

      (2) Somewhat related - I could not find how the internal maze cues are moved for each trial to demarcate the new goal (i.e. the luminous cues) ? This should be clarified in the methods.

      The luminous cues were fixed to the floor of the arena. Consequently, they rotated along with the arena as a unified unit, depicted in figure 1. We have added some clarifications in Figure 1 legend and methods.

      (3) It appears some data is being withheld from Figures 2&3? E.g. Days 3/4 from Fig 2b-f and Days 1-5 on for Fig 3. Similarly, Trials 2-7 are excluded from Fig 3. If this is the case, why? It should be clarified in the main text and Figure captions, preferably with equivalent plots presenting all the data in the supplement.

      The statistical distributions for all single days/trials are shown in the color-coded panels of Figure2&3. In the line plots of Figure2&3, we show only the overlay of 2-3 lines for the sake of clarity. The days/trials represented were chosen to capture the dynamic range of variability within the distributions. We have added this information in the figure legends.

      (4) I strongly believe the data and code should be made freely available rather than "upon reasonable request".

      Matrices of processed data and various codes for simulations and analyses are now available at https://github.com/sebiroyerlab/Vestibule_sequences.

      Reviewer #3 (Public Review):

      Royer et al. present a fully automated variant of the Barnes maze to reduce experimenter interference and ensure consistency across trials and subjects. They train mice in this maze over several days and analyze the progression of mouse search strategies during the course of the training. By fitting models involving stochastic processes, they demonstrate that a model combined of the random, spatial, and serial processes can best account for the observed changes in mice's search patterns. Their findings suggest that across training days the spatial strategy (using local landmarks) was progressively employed, mostly at the expense of the random strategy, while the serial strategy (consecutive nearby vestibule check) is reinforced from the early stages of training. Finally, they discuss potential mechanistic underpinnings within brain systems that could explain such behavioral adaptation and flexibility.

      Strength:

      The development of an automated Barnes maze allows for more naturalistic and uninterrupted behavior, facilitating the study of spatial learning and memory, as well as the analysis of the brain's neural networks during behavior when combined with neurophysiological techniques. The system's design has been thoughtfully considered, encompassing numerous intricate details. These details include the incorporation of flexible options for selecting start, goal, and proximal landmark positions, the inclusion of a rotating platform to prevent the accumulation of olfactory cues, and careful attention given to automation, taking into account specific considerations such as the rotation of the maze without causing wire shortage or breakage. When combined with neurophysiological manipulations or recordings, the system provides a powerful tool for studying the spatial navigation system.

      The behavioral experiment protocols, along with the analysis of animal behavior, are conducted with care, and the development of behavioral modeling to capture the animal's search strategy is thoughtfully executed. It is intriguing to observe how the integration of these innovative stochastic models can elucidate the evolution of mice's search strategy within a variant of the Barnes maze.

      Weakness:

      (1) The development of the well-thought-out automated Barnes maze may attract the interest of researchers exploring spatial learning and memory. However, this aspect of the paper lacks significance due to insufficient coverage of the materials and methods required for readers to replicate the behavioral methodology for their own research inquiries.

      Moreover, as discussed by the authors, the methodology favors specialists who utilize wired recordings or manipulations (e.g. optogenetics) in awake, behaving rodents. However, it remains unclear how the current maze design, which involves trapping mice in start and goal positions and incorporating angled vestibules resulting in the addition of numerous corners, can be effectively adapted for animals with wired implants.

      The reviewer is correct in pointing out that the current maze design is not suitable for performing experiments with wired implants, particularly due to the maze’s enclosed structure and the access to the start/goal boxes through side holes. Instead, pharmacogenetics and wireless approaches for optogenetics and electrophysiology would need to be used. We have now mentioned this limitation in the discussion.

      (2) Novelty: In its current format, the main axis of the paper falls on the analysis of animal behavior and the development of behavioral modeling. In this respect, while it is interesting to see how thoughtfully designed models can explain the evolution of mice search strategy in a maze, the conclusions offer limited novel findings that align with the existing body of research and prior predictions.

      We agree with the reviewer that our study is weakly connected to previous research on the hippocampus and spatial navigation, as it consists mainly of animal behavior analysis and modeling and addresses a relatively unexplored topic. We hope that combining our behavioral approach with optogenetics and electrophysiology will, in the future, yield new insights that are in line with the existing body of research.

      (3) Scalability and accessibility: While the approach may be intriguing to experts who have an interest in or are familiar with the Barnes maze, its presentation seems to primarily target this specific audience. Therefore, there is a lack of clarity and discussion regarding the scalability of behavioral modeling to experiments involving other search strategies (such as sequence or episodic learning), other animal models, or the potential for translational applications. The scalability of the method would greatly benefit a broader scientific community. In line with this view, the paper's conclusions heavily rely on the development of new models using custom-made codes. Therefore, it would be advantageous to make these codes readily available, and if possible, provide access to the processed data as well. This could enhance comprehension and enable a larger audience to benefit from the methodology.

      The current approach might indeed extend to other species in equivalent environments and might also constitute a general proof of principle regarding the characterization of animal behaviors by the mixing of stochastic processes. We have now mentioned these points in the discussion.

      As suggested by the reviewer, we have now provided model/simulation codes and processed data to replicate the figures, at https://github.com/sebiroyerlab/Vestibule_sequences.

      (4) Cross-validation of models: The authors have not implemented any measures to mitigate the risk of overfitting in their modeling. It would have been beneficial to include at least some form of cross-validation with stochastic models to address this concern. Additionally, the paper lacks the presence of analytics or measures that assess and compare the performance of the models.

      To avoid the risk of model overfitting, the most appropriate solution appeared to be repeating the simulations several times and examining the consistency of the obtained parameters across repetitions. For the mixture model, we now show in Supplementary figure 7 the probabilities obtained from 10 repetitions of the simulation. Similarly, for the Markov chain model, the probabilities obtained from 10 repetitions of the simulation are shown in Figure 6.

      Regarding model comparison, we have simplified our mixture model into only one model, as we realized the 2 other models in the previous manuscript version were simply special cases of the 3rd model. Nevertheless, comparison was still needed for estimating the best value of N (the number of consecutive segments that a strategy lasts) in the mixture model. We now show the comparison of mean square errors obtained for different values of N, using a t-test across 10 repetitions of the simulations (Figure 5c).
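
As an illustration of this kind of comparison, the sketch below computes a two-sample t statistic between the mean square errors obtained with two values of N. It makes two assumptions that the response does not state: that the test is the unequal-variance (Welch) form, and that each value of N yields one mean square error per repetition. The numbers in `mse_n_a` and `mse_n_b` are arbitrary placeholders, not results from the study.

```python
import math
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Placeholder mean square errors for two candidate values of N,
# one value per repetition of the simulation (10 repetitions each).
mse_n_a = [0.52, 0.48, 0.50, 0.55, 0.47, 0.51, 0.49, 0.53, 0.50, 0.46]
mse_n_b = [0.41, 0.44, 0.39, 0.42, 0.40, 0.43, 0.38, 0.45, 0.41, 0.42]

t, df = welch_t(mse_n_a, mse_n_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A p-value would then be read from the t distribution with `df` degrees of freedom, for example with a statistics package.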

      (5) Quantification of inter-animal variations in strategy development: It is important to investigate, and address the argument concerning the possibility that not all animals recruit and develop the three processes (random, spatial, and serial) in a similar manner over days of training. It would be valuable to quantify the transition in strategy across days for each individual mouse and analyze how the population average, reflecting data from individual mice, corresponds to these findings. Currently, there is a lack of such quantification and analysis in the paper.

      We have added a figure (Supplementary figure 8) showing the mixture model matching analyses for individual animals. A lot of variability is indeed observed across animals, with some animals displaying strong preferences for certain strategies compared to others. The average across mouse population showed a similar trend as the result obtained with the pooled data.

      Recommendations for the authors:

      Summary of Reviewer Comments:

      (1) In its present form, the manuscript lacks sufficient coverage of the materials and methods necessary for readers to replicate the behavioral methodology in their own research inquiries. For instance, it would be beneficial to clarify how the cues are rotated relative to the goal.

      (2) The models may be over-fitted, leading to spurious conclusions, and cross-validation is necessary to rule out this possibility.

      (3) The specific choice of the three strategies used to fit behavior in this model should be better justified, as other strategies may account for the observed behavior.

      (4) The study would benefit from an analysis of behavior on an animal-by-animal basis, potentially revealing individual differences in strategies.

      (5) Spatial behavior is not necessarily fully allocentric in this task, as only the two cues in the arena can be used for spatial orientation, unlike odor cues on the floor and sound cues in the room. This should be discussed.

      (6) Making the data and code fully open source would greatly strengthen the impact of this study.

      In addition, each reviewer has raised both major and minor concerns which should be addressed if possible.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) Change "tainted" to "tinted" in Fig. 1a

      (2) Should note explicitly in Fig. 2d that the goal is at vestibule 0, and also in the legend

      (3) Fig. 3 legend should say "c-e)", not "c-f)"

      (4) Supplementary Fig. 8 legend repeats "d)" twice

      Reviewer #2 (Recommendations For The Authors):

      Packard & McGaugh 1996 is cited twice as refs 5 and 14

      Reviewer #3 (Recommendations For The Authors):

      - Figure 3: Please correct the labels referenced as "c-f)" in the figure's legend.

      - Rounding numbers issue on page 4: 82.62% + 17.37% equals 99.99%, not 100%.

      We fixed all minor points. We are very thankful to the reviewers for their constructive comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study uses carefully designed experiments to generate a useful behavioural and neuroimaging dataset on visual cognition. The results provide solid evidence for the involvement of higher-order visual cortex in processing visual oddballs and asymmetry. However, the evidence provided for the very strong claims of homogeneity as a novel concept in vision science, separable from existing concepts such as target saliency, is inadequate.

      We appreciate the positive and balanced assessment from the reviewers. We agree that visual homogeneity is similar to existing concepts such as target saliency. We have tried our best to articulate our rationale for defining it as a novel concept. However, the debate about whether visual homogeneity is novel or related to existing concepts is completely beside the point, since that is not the key contribution of our study.

      Our key contribution is our quantitative model for how the brain could be solving generic visual tasks by operating on a feature space. In the literature there are no theories regarding the decision-making process by which the brain could be solving generic visual tasks. In fact, oddball search tasks, same-different tasks and symmetry tasks are never even mentioned in the same study because it is tacitly assumed that the underlying processes are completely different! Our work brings together these disparate tasks by proposing a specific computation that enables the brain to solve both types of tasks and providing evidence for it. This specific computation is a well-defined, falsifiable model that will need to be replicated, elaborated and refined by future studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors define a new metric for visual displays, derived from psychophysical response times, called visual homogeneity (VH). They attempt to show that VH is explanatory of response times across multiple visual tasks. They use fMRI to find visual cortex regions with VH-correlated activity. On this basis, they declare a new visual region in the human brain, area VH, whose purpose is to represent VH for the purpose of visual search and symmetry tasks.

      Thank you for your concise summary. We appreciate your careful reading and thoughtful and constructive comments.

      Strengths:

      The authors present carefully designed experiments, combining multiple types of visual judgments and multiple types of visual stimuli with concurrent fMRI measurements. This is a rich dataset with many possibilities for analysis and interpretation.

      Thank you for your accurate assessment of the strengths of our study.

      Weaknesses:

      The datasets presented here should provide a rich basis for analysis. However, in this version of the manuscript, I believe that there are major problems with the logic underlying the authors' new theory of visual homogeneity (VH), with the specific methods they used to calculate VH, and with their interpretation of psychophysical results using these methods. These problems with the coherency of VH as a theoretical construct and metric value make it hard to interpret the fMRI results based on searchlight analysis of neural activity correlated with VH.

      We appreciate your concerns, and have tried our best to respond fully to each of your specific concerns below.

      In addition, the large regions of VH correlations identified in Experiments 1 and 2 vs. Experiments 3 and 4 are barely overlapping. This undermines the claim that VH is a universal quantity, represented in a newly discovered area of the visual cortex, that underlies a wide variety of visual tasks and functions.

      We agree with you that the VH regions defined using symmetry task and search task do not overlap completely (as we have shown in Figure S13). However, this is to be expected for several reasons. First, the images in the symmetry task were presented at fixation, whereas the images in the visual search task were presented peripherally. Second, the lack of overlap could be due to variations across individuals. Indeed, considerable individual variability has been observed in the location of category-selective regions such as VWFA (Glezer and Riesenhuber, 2013) and FFA (Weiner and Grill-Spector, 2012). We propose that testing the same participants on both search and symmetry tasks would reveal overlapping VH regions. We now acknowledge these issues in the Results (p. 26).

      Maybe I have missed something, or there is some flaw in my logic. But, absent that, I think the authors should radically reconsider their theory, analyses, and interpretations, in light of the detailed comments below, to make the best use of their extensive and valuable datasets combining behavior and fMRI. I think doing so could lead to a much more coherent and convincing paper, albeit possibly supporting less novel conclusions.

      We appreciate your concerns. We have tried our best to respond fully to each of your specific concerns below.

      THEORY AND ANALYSIS OF VH

      (1) VH is an unnecessary, complex proxy for response time and target-distractor similarity. VH is defined as a novel visual quality, calculable for both arrays of objects (as studied in Experiments 1-3) and individual objects (as studied in Experiment 4). It is derived from a center-to-distance calculation in a perceptual space. That space in turn is derived from the multi-dimensional scaling of response times for target-distractor pairs in an oddball detection task (Experiments 1 and 2) or in a same-different task (Experiments 3 and 4).

      The above statements are not entirely correct. Experiments 1 & 3 are oddball visual search experiments. Their purpose was to estimate the underlying perceptual space of objects.

      Proximity of objects in the space is inversely proportional to response times for arrays in which they were paired. These response times are higher for more similar objects. Hence, proximity is proportional to similarity. This is visible in Fig. 2B as the close clustering of complex, confusable animal shapes.

      VH, i.e. distance-to-center, for target-present arrays, is calculated as shown in Fig. 1C, based on a point on the line connecting the target and distractors. The authors justify this idea with previous findings that responses to multiple stimuli are an average of responses to the constituent individual stimuli. The distance of the connecting line to the center is inversely proportional to the distance between the two stimuli in the pair, as shown in Fig. 2D. As a result, VH is inversely proportional to the distance between the stimuli and thus to stimulus similarity and response times. But this just makes VH a highly derived, unnecessarily complex proxy for target-distractor similarity and response time. The original response times on which the perceptual space is based are far more simple and direct measures of similarity for predicting response times.

      We agree that VH brings no explanatory power to target-present searches, since target-present response times are a direct estimate of target-distractor similarity. However, we are additionally explaining target-absent response times. Target-absent response times are well known to vary systematically with image properties, but why they do so has not been clear in the literature.

      Our key conceptual advance lies in relating the neural response to a search array to the neural response of the constituent elements, and in proposing a decision variable using which participants can make both target-present and target-absent judgements on any search array.

      (2) The use of VH derived from Experiment 1 to predict response times in Experiment 2 is circular and does not validate the VH theory.

      The use of VH, a response time proxy, to predict response times in other, similar tasks, using the same stimuli, is circular. In effect, response times are being used to predict response times across two similar experiments using the same stimuli. Experiment 1 and the target present condition of Experiment 2 involve the same essential task of oddball detection. The results of Experiment 1 are converted into VH values as described above, and these are used to predict response times in Experiment 2 (Fig. 2F). Since VH is a derived proxy for response values in Experiment 1, this prediction is circular, and the observed correlation shows only consistency between two oddball detection tasks in two experiments using the same stimuli.

      We agree that it would be circular to use oddball search times in Experiment 1 to explain only target-present search times in Experiment 2, since they basically involve the same searches. However, we are explaining both target-present and target-absent search times in a unified framework; systematic variations in target-absent search times have been noted in the literature but never really explained. One could still simply say that target-absent search times are some function of the target-present search times, but this still doesn’t provide an explanation for how participants are making target-present and absent decisions. The existing literature contains models for how visual search might occur for a specific target and distractor but does not elucidate how participants might perform generic visual search where target and distractors are not known in advance.

      Our key conceptual advance lies in relating the neural response to a search array to the neural responses to its constituent elements, and in proposing a decision variable that participants can use to make both target-present and target-absent judgements on any search array.

      (3) The negative correlation of target-absent response times with VH as it is defined for target-absent arrays, based on the distance of a single stimulus from the center, is uninterpretable without understanding the effects of center-fitting. Most likely, center-fitting and the different VH metrics for target-absent trials produce an inverse correlation of VH with target-distractor similarity.

      We see no cause for concern with the center-fitting procedure, for several reasons. First, the best-fitting center remained stable despite many randomly initialized starting points. Second, the best-fitting center derived from one set of objects was able to predict the target-absent and target-present responses of another set of objects. Finally, the VH obtained for each object (i.e. distance from the best-fitting center) is strongly correlated with the average distance of that object from all other objects (Figure S1A). We have now clarified this in the Results (p. 11).

      The construction of the VH perceptual space also involves fitting a "center" point such that distances to center predict response times as closely as possible. The effect of this fitting process on distance-to-center values for individual objects or clusters of objects is unknowable from what is presented here. These effects would depend on the residual errors after fitting response times with the connecting line distances. The center point location and its effects on the distance-to-center of single objects and object clusters are not discussed or reported here.

      While it is true that the optimal center needs to be found by fitting to the data, there is no particular mystery to the algorithm: we are simply performing standard gradient descent to maximize the fit to the data. We have described the algorithm clearly and are making our code public. We find that the algorithm yields stable optimal centers despite many randomly initialized starting points, and that the optimal center can predict responses to entirely novel images that were excluded during model training. We are making no assumption about the location of the center with respect to individual points. Therefore, we see no cause for concern regarding the center-finding algorithm.
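
      To make the procedure concrete, here is a minimal Python sketch of a center fit of this kind. It is an illustration under stated assumptions, not the code used in the study: the data are synthetic, the response to a target-present array is assumed to be the midpoint of the target and distractor points, the response to a target-absent array is assumed to be the item's own point, and the names `objective` and `fit_center` are hypothetical. The fit uses finite-difference gradient ascent and keeps a step only when it improves the fit.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def objective(center, present_pts, present_rt, absent_pts, absent_rt):
    """Fit quality: distance to center (VH) should correlate positively with
    target-present RTs and negatively with target-absent RTs."""
    vh_present = [dist(p, center) for p in present_pts]
    vh_absent = [dist(p, center) for p in absent_pts]
    return pearson(vh_present, present_rt) - pearson(vh_absent, absent_rt)


def fit_center(data, start, lr=0.5, steps=200, eps=1e-5):
    """Gradient ascent with finite-difference gradients. A step is kept only
    if it improves the objective; otherwise the step size is halved."""
    c, best = list(start), objective(start, *data)
    for _ in range(steps):
        grad = []
        for k in range(len(c)):
            shifted = c[:k] + [c[k] + eps] + c[k + 1:]
            grad.append((objective(shifted, *data) - best) / eps)
        cand = [ck + lr * g for ck, g in zip(c, grad)]
        val = objective(cand, *data)
        if val > best:
            c, best = cand, val
        else:
            lr *= 0.5
    return c, best


# Synthetic stand-in data (NOT the study's data).
rng = random.Random(0)
objects = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(12)]
hidden_center = [0.3, -0.2]
absent_pts = objects  # assumption: identical items -> the item's own point
present_pts = [[(a + b) / 2 for a, b in zip(objects[i], objects[j])]
               for i in range(12) for j in range(i + 1, 12)]  # assumption: midpoint
present_rt = [1.0 + 0.5 * dist(p, hidden_center) + rng.gauss(0, 0.05)
              for p in present_pts]
absent_rt = [2.0 - 0.3 * dist(p, hidden_center) + rng.gauss(0, 0.05)
             for p in absent_pts]
data = (present_pts, present_rt, absent_pts, absent_rt)

# Fit from several random starting points to inspect stability of the center.
starts = [[rng.uniform(-2, 2), rng.uniform(-2, 2)] for _ in range(5)]
fits = [fit_center(data, s) for s in starts]
```

      Comparing the fitted centers in `fits` across the random starts is one simple way to check whether the optimum is stable.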

      Yet, this uninterpretable distance-to-center of single objects is chosen as the metric for VH of target-absent displays (VHabsent). This is justified by the idea that arrays of a single stimulus will produce an average response equal to one stimulus of the same kind. However, it is not logically clear why response strength to a stimulus should be a metric for homogeneity of arrays constructed from that stimulus, or even what homogeneity could mean for a single stimulus from this set. It is not clear how this VHabsent metric based on single stimuli can be equated to the connecting line VH metric for stimulus pairs, i.e. VHpresent, or how both could be plotted on a single continuum.

      Most visual tasks, such as finding an animal, are thought to involve building a decision boundary on some underlying neural representation. Even visual search has been portrayed as a signal-detection problem where a particular target is to be discriminated from a distractor. However, none of these formulations work in the case of generic visual tasks, where the target and distractor identities are unknown. We are proposing that, when we view a search array, the neural response to the search array can be deduced from the neural responses to the individual elements using well-known rules, and that decisions about an oddball target being present or absent can be made by computing the distance of this neural response from some canonical mean firing rate of a population of neurons. This distance-to-center computation is what we denote as visual homogeneity. We have revised our manuscript throughout to make this clearer and we hope that this helps you understand the logic better.

      It is clear, however, what should be correlated with difficulty and response time in the target-absent trials, and that is the complexity of the stimuli and the numerosity of similar distractors in the overall stimulus set. The complexity of the target, similarity with potential distractors, and the number of such similar distractors all make ruling out distractor presence more difficult. The correlation seen in Fig. 2G must reflect these kinds of effects, with higher response times for complex animal shapes with lots of similar distractors and lower response times for simpler round shapes with fewer similar distractors.

      You are absolutely correct that stimulus complexity should matter, but there are no good measures of stimulus complexity. Moreover, asking which factors are correlated with target-absent response times is entirely different from asking what decision variable or template participants use to solve the task.

      The example points in Fig. 2G seem to bear this out, with higher response times for the deer stimulus (complex, many close distractors in the Fig. 2B perceptual space) and lower response times for the coffee cup (simple, few close distractors in the perceptual space). While the meaning of the VH scale in Fig. 2G, and its relationship to the scale in Fig. 2F, are unknown, it seems like the Fig. 2G scale has an inverse relationship to stimulus complexity, in contrast to the expected positive relationship for Fig. 2F. This is presumably what creates the observed negative correlation in Fig. 2G.

      Taken together, points 1-3 suggest that VHpresent and VHabsent are complex, unnecessary, and disconnected metrics for understanding target detection response times. The standard, simple explanation should stand. Task difficulty and response time in target detection tasks, in both present and absent trials, are positively correlated with target-distractor similarity.

      Respectfully, we disagree with your assessment. Your last point is not logically consistent: response times for target-absent trials cannot be correlated with any target-distractor similarity, since there is no target in a target-absent array in the first place. We have shown that target-absent response times are, in fact, independent of experimental context, which means that they index an image property that is independent of any reference target (Results, p. 15; Section S4). This property is what we define as visual homogeneity.

      I think my interpretations apply to Experiments 3 and 4 as well, although I find the analysis in Fig. 4 especially hard to understand. The VH space in this case is based on Experiment 3 oddball detection in a stimulus set that included both symmetric and asymmetric objects. However, the response times for a very different task in Experiment 4, a symmetric/asymmetric judgment, are plotted against the axes derived from Experiment 3 (Fig. 4F and 4G). It is not clear to me why a measure based on oddball detection that requires no use of symmetry information should be predictive of within-stimulus symmetry detection response times. If it is, that requires a theoretical explanation not provided here.

      We are using an oddball detection task to estimate perceptual dissimilarity between objects, and construct the underlying perceptual representation of both symmetric and asymmetric objects. This enabled us to then ask if some distance-to-center computation can explain response times in a symmetry detection task, and obtain an answer in the affirmative. We have reworked the text to make this clear.

      (4) Contrary to the VH theory, same/different tasks are unlikely to depend on a decision boundary in the middle of a similarity or homogeneity continuum.

      We have provided empirical proof for our claims, by showing that target-present response times in a visual search task are correlated with “different” responses in the same-different task, and that target-absent response times in the visual search task are correlated with “same” responses in the same-different task (Section S3).

      The authors interpret the inverse relationship of response times with VHpresent and VHabsent, described above, as evidence for their theory. They hypothesize, in Fig. 1G, that VHpresent and VHabsent occupy a single scale, with maximum VHpresent falling at the same point as minimum VHabsent. This is not borne out by their analysis, since the VHpresent and VHabsent value scales are mainly overlapping, not only in Experiments 1 and 2 but also in Experiments 3 and 4. The authors dismiss this problem by saying that their analyses are a first pass that will require future refinement. Instead, the failure to conform to this basic part of the theory should be a red flag calling for revision of the theory.

      We respectfully disagree – by no means did we dismiss this problem! In fact, we have explicitly acknowledged this by saying that VH does not explain all the variance in the response times, but nonetheless explains substantial variance and might form the basis for an initial guess or a fast response. The remaining variance might be explained by processes that involve more direct scrutiny. Please see Results, pp. 10 and 22.

      The reason for this single scale is that the authors think of target detection as a boundary decision task, along a single scale, with a decision boundary somewhere in the middle, separating present and absent. This model makes sense for decision dimensions or spaces where there are two categories (right/left motion; cats vs. dogs), separated by an inherent boundary (equal left/right motion; training-defined cat/dog boundary). In these cases, there is less information near the boundary, leading to reduced speed/accuracy and producing a pattern like that shown in Fig. 1G.

      The key conceptual advance of our study is that we show that even target present/absent, same/different or symmetry judgements can be fit into the standard decision-making framework.

      This logic does not hold for target detection tasks. There is no inherent middle point boundary between target present and target absent. Instead, in both types of trials, maximum information is present when the target and distractors are most dissimilar, and minimum information is present when the target and distractors are most similar. The point of greatest similarity occurs at the limit of any metric for similarity. Correspondingly, there is no middle point dip in information that would produce greater difficulty and higher response times. Instead, task difficulty and response times increase monotonically with the similarity between targets and distractors, for both target present and target absent decisions. Thus, in Figs. 2F and 2G, response times appear to be highest for animals, which share the largest numbers of closely similar distractors.

      Unfortunately, your logic does not boil down to any quantitative account, since you are using vague terms like “maximum information”. Further, any argument based solely on item similarity to explain visual search or symmetry responses cannot explain systematic variations observed for target-absent arrays and for symmetric objects, for the reasons below.

      If target-distractor dissimilarity were the sole driver of response times, target-absent judgements should always take the longest time, since all items in the array are identical and the dissimilarity is therefore zero, with no variation from one image to another. This account does not explain why target-absent response times vary so systematically.

      Similarly, if symmetry judgements are solely based on comparing the dissimilarity between two halves of an object, there should be no variation in the response times of symmetric objects since the dissimilarity between their two halves is zero. However, we do see systematic variation in the response times to symmetric objects.

      DEFINITION OF AREA VH USING fMRI

      (1) The area VH boundaries from different experiments are nearly completely non-overlapping.

      In line with their theory that VH is a single continuum with a decision boundary somewhere in the middle, the authors use fMRI searchlight to find an area whose responses positively correlate with homogeneity, as calculated across all of their target present and target absent arrays. They report VH-correlated activity in regions anterior to LO. However, the VH defined by symmetry Experiments 3 and 4 (VHsymmetry) is substantially anterior to LO, while the VH defined by target detection Experiments 1 and 2 (VHdetection) is almost immediately adjacent to LO. Fig. S13 shows that VHsymmetry and VHdetection are nearly non-overlapping. This is a fundamental problem with the claim of discovering a new area that represents a new quantity that explains response times across multiple visual tasks. In addition, it is hard to understand why VHsymmetry does not show up in a straightforward subtraction between symmetric and asymmetric objects, which should show a clear difference in homogeneity.

      Actually, VHsymmetry is apparent even in a simple subtraction between symmetric and asymmetric objects (Figure S10). The VH regions identified using the visual search task and symmetry task have a partial overlap, not zero overlap as you are incorrectly claiming.

      We have noted that it is not straightforward to interpret the overlap, since there are many confounding factors. One reason could simply be that the stimuli in the symmetry task were presented at fixation, whereas the visual search arrays contained items exclusively in the periphery. Another is that the participants in the two tasks were completely different, so the lack of overlap could simply be due to inter-individual variability. Testing the same participants in two tasks using similar stimuli would be ideal, but this is outside the scope of this study. We have acknowledged these issues in the Results (p. 26) and in the Supplementary Material (Section S8).

      (2) It is hard to understand how neural responses can be correlated with both VHpresent and VHabsent.

      The main paper results for VHdetection are based on both target-present and target-absent trials, considered together. It is hard to interpret the observed correlations, since the VHpresent and VHabsent metrics are calculated in such different ways and have opposite correlations with target similarity, task difficulty, and response times (see above). It may be that one or the other dominates the observed correlations. It would be clarifying to analyze correlations for target-present and target-absent trials separately, to see if they are both positive and correlated with each other.

      Thanks. The positive correlation between VH and neural response holds even when we do the analysis separately for target-present and target-absent searches (correlation between neural response in the VH region and visual homogeneity: n = 32, r = 0.66, p < 0.0005 for target-present searches; n = 32, r = 0.56, p < 0.005 for target-absent searches).

      (3) The definition of the boundaries and purpose of a new visual area in the brain requires circumspection, abundant and convergent evidence, and careful controls.

      Even if the VH metric, as defined and calculated by the authors here, is a meaningful quantity, it is a bold claim that a large cortical area just anterior to LO is devoted to calculating this metric as its major task. Vision involves much more than target detection and symmetry detection. The cortex anterior to LO is bound to perform a much wider range of visual functionalities. If the reported correlations can be clarified and supported, it would be more circumspect to treat them as one byproduct of unknown visual processing in the cortex anterior to LO, rather than treating them as the defining purpose for a large area of the visual cortex.

      We totally agree with you that reporting a new brain region would require careful interpretation and abundant and converging evidence. However, this requires many studies' worth of work, and historically category-selective regions like the FFA have achieved consensus only after they were replicated and confirmed across many studies. We believe our proposal for the computation of a quantity like visual homogeneity is conceptually novel, and our study represents a first step that provides some converging evidence (through replicable results across different experiments) for such a region. We have reworked our manuscript to make this point clearer (Discussion, p. 32).

      Reviewer #2 (Public Review):

      Summary:

      This study proposes visual homogeneity as a novel visual property that enables observers to perform several seemingly disparate visual tasks, such as finding an odd item, deciding if two items are the same, or judging if an object is symmetric. In Experiment 1, the reaction times on several objects were measured in human subjects. In Experiment 2, the visual homogeneity of each object was calculated based on the reaction time data. The visual homogeneity scores predicted reaction times. This value was also correlated with the BOLD signals in a specific region anterior to LO. Similar methods were used to analyze reaction time and fMRI data in a symmetry detection task. It is concluded that visual homogeneity is an important feature that enables observers to solve these two tasks.

      Strengths:

      (1) The writing is very clear. The presentation of the study is informative.

      (2) This study includes several behavioral and fMRI experiments. I appreciate the scientific rigor of the authors.

      We are grateful to you for your balanced assessment and constructive comments.

      Weaknesses:

      (1) My main concern with this paper is the way visual homogeneity is computed. On page 10, lines 188-192, it says: "we then asked if there is any point in this multidimensional representation such that distances from this point to the target-present and target-absent response vectors can accurately predict the target-present and target-absent response times with a positive and negative correlation respectively (see Methods)". This is also true for the symmetry detection task. If I understand correctly, the reference point in this perceptual space was found by deliberately satisfying the negative and positive correlations in response times. And then on page 10, lines 200-205, it shows that the positive and negative correlations actually exist. This logic is confusing. The positive and negative correlations emerge only because this method is optimized to do so. It seems more reasonable to identify the reference point of this perceptual space independently, without using the reaction time data. Otherwise, the inference process sounds circular. A simple way is to just use the mean point of all objects in Exp 1, without any optimization towards reaction time data.

      We disagree with you since the same logic applies to any curve-fitting procedure. When we fit data to a straight line, we are finding the slope and intercept that minimizes the error between the data and the straight line, but we would hardly consider the process circular when a good fit is achieved – in fact we take it as a confirmation that the data can be fit linearly. In the same vein, we would not have observed a good fit to the data, if there did not exist any good reference point relative to which the distances of the target-present and target-absent search arrays predicted these response times.

      In Section S1, we have already reported that the visual homogeneity estimates for each object are strongly correlated with the average distance of each object to all other objects (r = 0.84, p < 0.0005, Figure S1). Second, to confirm that the results we obtained are not due to overfitting, we have already reported a cross-validation analysis, where we removed all searches involving a particular image and predicted these response times using visual homogeneity. This too revealed a significant model correlation, confirming that our results are not due to overfitting.
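
      The held-out logic of such a leave-one-image-out analysis can be sketched in a few lines of Python. This is an illustrative simplification, not the analysis code from the study: the data are made up, the function names are hypothetical, and each search's visual homogeneity value is treated as already computed, so only a straight line is refit on the training searches. A full analysis would also re-estimate the center from the training searches alone.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def leave_one_image_out(searches):
    """searches: list of (images_in_search, vh, rt) tuples.
    For each image, fit on searches that do not contain it and predict the
    RTs of the searches that do. Returns {image: [(predicted, observed), ...]}."""
    images = sorted({img for imgs, _, _ in searches for img in imgs})
    results = {}
    for held_out in images:
        train = [s for s in searches if held_out not in s[0]]
        test = [s for s in searches if held_out in s[0]]
        slope, intercept = fit_line([s[1] for s in train],
                                    [s[2] for s in train])
        results[held_out] = [(intercept + slope * vh, rt)
                             for _, vh, rt in test]
    return results


# Made-up, noise-free example in which rt = 1 + 2 * vh exactly.
searches = [
    (("a", "b"), 0.2, 1.4),
    (("a", "c"), 0.5, 2.0),
    (("a", "d"), 0.9, 2.8),
    (("b", "c"), 0.4, 1.8),
    (("b", "d"), 0.7, 2.4),
    (("c", "d"), 0.3, 1.6),
]
cv_results = leave_one_image_out(searches)
```

      Correlating the predicted and observed values pooled across held-out images then gives a model correlation that cannot be inflated by the held-out searches themselves.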

      (2) On page 11, lines 214-221. It says: "these findings are non-trivial for several reasons". However, the first reason is confusing. It is unclear to me why "it suggests that there are highly specific computations that can be performed on perceptual space to solve oddball tasks". In fact, these two sentences provide no specific explanation for the results.

      We have now revised the text to make it clearer (Results, p. 11).

      (3) The second reason is interesting. Reaction times in target-present trials can be easily explained by target-distractor similarity. But why does reaction time vary substantially across target-absent stimuli? One possible explanation is that the objects that are distant from the feature distribution elicit shorter reaction times. Here, all objects constitute a statistical distribution in the feature (perceptual) space. There is certainly a mean of this distribution. Some objects look like outliers and these outliers elicit shorter reaction times in the target-absent trials because outlier detection is very salient.

      One might argue that the above account is merely a rephrasing of the idea of visual homogeneity proposed in this study. If so, feature saliency is not a new account. In other words, the idea of visual homogeneity is another way of reiterating the old feature saliency theory.

      Thank you for this interesting point. We don’t necessarily see a contradiction. However, we are proposing a quantitative decision variable that the brain could be using to make target present/absent judgements.

      (4) One way to reject the feature saliency theory is to compare the reaction times of the objects that are very different from other objects (i.e., no surrounding objects in the perceptual space, e.g., the wheel in the lower right corner of Fig. 2B) with the objects that are surrounded by several similar objects (e.g., the horse in the upper part of Fig. 2B). Also, please choose the two objects with similar distance from the reference point. I predict that the latter will elicit longer reaction times because they can be easily confounded by surrounding similar objects (i.e., four-legged horses can be easily confounded by four-legged dogs). If the density of object distribution per se influences the visual homogeneity score, I would say that the "visual homogeneity" is essentially another way of describing the distributional density of the perceptual space.

      We agree with you, and we have indeed found that visual homogeneity estimates from our model are highly correlated with the average distance of an object relative to all other objects. However, we performed several additional experiments to elucidate the nature of target-absent response times. We find that they are unaffected by whether these searches are performed in the midst of similar or dissimilar objects (Section S4, Experiment S6), and even when the same searches are performed among nearby sets of objects with completely uncorrelated average distances (Section S4, Experiment S7). We have now reworked the text to make this clearer.

      (5) The searchlight analysis looks strange to me. One can easily perform a parametric modulation by setting visual homogeneity as the trial-by-trial parametric modulator and reaction times as a covariate. This parametric modulation produces a brain map with the correlation of every voxel in the brain. On page 17 lines 340-343, it is unclear to me what the "mean activation" is.

      We have done something similar. For each region, we took the mean activation at each voxel to be the average activation within a 3x3x3 voxel neighborhood around it, and computed its correlation with visual homogeneity. We have now reworked this to make it clearer (Results, p. 16).
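
      As a rough illustration of this kind of neighborhood-based correlation, the following Python sketch averages activations over a 3x3x3 cube around each voxel and correlates that average with visual homogeneity across images. It is a sketch under assumptions rather than the pipeline used in the study: the volume is a toy example, the function name `searchlight_map` is hypothetical, and cubes at the edge of the volume are simply truncated.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def searchlight_map(act, shape, vh):
    """act[(x, y, z)] is a list of activations, one per image.
    Returns {(x, y, z): r}, where r is the correlation across images between
    the 3x3x3 neighborhood-mean activation and visual homogeneity."""
    nx, ny, nz = shape
    n_images = len(vh)
    out = {}
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                # Neighbors inside the volume (truncated cube at the edges).
                nbrs = [(i, j, k)
                        for i in range(max(0, x - 1), min(nx, x + 2))
                        for j in range(max(0, y - 1), min(ny, y + 2))
                        for k in range(max(0, z - 1), min(nz, z + 2))]
                mean_act = [sum(act[v][m] for v in nbrs) / len(nbrs)
                            for m in range(n_images)]
                out[(x, y, z)] = pearson(mean_act, vh)
    return out


# Toy volume: every voxel tracks VH exactly, plus a voxel-specific offset.
shape = (3, 3, 3)
vh = [0.1, 0.4, 0.7, 1.0]
act = {(x, y, z): [0.1 * x + 0.2 * y + 0.3 * z + h for h in vh]
       for x in range(3) for y in range(3) for z in range(3)}
rmap = searchlight_map(act, shape, vh)
```

      In this toy volume every voxel should show a correlation of essentially 1, because the offsets are constant across images; with real data the resulting map would be thresholded and corrected for multiple comparisons.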

      Minor points

      (1) In the intro, it says: "using simple neural rules..." actually it is very confusing what "neural rules" are here. Better to change it to "computational principles" or "neural network models"??

      We have now replaced this with “using well-known principles governing multiple object representations”.

      (2) In the intro, it says: "while machine vision algorithms are extremely successful in solving feature-based tasks like object categorization (Serre, 2019), they struggle to solve these generic tasks (Kim et al., 2018; Ricci et al. 2021)". These are not generic tasks. They are just a specific type of visual task: judging relationships between multiple objects. Moreover, a large number of studies in machine vision have shown that DNNs are capable of solving these tasks and even more difficult tasks. Two survey papers are listed here.

      Wu, Q., Teney, D., Wang, P., Shen, C., Dick, A., & Van Den Hengel, A. (2017). Visual question answering: A survey of methods and datasets. Computer Vision and Image Understanding, 163, 21-40.

      Małkiński, M., & Mańdziuk, J. (2022). Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices. arXiv preprint arXiv:2201.12382.

      Thank you for sharing these references. In fact, a recent study has shown that specific deep networks can indeed solve the same-different task (Tartaglini et al., 2023). However, our broader point remains that same-different and other such visual tasks are non-trivial for machine vision algorithms.

      Reviewer #1 (Recommendations For The Authors):

      Nothing to add to the public review. If my concerns turn out to be invalid, I apologize and will happily accept correction. If they are valid, I hope they will point toward a new version of this paper that optimizes the insights to be gained from this impressive dataset.

      Reviewer #2 (Recommendations For The Authors):

      My suggestions are as follows:

      (1) Analyze the fMRI data using the parametric modulation approach first at the single-subject level and then perform group analysis.

      To clarify, we have obtained image-level activations from each subject, and used them for all our analyses.

      (2) Think about a way to redefine visual homogeneity from a purely image-computable approach. In other words, visual homogeneity should be first defined as an image feature that is independent of any empirical response data. And then use the visual homogeneity scores to predict reaction times.

      While we understand what you mean, any image-computable representation such as from a deep network may carry its own biases and may not be an accurate representation of the underlying object representation. By contrast, neural dissimilarities in the visual cortex are strongly predictive of visual search oddball response times. That is why we used visual search oddball response times as a proxy for the underlying neural representation, and then asked whether some decision variable can be derived from this representation to explain both target present and absent judgements in visual search.

    2. Reviewer #3 (Public Review):

      Summary:

      This study proposes visual homogeneity as a novel visual property that enables observers to perform several seemingly disparate visual tasks, such as finding an odd item, deciding if two items are the same, or judging if an object is symmetric. In Exp 1, the reaction times on several objects were measured in human subjects. In Exp 2, the visual homogeneity of each object was calculated based on the reaction time data. The visual homogeneity scores predicted reaction times. This value was also correlated with the BOLD signals in a specific region anterior to LO. Similar methods were used to analyze reaction time and fMRI data in a symmetry detection task. It is concluded that visual homogeneity is an important feature that enables observers to solve these two tasks.

      Strengths:

      (1) The writing is very clear. The presentation of the study is informative.

      (2) This study includes several behavioral and fMRI experiments. I appreciate the scientific rigor of the authors.

      Weaknesses:

      (1) My main concern with this paper is the way visual homogeneity is computed. On page 10, lines 188-192, it says: "we then asked if there is any point in this multidimensional representation such that distances from this point to the target-present and target-absent response vectors can accurately predict the target-present and target-absent response times with a positive and negative correlation respectively (see Methods)". This is also true for the symmetry detection task. If I understand correctly, the reference point in this perceptual space was found by deliberately satisfying the negative and positive correlations in response times. And then on page 10, lines 200-205, it shows that the positive and negative correlations actually exist. This logic is confusing. The positive and negative correlations emerge only because this method is optimized to do so. It seems more reasonable to identify the reference point of this perceptual space independently, without using the reaction time data. Otherwise, the inference process sounds circular. A simple way is to just use the mean point of all objects in Exp 1, without any optimization towards reaction time data.

      (2) Visual homogeneity (at least given the current form) is an unnecessary term. It is similar to distractor heterogeneity/distractor variability/distractor statistics in the literature. However, the authors attempt to claim it as a novel concept. The title is "visual homogeneity computations in the brain enable solving generic visual tasks". The last sentence of the abstract is "a NOVEL IMAGE PROPERTY, visual homogeneity, is encoded in a localized brain region, to solve generic visual tasks". In the significance, it is mentioned that "we show that these tasks can be solved using a simple property WE DEFINE as visual homogeneity". If the authors agree that visual homogeneity is not new, I suggest a complete rewrite of the title, abstract, significance, and introduction.

      (3) Also, "solving generic tasks" is another overstatement. The oddball search tasks, same-different tasks, and symmetric tasks are only a small subset of many visual tasks. Can this "quantitative model" solve motion direction judgment tasks, visual working memory tasks? Perhaps so, but at least this manuscript provides no such evidence. On line 291, it says "we have proposed that visual homogeneity can be used to solve any task that requires discriminating between homogeneous and heterogeneous displays". I think this is a good statement. A title that says "XXXX enable solving discrimination tasks with multi-component displays" is more acceptable. The phrase "generic tasks" is certainly an exaggeration.

      (4) If I understand it correctly, one of the key findings of this paper is "the response times for target-present searches were positively correlated with visual homogeneity. By contrast, the response times for target-absent searches were negatively correlated with visual homogeneity" (lines 204-207). I think the authors have already acknowledged that the positive correlation is not surprising at all because it reflects the classic target-distractor similarity effect. But the authors claim that the negative correlation in target-absent searches is the truly novel finding.

      (5) I would like to make it clear that this negative correlation is not new either. The seminal paper by Duncan and Humphreys (1989) has clearly stated that "difficulty increases with increased similarity of targets to nontargets and decreased similarity between nontargets" (the sentence in their abstract). Here, "similarity between nontargets" is the same as the visual homogeneity defined here. Similar effects have been shown in Duncan (1989) and Nagy, Neriani, and Young (2005). See also the inconsistent results in Nagy & Thomas (2003) and Vincent, Baddeley, Troscianko & Gilchrist (2009).

      More recently, Wei Ji Ma has systematically investigated the effects of heterogeneous distractors in visual search. I think the introduction part of Wei Ji Ma's paper (2020) provides a nice summary of this line of research.

      I am surprised that these references are not mentioned at all in this manuscript (except Duncan and Humphreys, 1989).

      (6) If the key contribution is the quantitative model, the study should be organized in a different way. Although the findings of positive and negative correlations are not novel, it is still good to propose new models to explain classic phenomena. I would like to mention the three studies by Wei Ji Ma (see below). In these studies, Bayesian observer models were established to account for trial-by-trial behavioral responses. These computational models can also account for the set-size effect, behavior in both localization and detection tasks. I see much more scientific rigor in their studies. Going back to the quantitative model in this paper, I am wondering whether the model can provide any qualitative prediction beyond the positive and negative correlations? Can the model make qualitative predictions that differ from those of Wei Ji's model? If not, can the authors show that the model can quantitatively better account for the data than existing Bayesian models? We should evaluate a model either qualitatively or quantitatively.

      (7) In my opinion, one of the advantages of this study is the fMRI dataset, which is valuable because previous studies did not collect fMRI data. The key contribution may be the novel brain region associated with display heterogeneity. If this is the case, I would suggest using a more parametric way to measure this region. For example, one can use Gabor stimuli and systematically manipulate the variations of multiple Gabor stimuli; the same logic also applies to motion direction. If this study uses static Gabor, random dot motion, object images that span from low-level to high-level visual stimuli, and consistently shows that the stimulus heterogeneity is encoded in one brain region, I would say this finding is valuable. But this sounds like another experiment. In other words, it is insufficient to claim a new brain region given the current form of the manuscript.

      REFERENCES

      - Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96(3), 433-458. doi: 10.1037/0033-295x.96.3.433
      - Duncan, J. (1989). Boundary conditions on parallel processing in human vision. Perception, 18(4), 457-469. doi: 10.1068/p180457
      - Nagy, A. L., Neriani, K. E., & Young, T. L. (2005). Effects of target and distractor heterogeneity on search for a color target. Vision Research, 45(14), 1885-1899. doi: 10.1016/j.visres.2005.01.007
      - Nagy, A. L., & Thomas, G. (2003). Distractor heterogeneity, attention, and color in visual search. Vision Research, 43(14), 1541-1552. doi: 10.1016/s0042-6989(03)00234-7
      - Vincent, B., Baddeley, R., Troscianko, T., & Gilchrist, I. (2009). Optimal feature integration in visual search. Journal of Vision, 9(5), 15-15. doi: 10.1167/9.5.15
      - Singh, A., Mihali, A., Chou, W. C., & Ma, W. J. (2023). A Computational Approach to Search in Visual Working Memory.
      - Mihali, A., & Ma, W. J. (2020). The psychophysics of visual search with heterogeneous distractors. BioRxiv, 2020-08.
      - Calder-Travis, J., & Ma, W. J. (2020). Explaining the effects of distractor statistics in visual search. Journal of Vision, 20(13), 11-11.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors provide convincing experimental evidence of extended motivational signals encoded in the mouse anterior cingulate cortex (ACC) that are implemented by the orbitofrontal cortex (OFC)-to-ACC signaling during learning. The results are valuable to the field of motivation and cognition. The experimental methods used were state-of-the-art. The manuscript would further benefit from theory-driven analyses to inform a mechanistic understanding, particularly for the single-cell calcium imaging results. These results will be of interest to those interested in cortical function, learning, and/or motivation.

      We thank the reviewers for their thoughtful reading of our paper and providing constructive feedback. We have made the relevant changes to the manuscript to improve the writing and figures. We provide responses below to each of the reviewer’s comments.

      Reviewer #1 (Public Review):

      (1) An important conclusion (Figure 4) is that when mice are trained to run through no reward (N) cues in order to reach reward (R) cues, the OFC neurons projecting to ACC each respond to different specific events in a manner that ensures that collectively they tile the extended behavioural sequence. What I was less sure of was whether the ACC neurons do the same or not. Figure 3 suggests that on average ACC neurons maintain activity across N cues in order to get to R cues but I was not sure whether this was because all individual neurons did this or whether some had activity patterns like the OFC neurons projecting to ACC.

      We agree that it remains uncertain what individual ACC neurons do during the extended behavioral sequence. We now include a few sentences in the discussion about what we hypothesize, as we did not perform the cellular resolution imaging to determine this:

      “While we did not perform single-cell imaging of ACC in our task, we hypothesize that individual ACC neurons could encode the distribution of actions/opportunities47 (i.e. stop, run, lick, suppress lick) taken during R or N cues. ACC neurons could compute the relative value of the action taken such that more ACC neurons become recruited once mice learn to run out of N cues. The sustained increase in bulk ACC activity across N cue trials (Figure 2) could come from a stable sequence of individual neurons that encode the timescale of the actions taken. In this way, OFC projections would encode current motivation across N cues before learning, which then triggers ACC to compute the value-based actions. Motivational signals in OFC would thus represent state since past rewards/goals, while in ACC these signals represent actions taken to pursue rewards/goals in the future.”

      (2) Figure 1 versus Figure 2: There does not seem to be a particular motivation for whether chemogenetic inactivation or optogenetic inhibition were used in different experiments. I think that this is not problematic but, if I am wrong and there were specific reasons for performing each experiment in a certain way, then further clarification as to why these decisions were made would be useful. If there is no particular reason, then simply explaining that this is the case might stop readers from seeking explanations.

      Thank you for this comment; we agree that clarification on this is important. We performed chemogenetic inhibition of ACC in Figure 1 to take a broad survey of behavioral effects throughout a 40-min long behavioral session, and performed optogenetic inhibition in Figure 2 because we wanted to restrict our inhibition to the few seconds of cue presentation during a behavioral session and across days. Furthermore, we wanted to combat any potential off-target effects that would come from repeated administration of CNO over the several days of training (Manvich et al., 2018). We have included a couple of sentences on page 4 to clarify this:

      “We proceeded to test whether these motivation related signals in ACC are required for learning. To restrict our inhibition to cue presentation portions of our task, and combat any potential off-target effects of CNO31 from repeated administration across several days of training, we used optogenetic inhibition.”

      (3) P5, paragraph 2. The authors argue that OFC and anteromedial (AM) thalamic inputs into ACC are especially important for mediating motivation through N cues in order to reach R cues. Is this based on a statistical comparison between the activity in OFC or AM inputs as opposed to the other inputs?

      We determined that OFC and AM thalamic inputs to ACC are particularly important by comparing the pre-cue activity in a reward-no reward-reward trial sequence (RNR; Figure 3B). Specifically, we performed paired t-tests comparing pre-cue activity between N and R cues, and found a statistically significant increase for R cues but only for the OFC and AM inputs, not for the BLA or LC inputs.
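
      The paired comparison described in this response can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name and the activity values are hypothetical, and the sketch only computes the paired t statistic and its degrees of freedom (obtaining a p-value would additionally require the t distribution, which the standard library does not provide).

```python
import math
import statistics


def paired_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for the paired difference b - a."""
    if len(sample_a) != len(sample_b) or len(sample_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(sample_a, sample_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical pre-cue activity per animal (arbitrary units), for illustration only.
pre_cue_n = [0.10, 0.05, 0.12, 0.08, 0.11]  # before N cues
pre_cue_r = [0.30, 0.22, 0.35, 0.25, 0.31]  # before R cues
t_stat, dof = paired_t_statistic(pre_cue_n, pre_cue_r)
print(f"t = {t_stat:.2f}, df = {dof}")
```

      A positive t here means the second sample (pre-cue activity before R cues in this toy data) is larger on average within each pair; the same function would be applied separately to each input pathway.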

      (4) P3, paragraph 2. Some papers by Khalighinejad and colleagues (eg Neuron 2020, Current Biology, 2022) might be helpful here in as much as they assess ACC roles in determining action frequency, initiation, and speed and mediating the relationship between reward availability and action frequency and speed.

      We thank the reviewer for bringing these relevant papers to our attention. We have included these papers in our citations in this paragraph.

      (5) Paragraph 1 "This learning is of a more deliberate, informed nature than habitual learning, as they are sensitive to the current value of outcomes and can lead to a novel sequence of actions for a desired outcome1-3." Should "they" be "it"?

      This is correct, we have edited this in the manuscript.

      Reviewer #2 (Public Review):

      Impact:

      The findings will be valuable for further research on the impact of motivational states on behaviour and cognition. The authors provided a promising concept of how persistent motivational states could be maintained, as well as established a novel, reproducible task assay. While experimental methods used are currently state-of-the-art, theoretical analysis seems to be incomplete/not extensive.

      We thank the reviewer for these comments. In our paper, we performed single-cell calcium imaging of OFC projection neurons to ACC to build a mechanistic understanding for the bulk ramp-like response we identified in these neurons with photometry. We identified ensembles of neurons that tile sequences of trials that match the bulk response, in particular a subset of neurons that are active at the time a reward (R) cue is reached after 2 no-reward (N) cues. We included a paragraph in the discussion to address future theory-driven analyses to address how computation is achieved by OFC projection neurons:

      “We linked the ramp-like increase in neural activity in OFC to motivation, but several questions still remain about how motivation is computed and why it would be represented as a ramp. Motivation could be computed as a combination of several variables such as time since last reward, value of reward, and effort to reach future rewards. Future theory-driven analyses could determine how motivation is computed, and whether individual variables of time, value, and effort, are encoded as clusters of similar tuned neurons, or mixed and collectively represented at the population level. In either case, it is likely that a combined map of task space and value-information carried by OFC are being used to inform downstream regions, such as ACC, for adjusting behavior.”

      Reviewer #2 (Recommendations for the Authors):

      Overall, the layout of the figures seems a little bit chaotic and makes it hard to understand the boundaries between panels.

      We agree that the figure layout could be improved upon to aid the reader in moving from panel to panel. We have edited two of the main figures with layouts that are most irregular (Figures 2 and 4) to help with this.

      Figures/text should include the promoters used for protein expression so that readers understand which cell types would be affected.

      We have made sure to edit the figures to include the promoter of the viruses we used, and edited the text to include both the AAV serotype and promoter.

      Discuss why it is necessary for multiple prefrontal areas to be involved in maintaining motivational signals.

      We thank the reviewer for this comment. We believe that prefrontal areas would be recruited as tasks to study motivational states become more complex and require animals to keep track of task structure and perform value-guided actions. We have included a couple sentences in the final paragraph of the discussion about this:

      “Our work showed the recruitment of multiple frontal cortical areas in this process, which is to be expected as animals are required to build, maintain, and use representations of task structure and value to drive learned, motivated behaviors47. Future work can build upon the task we developed here to determine how the frontal cortex maintains motivational states across many more cue-outcome associations, and how these associations may dynamically change across time48”.

      Additionally, we included a short discussion on how motivational signals differ between OFC and ACC in our work. We suggest OFC encodes current motivation before and after learning, which then leads ACC to represent learned actions taken and thus have a longer timescale motivational response (see response to Reviewer 1).

      Minor: Page 4, Line 1: "increase" instead of "increases".

      This is correct, we have edited this in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important insights into the role of neurexins as regulators of synaptic strength and timing at the glycinergic synapse between neurons of the medial nucleus of the trapezoid body and the lateral superior olive, key components of the auditory brainstem circuit involved in computing sound source location from differences in the intensity of sounds arriving at the two ears. Through an elegant combination of genetic manipulation, fluorescence in-situ hybridization, ex vivo slice electrophysiology, pharmacology, and optogenetics, the authors provide convincing evidence to support their claims. While further work is needed to reveal the mechanistic basis by which neurexins influence glycinergic neurotransmission, this work will be of interest to both auditory and synaptic neuroscientists.

      We appreciate the recognition of the significance of our study in shedding light on the role of neurexins in regulating synaptic strength and timing at the glycinergic synapse. Indeed, further investigations are warranted to delve deeper into the specific role of each different variant of neurexins in the future. We hope that our work will spark more interest and collaboration in unraveling the complexities of molecular codes of synaptic function.

      Public Reviews:

      Reviewer #1 (Public Review):

      Jiang et al. demonstrated that ablating Neurexins results in alterations to glycinergic transmission and its calcium sensitivity, utilizing a robust experimental system. Specifically, the authors employed rAAV-Cre-EGFP injection around the MNTB in Nrxn1/2/3 triple conditional mice at P0, measuring Glycine receptor-dependent IPSCs from postsynaptic LSO neurons at P13-14. Notably, the authors presented a clear reduction of 60% and 30% in the amplitudes of opto- and electric stimulation-evoked IPSCs, respectively. Additionally, they observed changes in kinetics, alterations in PPR, and sensitivity to lower calcium and the calcium chelator, EGTA, indicating solid evidence for changes in presynaptic properties of glycinergic transmission.

      Furthermore, the authors uncovered an unexpected increase in sIPSC frequency without altering amplitude. Despite the reduction in evoked IPSC, immunostaining revealed an increase in GlyT2 and VGAT in TKO mice, supporting the notion of an increase in synapse number. However, the reviewer expresses caution regarding the authors' conclusion that "glycinergic neurotransmission likely by promoting the synapse formation/maintenance, which is distinct from the phenotypes observed in glutamatergic and GABAergic neurons (Chen et al., 2017; Luo et al., 2021)", as outlined in lines 173-175. The reviewer suggests that this statement may be overstated, pointing out the authors' own discussion in lines 254-265, which acknowledges multiple possibilities, including the potential that the increase in synapses is a consequence rather than a causal effect of Nrxn deletion.

      We appreciate the reviewer’s thoughtful evaluation of our study. We agree that our conclusion regarding the promotion of synapse formation/maintenance may have been overstated and recognize the need for a more nuanced interpretation of our findings. Accordingly, we have revised our interpretation by carefully discussing the various possibilities that may cause the observed increase in synapse number in lines 256-266.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Jiang et al. explore the role of neurexins at glycinergic MNTB-LSO synapses. The authors utilize elegant and compelling ex vivo slice electrophysiology to assess how the genetic conditional deletion of Nrxns1-3 impacts inhibitory glycinergic synaptic transmission and found that TKO of neurexins reduced electrically and optically evoked IPSC amplitudes, slowed optically evoked IPSC kinetics and reduced presynaptic release probability. The authors use classic approaches including reduced [Ca2+] in ACSF and EGTA chelation to propose that changes in these evoked properties are likely driven by the loss of calcium channel coupling. Intriguingly, while evoked transmission was impaired, the authors reported that spontaneous IPSC frequency was increased, potentially due to an increased number of synapses in LSO. Overall, this manuscript provides important insight into the role of neurexins at the glycinergic MNTB-LSO synapse and further emphasizes the need for continued study of both the non-redundant and redundant roles of neurexins.

      We thank the reviewer for the strong comments and support of our work.

      Strengths:

      This well-written manuscript seamlessly incorporates mouse genetics and elegant ex vivo electrophysiology to identify a role for neurexins in glycinergic transmission at MNTB-LSO synapses. Triple KO of all neurexins reduced the amplitude and timing of evoked glycinergic synaptic transmission. Further, spontaneous IPSC frequency was increased. The evoked synaptic phenotype is likely a result of reduced presynaptic calcium coupling while the spontaneous synaptic phenotype is likely due to increased synapse numbers. While neuroligin-4 has been identified at glycinergic synapses, this study, to the best of my knowledge, is the first to study Nrxn function at these synapses.

      We again appreciate the positive feedback on the strengths of our study. We agree that the observed reduction in evoked synaptic transmission and the increase in spontaneous IPSC frequency provide intriguing insights into the function of neurexins in regulating glycinergic synaptic activity.

      Weaknesses:

      The data are compelling and report an intriguing functional phenotype. That neurexins redundantly control calcium channel coupling has been previously reported. Mechanistic insight would significantly strengthen this study.

      We wholeheartedly agree with the reviewer that understanding how neurexins control calcium channel coupling at the presynaptic active zone is crucial for elucidating their role in synaptic transmission. While our current study has provided compelling evidence for the functional phenotypes of pan-neurexin deletion, we recognize the importance of investigating the underlying molecular mechanisms in future research. Exploring these mechanisms would undoubtedly enhance our understanding of neurexin function at various synapses and contribute to advancing the field.

      The claim that triple KO of Nrxns from MNTB increases the number of synapses in LSO is not strongly supported.

      We agree. Echoing the suggestion made by reviewer 1 (as mentioned above), we acknowledge that the claim regarding the increase in synapse numbers in the LSO following the triple knockout of neurexins from the MNTB was overstated. Consequently, we have revised our conclusions more carefully to reflect this adjustment.

      Despite the stated caveats of measuring electrically evoked currents and the more robust synaptic phenotypes observed using optically evoked transmission, the authors rely heavily on electrical stimulation for most measurements.

      We acknowledge that optogenetic stimulation offers crucial advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. Additionally, we have conducted new optogenetic experiments specifically for measuring the paired-pulse ratio in control and Nrxn123 TKO mice. These results have been included as a new supplementary figure (Figure S2).

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.

      The differential expression of individual neurexins might indicate that specific neurexins may dominantly regulate synaptic transmission, however, this possibility is not discussed in detail.

      We thank the reviewer for bringing up this important point. The differential expression of individual neurexins indeed suggests that specific neurexins may play dominant roles in regulating synaptic transmission. While our study primarily focused on the collective impact of ablating all neurexins, we acknowledge the significance of exploring the specific contributions of individual neurexin isoforms in the future. Understanding the distinct roles of each neurexin isoform could provide valuable insights into the precise mechanisms underlying synaptic function and plasticity. We have added a discussion of this point to our revised manuscript (lines 223-230).

      Reviewer #3 (Public Review):

      Summary:

      The authors investigate the hypothesis that neurexins serve a crucial role as regulators of the synaptic strength and timing at the glycinergic synapse between neurons of the medial nucleus of the trapezoid body (MNTB) and the lateral superior olivary complex (LSO). It is worth mentioning that LSO neurons are an integration station of the auditory brainstem circuit displaying high reliability and temporal precision. These features are necessary for computing interaural cues to derive sound source location from comparing the intensities of sounds arriving at the two ears. In this context, the authors' findings build up according to the hypothesis first by displaying that neurexins were expressed in the MNTB at varying levels. They followed this up with the deletion of all neurexins in the MNTB through the employment of a triple knock-out (TKO). Using electrophysiological recordings in acute brainstem slices of these TKO mice, they gathered solid evidence for the role of neurexins in synaptic transmission at this glycinergic synapse primarily by ensuring tight coupling of Ca2+ channels and vesicular release sites. Additionally, the authors uncovered a connection between the deletion of neurexins and a higher number of glycinergic synapses in TKO mice, for which they provided evidence in the form of immunostainings and related it to electrophysiological data on spontaneous release. Consequently, this investigation expands our knowledge on the molecular regulation of synaptic transmission at glycinergic synapses, as well as on the auditory processing at the level of the brainstem.

      Strengths:

      The authors demonstrate substantial results in support of the hypothesis of a critical role of neurexins for regulating glycinergic transmission in the LSO using various techniques. They provide evidence for the expression of neurexins in the MNTB and consecutively successfully generate and characterize the neurexin TKO. For their study on LSO IPSCs the authors transduced MNTB neurons by co-injection of virus-carrying Cre and ChR2 and subsequently optogenetically evoke release of glycine. As a result, they observed a significant reduction in amplitude and significantly slower rise and decay times of the IPSCs of the TKO in comparison with control mice in which MNTB neurons were only transduced with ChR2. Furthermore, they observed an increased paired pulse ratio (PPR) of LSO IPSCs in the TKO mice, indicating lower release probability. Elaborating on the hypothesis that neurexins are essential for the coupling of synaptic vesicles to Ca2+ channels, the authors show lowered Ca2+ sensitivity in the TKO mice. Additionally, they reveal convincing evidence for the connection between the increased frequency of spontaneous IPSC and the higher number of glycinergic synapses of the LSO in the TKO mice, revealed by immunolabeling against the glycinergic presynaptic markers GlyT2 or VGAT.

      We thank the reviewer for the thoughtful and thorough evaluation of the significance of investigating the role of neurexins in glycinergic transmission at the MNTB-LSO synapse, particularly in the context of auditory processing and sound localization. The positive feedback is greatly appreciated.

      Weaknesses:

      The major concern is novelty as this work on the effects of pan-neurexin deletion in a glycinergic synapse is quite consistent with the authors' prior work on glutamatergic synapses (Luo et al., 2020). The authors might want to further work out novel aspects and strengthen the comparative perspective. Conceptually, the authors might want to be more clear about interpreting the results on the altered dependence of release on voltage-gated Ca2+ influx (Ca2+ sensitivity, coupling).

      Regarding the reviewer’s concern about the novelty of our work, we acknowledge that our previous work has explored the effects of pan-neurexin deletion on glutamatergic synapses (Luo et al., 2020). However, we would like to point out that the novelty of our present study stems from the exploration of how different types of synapses converge to employ the same mechanism of synaptic function, particularly in the context of neurexin-mediated regulation. Whereas our previous study focused on glutamatergic synapses, the current study examines glycinergic synapses, which represent a distinct population with unique properties and functions. Despite the differences between these synapse types, our findings reveal a commonality in the underlying mechanisms of synaptic regulation mediated by neurexins. This convergence of mechanisms across different synapse types highlights the fundamental role of neurexins in synaptic function and plasticity. By elucidating how neurexins regulate synaptic transmission at both excitatory and inhibitory synapses, we provide valuable insights into the general principles governing synaptic function. In addition, this comparative perspective may shed light on the complex interplay between excitatory and inhibitory neurotransmission, which is crucial for maintaining the balance of neuronal activity and network dynamics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      During the developmental period spanning P3-P12, the MNTB-LSO synapses undergo a transition from GABAergic to glycinergic transmission. It is well-established that Neurexin plays a role in modulating GABAergic transmission. In the authors' experimental system, AAV was injected at P0, likely impacting GABAergic transmission, including potentially influencing synapse number, before subsequently affecting glycinergic transmission. A thoughtful discussion of how the experimental interventions might have influenced this developmental process and glycinergic transmission would enhance the clarity and interpretation of their findings.

      We thank the reviewer for raising the interesting topic of the transmitter switch during neurodevelopment. Strong evidence from gerbil and rat models demonstrates that MNTB-LSO synapses shift from GABAergic to glycinergic transmission during early development. However, in a more recent study by Friauf and colleagues (Fisher et al., 2019), patch-clamp recordings in acute mouse brainstem slices at P4-P11 combined with pharmacological blockade of GABAA receptors and/or glycine receptors clearly demonstrated no GABAergic synaptic component on LSO principal neurons, suggesting that the transmitter subtype switch may differ between species. We have added a discussion to our revision to clarify this topic.

      Reviewer #2 (Recommendations For The Authors):

      The data are compelling and report an intriguing functional phenotype. Mechanistic insight into how this phenotype manifests would significantly strengthen this study. For example, which neuroligin is found at these MNTB-LSO synapses?

      We agree that investigating the underlying molecular mechanisms, particularly the specific function of each variant of neurexins and their respective ligands on the postsynaptic neurons, is crucial. Exploring these mechanisms, which extend beyond the scope of our current study, would undoubtedly enhance our understanding of neurexin function at various synapses and foster advancements in the field.

      Does the TKO alter the ability of MNTB inputs to induce AP firing in LSO neurons?

      Activation of the MNTB inputs does not directly induce AP firing in LSO neurons, because the MNTB-LSO synapses are glycinergic and serve to inhibit neuronal activity.

      We think the reviewer meant to ask whether pan-neurexin deletion in the MNTB neurons alters their ability to impact the firing of LSO neurons. Indeed, the weakening of glycinergic transmission due to pan-neurexin ablation in MNTB neurons could potentially alter the excitation-inhibition (E/I) balance, thereby impacting the overall excitability of LSO neurons. We have conducted preliminary experiments to investigate this aspect and found that the E/I balance at LSO neurons was notably increased in TKO mice. We are currently preparing a manuscript to comprehensively address the role of neurexins at the auditory circuit and behavior levels.

      Additional calcium measurements using GECIs would provide insight into whether nanodomain calcium or total calcium is altered at these synapses.

      We appreciate the valuable suggestion provided by the reviewer. However, distinguishing between Ca2+ nanodomain and Ca2+ microdomain using Ca2+ imaging techniques requires advanced systems such as two-photon STED microscopy, which are beyond the scope of our current research.

      It is unclear why fluorescence intensity is quantified instead of the number of synaptic clusters in LSO. In addition to changes in synapse numbers, fluorescent intensity can indicate a number of other possible morphological changes.

      We appreciate the valuable suggestion from the reviewer. We have re-analyzed our imaging data to compare synaptic density. The results, as included in Fig.3f and 3h, confirm an increase in the number of glycinergic synapses after pan-neurexin deletion.

      The most robust synaptic phenotypes were produced by measuring light-evoked oIPSCs and the authors acknowledge that electrically-evoked eIPSCs might be contaminated by uninfected fibers or by other sources of glycinergic inputs. I suggest that IPSC PPRs, EGTA, and low Ca2+ experiments be performed using optogenetics.

      As discussed in our response to Public Reviews, we acknowledge that optogenetic stimulation offers crucial advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. Additionally, following the reviewer’s suggestion, we have conducted new optogenetic experiments specifically for measuring the paired-pulse ratio in control and Nrxn123 TKO mice. We included this new dataset in supplementary Figure S2; it is consistent with our results obtained with electrical fiber stimulation.

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to major concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.
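
      For readers unfamiliar with the paired-pulse ratio (PPR) discussed in this response, the following minimal sketch shows the conventional calculation: the amplitude of the second evoked response divided by the amplitude of the first. The function name and the amplitude values are hypothetical and are not taken from the manuscript.

```python
def paired_pulse_ratio(first_amplitude, second_amplitude):
    """Return |second| / |first| for two consecutive evoked responses.

    Absolute values are used so that the sign convention of the recorded
    current (inward vs. outward) does not affect the ratio.
    """
    if first_amplitude == 0:
        raise ValueError("first response amplitude must be non-zero")
    return abs(second_amplitude) / abs(first_amplitude)


# Hypothetical IPSC peak amplitudes in pA, for illustration only.
control_ppr = paired_pulse_ratio(-200.0, -150.0)  # second response is smaller
tko_ppr = paired_pulse_ratio(-100.0, -120.0)      # second response is larger
print(f"control PPR = {control_ppr:.2f}, TKO PPR = {tko_ppr:.2f}")
```

      As Reviewer #3 notes above, a higher PPR is read as an indication of lower release probability, which is the direction of change reported for the TKO mice.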

      It is sometimes confusing which type of evoked stimulation is being used (e.g. PPR, EGTA, and low Ca2+ experiments). To aid in the interpretations of these experiments, it would help to clarify.

      We appreciate the reviewer's suggestion regarding the clarity of the evoked stimulation methods used in our experiments. We have revised the manuscript to provide clearer descriptions of the specific types of evoked stimulation employed in each experiment. Thank you for guiding us towards this clarification.

      The comparisons to Chen et al 2017 and the senior author's 2020 paper seem disjointed and do not contribute to the findings, which alone, are quite interesting. Given the prevailing notion that neurexins control different synaptic properties depending on the brain region and/or synapse studied, is it surprising that the findings observed here differ from previous studies of different synapses (glutamatergic and GABAergic)?

      By comparing previous studies at different types of neurons/synapses, our findings reveal a commonality in the underlying mechanisms of synaptic regulation mediated by neurexins. This convergence of mechanisms across different synapse types highlights the fundamental role of neurexins in synaptic function and plasticity. In addition, this comparative perspective may shed light on the complex interplay between excitatory and inhibitory neurotransmission, which is crucial for maintaining the balance of neuronal activity and network dynamics.

      Despite Nrxn3 being the most abundant Nrxn mRNA in MNTB neurons, the possible contributions of this highly expressed protein are not discussed.

      We thank the reviewer for bringing up this important point. The differential expression of individual neurexins indeed suggests that specific neurexins may play dominant roles in regulating synaptic transmission. While our study primarily focused on the collective impact of ablating all neurexins, we acknowledge the significance of exploring the specific contributions of individual neurexin isoforms in the future. Understanding the distinct roles of each neurexin isoform could provide valuable insights into the precise mechanisms underlying synaptic function and plasticity. We have added this discussion to our revised manuscript (Lines 223-230).

      Reviewer #3 (Recommendations For The Authors):

      • There are several instances of spaces missing and typos, please carefully check the manuscript.

      We greatly appreciate the reviewer's helpful feedback on the text that could be clarified or improved. We have meticulously edited the manuscript to address these concerns.

      • While studying the properties of IPSC, apart from optogenetic stimulation, the authors performed experiments with electrical fiber stimulation. Their findings showed a slightly significant reduction of the IPSC amplitude and no effect on the IPSC kinetics when comparing the TKO and control. One weakness is the discrepancy between the results from the optogenetic and fiber stimulation experiments, which the authors attribute to inefficient transfection in the fiber stimulation experiments. The authors state that they tried to optimize their virus injection protocols. However, they do not elaborate on how the transfection rates could be improved in the discussion section. Moreover, it would be good to further address the reasons for the difference in amplitude between the control IPSCs in the optogenetic and fiber stimulation experiments.

      Echoing the suggestion by Reviewer 2 (see above), we acknowledge that optogenetic stimulation offers certain advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. In addition, we have performed a new set of optogenetic experiments to measure the paired-pulse ratio in control and Nrxn123 TKO mice and included the results as a new supplementary figure (Figure S2).

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to major concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.

      We added details of the virus injection strategy that optimized the transfection rates to the Methods section: “To enhance virus infection efficiency, we decreased the dosage per injection while increasing the frequency of injections. Additionally, we ensured the pipette remained immobilized for 20-30 seconds to guarantee virus absorption at injection sites. As a result of this strategy, we estimated that the vast majority of MNTB neurons were inoculated by AAVs.” See lines 288-290.

      • Abstract: "ablation of all neurexins in MNTB neurons reduced not only the amplitude but also altered the kinetics of the glycinergic synaptic transmission at LSO neurons."

      Changed as suggested.

      • Consider revising to "The synaptic dysfunctions primarily resulted from an altered dependence of release on voltage-gated Ca2+ influx."

      We appreciate the reviewer's suggestion, which helps improve the clarity of our manuscript. We have revised the phrasing as follows: "The synaptic dysfunctions primarily resulted from an impaired calcium sensitivity of release and a loosened coupling between voltage-gated calcium channels and synaptic vesicles."

      • Line 39 should be vertebrates.

      Revised as suggested.

      • Line 49 it would sound better to say "which further points to the diverse actions of neurexins in specific neurons."

      Revised as suggested.

      • Line 60 - this paragraph could include information about GABA signaling from the MNTB to the LSO, because on line 113 you mention LSO neurons receive inhibitory GABAergic/glycinergic inputs, but you do not mention blocking of GABA currents to isolate the glycinergic ones.

      We thank the reviewer for the thoughtful and detailed suggestion. We revised the text in line 60 to “In the mature mammalian auditory brainstem” and in line 113, we removed GABAergic to emphasize the nature of glycinergic synapse, particularly in the mouse brainstem where no GABAergic components are found (Fisher et al., 2019).

      • Line 72/73 it should be adeno-associated virus; line 73: "combining this with the RNAScope technique" sounds better.

      Changed as suggested.

      • Line 91 using the RNAScope technique; lines 97, 119 as a control; line 108 the functional organization.

      Changed as suggested.

      • Line 113 should be a pharmacological approach; line 122 optogenetically evoked.

      Changed as suggested.

      • Line 132, 160: the control.

      Changed as suggested.

      • Line 147 thus were infected; line 148 likely to be present but were obscured.

      Changed as suggested.

      • Line 154 which has been routinely used.

      Changed as suggested.

      • Line 155 It is not supposed to be Figure 2h but 2i; following that Figure 2i should be 2j; in my opinion, Figure 2i does not display a strong depression for the TKO mice.

      Changed as suggested.

      • Line 171 a better flow is achieved by saying: together these data show.

      Changed as suggested.

      • EC50 rather than IC50 of [Ca2+].

      Changed as suggested.

      • Line 180 it is better to say "we approached the matter by..."; line 183 while recording.

      Changed as suggested.

      • Line 203 were much stronger than the effect at control synapses; line 206 tightly clustering.

      Changed as suggested.

      • Line 212 sounds like they provide evidence for retina and spinal cord as well, should be made clear.

      Changed as suggested.

      • Line 289 previously.

      Changed as suggested.

      • Line 295 should be 30 min.

      Changed as suggested.

      • Line 336, 337 confocal microscope.

      Changed as suggested.

      • Please provide the number of data points also in figure captions or in the results section.

      Added in the captions as suggested.

      • Line 533, a better phrasing would be: the blocking effect of 0.2 mM Ca on IPSC amplitude.

      Changed as suggested.

      • Explain either in the methods or results section how the EC50 of Ca2+ was calculated.

      Added in the methods as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important evidence supporting the ability of a new type of neuroimaging system, OPM-MEG, to measure beta-band oscillations during sensorimotor tasks in children aged 2-14 years and to demonstrate the corresponding developmental changes; this matters because neuroimaging methods with high spatiotemporal resolution that can be used on small children are quite limited. The evidence supporting the conclusion is solid but lacks clarification of the much-discussed advantages of the OPM-MEG system (e.g., motion tolerance), control analyses (e.g., trial number), and a rationale for using sensorimotor tasks. This work will be of interest to the neuroimaging and developmental science communities.

      We thank the editors and reviewers for their time and comments on our manuscript. We have responded in detail to the comments, on a point-by-point basis, below. Included in our responses (and our revised manuscript) are additional analyses to control for trial count, clarification of the advantages of OPM-MEG, and justification of our use of sensory (as distinct from motor) stimulation. In what follows, our responses are in bold typeface; additions to our manuscript are in bold italic typeface. 

      Reviewer #1 (Public Review):

      Summary:

      Compared with conventional SQUID-MEG, OPM-MEG offers theoretical advantages of sensor configurability (that is, sizing to suit the head size) and motion tolerance (the sensors are intrinsically in the head reference frame). This study purports to be the first to experimentally demonstrate these advantages in a developmental study from age 2 to age 34. In short, while the theoretical advantages of OPM-MEG are attractive - both in terms of young child sensitivity and in terms of motion tolerance - neither was in fact demonstrated in this manuscript. We are left with a replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      Thank you for reviewing our manuscript. We agree that our results demonstrate substantial equivalence with conventional MEG. However, as mentioned by Reviewer 3, most past studies have “focused on older children and adolescents (e.g., 9-15 years old)” whereas our youngest group is 2-5 years. We believe that by obtaining data of sufficient quality in these age groups, without the need for any restriction of head movement, we have demonstrated the advantage of OPM-MEG. We have now made this clear in our discussion:

      “…our primary aim was to test the feasibility of OPM-MEG for neurodevelopmental studies. Our results demonstrate we were able to scan children down to age 2 years, measuring high-fidelity electrophysiological signals and characterising the neurodevelopmental trajectory of beta oscillations. The fact that we were able to complete this study demonstrates the advantages of OPM-MEG over conventional-MEG, the latter being challenging to deploy across such a large age range…”

      Strengths:

      A replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      As noted above the demonstration of equivalence was one of our primary aims. We have elaborated further on the advantages below.

      Weaknesses:

      The authors describe 64 tri-axial detectors, which they refer to as 192 channels. This is in keeping with some of the SQUID-MEG description, but possibly somewhat disingenuous. For the scientific literature, perhaps "64 tri-axial detectors" is a more parsimonious description.

      The number of channels in a MEG system refers to the number of independent measurements of magnetic field. This, in turn, tells us the number of degrees of freedom in the data that can be exploited by algorithms like signal space separation or beamforming. For example, the MEGIN (cryogenic) MEG system has 306 channels: 102 magnetometers and 204 planar gradiometers. Sensors are constructed as “triple sensor elements” with one magnetometer and 2 gradiometers (in orthogonal orientations) centred on a single location. In our system, each sensor has three orthogonal metrics of magnetic field which are (by definition) independent. We have 64 such sensors, and therefore 192 independent channels – indeed when implementing algorithms like SSS we have shown we can exploit this number of degrees of freedom [1]. 192 channels is therefore an accurate description of the system.

      A small fraction (<20%) of trials were eliminated for analysis because of "excess interference" - this warrants further elaboration.

      We agree that this is an important point. We now state in our methods section:

      “…Automatic trial rejection was implemented with trials containing abnormally high variance (exceeding 3 standard deviations from the mean) removed. All experimental trials were also inspected visually by an experienced MEG scientist, to exclude trials with large spikes/drifts that were missed by the automatic approach. In the adult group, there was a significant overlap between automatically and manually detected bad trials (0.7 ± 1.6 trials were only detected manually). In the children, 10.0 ± 9.4 trials were only detected manually…”
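
      As an illustration only (not the code used in the study), the automatic rejection rule quoted above could be sketched as follows. The sketch assumes each trial is a single time series, that "variance" means the per-trial variance, and that the mean and standard deviation are taken across the per-trial variances; the function name and the one-sided threshold are hypothetical choices.

```python
from statistics import mean, pstdev, pvariance


def reject_high_variance_trials(trials, n_sd=3.0):
    """Return indices of trials with abnormally high variance.

    Assumption: a trial is flagged when its variance exceeds the mean of
    all per-trial variances by more than n_sd standard deviations.
    """
    variances = [pvariance(trial) for trial in trials]
    threshold = mean(variances) + n_sd * pstdev(variances)
    return [i for i, v in enumerate(variances) if v > threshold]


# Hypothetical data: 20 clean trials and one noisy trial at index 5.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [10.0, -10.0, 10.0, -10.0]
trials = [clean] * 5 + [noisy] + [clean] * 15
bad = reject_high_variance_trials(trials)
```

      Any trial missed by such a rule would still need the visual inspection step described in the methods text.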

      We also note that the other reviewers and editor questioned whether the higher rejection rate in children had any bearing on results. This is an extremely important question. In revising the manuscript this has also been taken into account with all data reanalysed with equal trial counts in children and adults. Results are presented in Supplementary Information Section 5.

      Figure 3 shows a reduced beta ERD in the youngest children. Although the authors claim that OPM-MEG would be similarly sensitive for all ages and that SQUID-MEG would be relatively insensitive to young children, one trivial counterargument that needs to be addressed is that OPM has NOT in fact increased the sensitivity to young child ERD. This can possibly be addressed by analogous experiments using a SQUID-based system. An alternative would be to demonstrate similar sensitivity across ages using OPM to a brain measure such as evoked response amplitude. In short, how does Figure 3 demonstrate the (theoretical) sensitivity advantage of OPM MEG in small heads?

      We completely understand the referees’ point – indeed the question of whether a neuromagnetic effect really changes with age, or apparently changes due to a drop in sensitivity (caused by reduced head size or - in conventional MEG and fMRI - increased subject movement) is a question that can be raised in all neurodevelopmental studies.

      Our authors have many years’ experience conducting studies using conventional MEG (including in neurodevelopment) and agreed that the idea of scanning subjects down to age two in conventional MEG would not be practical; their heads are too small and they typically fail to tolerate an environment where they are forced to remain still for long periods. Even if we tried a comparative study using conventional MEG, the likely data exclusion rate would be so high that the study would be confounded. This is why most conventional MEG studies only scan older children and adolescents. For this reason, we cannot undertake the comparative study the reviewer suggests. There are however two reasons why we believe sensitivity is not driving the neurodevelopmental effects that we observe:

      Proximity of sensors to the head: 

      For an ideal wearable MEG system, the distance between the sensors and the scalp surface (sensor proximity) would be the same regardless of age (and size), ensuring maximum sensitivity in all subjects. To test how our system performed in this regard, we undertook analyses to compute scalp-to-sensor distances. This was done in two ways:

      (1) Real distances in our adaptable system: We took the co-registered OPM sensor locations and computed the Euclidean distance from the centre of the sensitive volume (i.e. the centre of the vapour cell) to the closest point on the scalp surface. This was measured independently for all sensors, and an average across sensors calculated. We repeated this for all participants (recall participants wore helmets of varying size and this adaptability should help minimise any relationship between sensor proximity and age).

      (2) Simulated distances for a non-adaptable system: Here, the aim was to see how proximity might have changed with age, had only a single helmet size been used. We first identified the single example subject with the largest head (scanned wearing the largest helmet) and extracted the scalp-to-sensor distances as above. For all other subjects, we used a rigid body transform to co-register their brain to that of the example subject (placing their head (virtually) inside the largest helmet). Proximity was then calculated as above and an average across sensors calculated. This was repeated for all participants.

      In both analyses, sensor proximity was plotted against age and significant relationships probed using Pearson correlation. 
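
      The proximity measure and correlation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code: sensor positions and scalp surface points are assumed to be given as 3D coordinates in the same frame, and both function names are hypothetical.

```python
import math


def mean_scalp_to_sensor_distance(sensor_positions, scalp_points):
    """Average, across sensors, of the Euclidean distance from each sensor
    (centre of its sensitive volume) to the closest scalp point."""
    nearest = [
        min(math.dist(sensor, point) for point in scalp_points)
        for sensor in sensor_positions
    ]
    return sum(nearest) / len(nearest)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical coordinates (mm): two sensors and two scalp points.
sensors = [(0.0, 0.0, 10.0), (0.0, 10.0, 0.0)]
scalp = [(0.0, 0.0, 8.0), (0.0, 7.0, 0.0)]
proximity = mean_scalp_to_sensor_distance(sensors, scalp)
```

      In practice one such proximity value would be computed per participant and then passed, together with age, to the correlation function.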

      In addition, we also wanted to probe the relation between sensor proximity and head circumference. Head circumference was estimated by binarising the whole-head MRI (to delineate the volume of the head) and selecting the axial slice with the largest circumference. We then plotted sensor proximity versus head circumference, for both the real (adaptive) and simulated (non-adaptive) case (expecting a negative relationship – i.e. larger heads mean closer sensor proximity). The slope of the relationship was measured and we used a permutation test to determine whether the use of adaptable helmets significantly lowered the identified slope (i.e. do adaptable helmets significantly improve sensor proximity in those with smaller head circumference).

      Results are shown in Figure R1. We found no measurable relationship between sensor proximity and age (r = -0.195; p = 0.171) in the case of the real helmets (panel A). When simulating a non-adaptable helmet, we did see a significant effect of age on scalp-to-sensor distance (r = -0.46; p = 0.001; panel B). This demonstrates the advantage of the adaptability of OPM-MEG; without the ability to flexibly locate sensors, we would have a significant confound of sensor proximity. 

      Plotting sensor proximity against head circumference we found a significant negative relationship in both cases (r = -0.37; p = 0.007 and r = -0.78; p = 0.000001); however, the difference between slopes was significant according to a permutation test (p < 0.025), suggesting that adaptability has indeed improved sensor proximity in those with smaller head circumference. This again shows the benefits of adaptability to head size.
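
      The response does not specify how the permutation test was constructed, so the following is only one plausible sketch. It assumes a paired design in which, for each participant, the real and simulated distances are randomly swapped to build a null distribution of slope differences; the function names, the one-sided comparison and the number of permutations are all hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def paired_slope_permutation_test(circumference, dist_real, dist_sim,
                                  n_perm=5000, seed=0):
    """One-sided test of whether the real (adaptable) slope is shallower
    than the simulated (non-adaptable) slope.

    Assumption: under the null, the real/simulated labels are exchangeable
    within each participant, so they are swapped at random.
    """
    rng = random.Random(seed)
    observed = ols_slope(circumference, dist_real) - ols_slope(circumference, dist_sim)
    count = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for real, sim in zip(dist_real, dist_sim):
            if rng.random() < 0.5:
                real, sim = sim, real
            perm_a.append(real)
            perm_b.append(sim)
        diff = ols_slope(circumference, perm_a) - ols_slope(circumference, perm_b)
        if diff >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical data: circumference in cm, distances in mm.
circ = list(range(48, 58))
real = [20.0 - 0.1 * (c - 48) for c in circ]
sim = [30.0 - 1.0 * (c - 48) for c in circ]
observed, p_value = paired_slope_permutation_test(circ, real, sim, n_perm=200)
```

      Because swapping labels also exchanges any offset between the two conditions, a variant that mean-centres each condition first may be preferable when only the slopes are of interest.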

      Author response image 1.

      Scalp-to-sensor distance as a function of age (A/B) and head circumference (C/D). A and C show the case for the real helmets; B and D show the simulated non-adaptable case.

      In sum, the ideal wearable system would see sensors located on the scalp surface, to get as close as possible to the brain in all subjects. Our system of multiple helmet sizes is not perfect in this regard (there is still a significant relationship between proximity and head circumference). However, our solution has offered a significant improvement over a (simulated) non-adaptable system. Future systems should aim to improve even further on this, either by using additively manufactured bespoke helmets for every subject (this is a gold standard, but also costly for large studies), or potentially adaptable flexible helmets.

      Burst amplitudes:

      The reviewer suggested to “demonstrate similar sensitivity across ages using OPM to a brain measure”. We decided not to use the evoked response amplitude (as suggested), since this would be expected to change with age. Instead, we used the amplitude of the bursts.

      Our manuscript shows a significant correlation between beta modulation and burst probability – implying that the stimulus-related drop in beta amplitude occurs because bursts are less likely to occur. Further, we showed significant age-related changes in both beta amplitude and burst probability leading to a conclusion that the age dependence of beta modulation was caused by changes in the likelihood of bursts (i.e. bursts are less likely to ‘switch off’ during sensory stimulation in children). We have now extended these analyses to test whether burst amplitude also changes significantly with age – we reasoned that if burst amplitude remained the same in children and adults, this would not only suggest that beta modulation is driven by burst probability (distinct from burst amplitude), but also show directly that the beta effects we see are not attributable to a lack of sensitivity in younger people. 

      We took the (unnormalized) beamformer projected electrophysiological time series from sensorimotor cortex and filtered it 5-48 Hz (the motivation for the large band was because bursts are known to be pan-spectral and have lower frequency content in children; this band captures most of the range of burst frequencies highlighted in our spectra). We then extracted the timings of the bursts, and for each burst took the maximum projected signal amplitude. These values were averaged across all bursts in an individual subject, and plotted for all subjects against age.
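
      The per-subject burst amplitude described above can be illustrated with a short sketch. It assumes the time series has already been filtered to 5-48 Hz, that burst timings are available as (start, stop) sample indices with stop exclusive, and that "maximum amplitude" means the largest absolute value within the burst; the function name is hypothetical.

```python
def mean_burst_amplitude(signal, bursts):
    """Average, across bursts, of the peak absolute amplitude in each burst.

    signal: filtered time series (list of floats).
    bursts: list of (start, stop) sample indices, stop exclusive.
    """
    if not bursts:
        raise ValueError("at least one burst is required")
    peaks = [max(abs(v) for v in signal[start:stop]) for start, stop in bursts]
    return sum(peaks) / len(peaks)


# Hypothetical signal with two bursts, at samples 1-3 and 6-7.
signal = [0.0, 1.0, -3.0, 2.0, 0.0, 0.0, 4.0, -1.0, 0.0]
amplitude = mean_burst_amplitude(signal, [(1, 4), (6, 8)])
```

      One value per subject would then be plotted against age, as in Author response image 2.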

      Author response image 2.

      Beta burst amplitude as a function of age; A shows index finger stimulation trials; B shows little finger stimulation trials. In both cases there was no significant modulation of burst amplitude with age.

      Results (see Figure R2) showed that the amplitude of the beta burst showed no significant age-related modulation (R² = 0.01, p = 0.48 for index finger and R² = 0.01, p = 0.57 for the little finger). This is distinct from both burst probability and task induced beta modulation. This adds weight to the argument that the diminished beta modulation in children is not caused by a lack of sensitivity to the MEG signal and supports our conclusion that burst probability is the primary driver of the age-related changes in beta oscillations.

      Both of the above analyses have been added to our supplementary information and mentioned in the main manuscript. The first shows no confound of sensor proximity to the scalp with age in our study. The second shows that the bursts underlying the beta signal are not significantly lower amplitude in children – which we reasoned they would be if sensitivity was diminished at younger ages. We believe that the two together suggest that we have mitigated a sensitivity confound in our study.

      The data do not make a compelling case for the motion tolerance of OPM-MEG. Although an apparent advantage of a wearable system, an empirical demonstration is still lacking. How was motion tracked in these participants?

      We agree that this was a limitation of our experiment. 

      We have the equipment to track motion of the head during an experiment, using IR retroreflective markers placed on the helmet and a set of IR cameras located inside the MSR. However, the process takes a long time to set up, it lacks robustness, and would have required an additional computer (the one we typically use was already running the somatosensory stimulus and video). When the study was designed, we were concerned that the increased set up time for motion tracking would cause children to get bored, and result in increased participant drop out. For this reason we decided not to capture motion of the head during this study.

      With hindsight this was a limitation which – as the reviewer states – makes us unable to prove that motion robustness was a significant advantage for this study. That said, during scanning there was both a parent and an experimenter in the room for all of the children scanned, and anecdotally we can say that children tended to move their head during scans – usually to talk to the parent. Whilst this cannot be quantified (and is therefore unsatisfactory) we thought it worth mentioning in our discussion, which reads:

      “…One limitation of the current study is that practical limitations prevented us from quantitatively tracking the extent to which children (and adults) moved their head during a scan. Anecdotally however, experimenters present in the room during scans reported several instances where children moved, for example to speak to their parents who were also in the room. Such levels of movement could not be tolerated in conventional MEG or MRI and so this again demonstrates the advantages afforded by OPM-MEG…”

      As a note, empirical demonstrations of the motion tolerance of OPM-MEG have been published previously. Early demonstrations included Boto et al. [2], who captured beta oscillations in adults playing a ball game, and Holmes et al., who measured visual responses as participants moved their head to change viewing angle [3]. In more recent demonstrations, Seymour et al. measured the auditory evoked field in standing mobile participants [4]; Rea et al. measured beta modulation as subjects carried out a naturalistic handwriting task [5]; and Holmes et al. measured beta modulation as a subject walked around a room [6].

      Furthermore, while the introduction discusses at some length the phenomenon of PMBR, there is no demonstration of the recording of PMBR (or post-sensory beta rebound). This is a shame because there is literature suggesting an age-sensitivity to this, that the optimal sensitivity of OPM-MEG might confirm/refute. There is little evidence in Figure 3 for adult beta rebound. Is there an explanation for the lack of sensitivity to this phenomenon in children/adolescents? Could a more robust paradigm (button-press) have shed light on this?

      We understand the question. There are two limitations to the current study in respect to measuring the PMBR:

      Firstly, sensory tasks generally do not induce as strong a PMBR as motor tasks and with this in mind a stronger rebound response could have been elicited using a button press. However, it was our intention to scan children down to age 2 and we were sceptical that the youngest children would carry out a button press as instructed. For this reason we opted for entirely passive stimulation, requiring no active engagement from our participants. The advantage of this was a stimulus that all subjects could engage with. However, this was at the cost of a diminished rebound.

      The second limitation relates to trial length. Multiple studies have shown that the PMBR can last over ~10 s [7,8]. Indeed, Pfurtscheller et al. argued in 1999 that it was necessary to leave 10 s between movements to allow the PMBR to return to a true baseline [9], though this has rarely been adhered to in the literature. Here, we wanted to keep recordings short for the comfort of the younger participants, so we adopted a short trial duration. However, a consequence of this short trial length is that it becomes impossible to access the PMBR directly; one can only measure beta modulation with the task. This limitation has now been addressed explicitly in our discussion:

      “…this was the first study of its kind using OPM-MEG, and consequently aspects of the study design could have been improved. Firstly, the task was designed for children; it was kept short while maximising the number of trials (to maximise signal to noise ratio). However, the classical view of beta modulation includes a PMBR which takes ~10 s to reach baseline following task cessation [7–9]. Our short trial duration therefore doesn’t allow the rebound to return to baseline between trials, and so conflates PMBR with rest. Consequently, we cannot differentiate the neural generators of the task induced beta power decrease and the PMBR; whilst this helped ensure a short, child friendly task, future studies should aim to use longer rest windows to independently assess which of the two processes is driving age related changes…”

      Data on functional connectivity are valuable but do not rely on OPM recording. They further do not add strength to the argument that OPM MEG is more sensitive to brain activity in smaller heads - in fact, the OPM recordings seem plagued by the same insensitivity observed using conventional systems.

      Given the demonstration above that bursts are not significantly diminished in amplitude in children relative to adults; and further given the demonstrations in the literature (e.g. Seedat et al. [10]) that functional connectivity is driven by bursts, we would argue that the effects of connectivity changing with age are not related to sensitivity but rather genuinely reflect a lack of coordination of brain activity.

      The discussion of burst vs oscillations, while highly relevant in the field, is somewhat independent of the OPM recording approach and does not add weight to the OPM claims.

      We agree that the burst vs. oscillations discussion does not add weight to the OPM claims per se. However, we had two aims of our paper, the second being to “investigate how task-induced beta modulation in the sensorimotor cortices is related to the occurrence of pan-spectral bursts, and how the characteristics of those bursts change with age.” As the reviewer states, this is highly relevant to the field, and therefore we believe adds impact, not only to the paper, but also by extension to the technology.

      In short, while the theoretical advantages of OPM-MEG are attractive - both in terms of young child sensitivity and in terms of motion tolerance, neither was in fact demonstrated in this manuscript. We are left with a replication of SQUID-MEG observations, which certainly establishes OPM-MEG as "substantially equivalent" to conventional technology but misses the opportunity to empirically demonstrate the much-discussed theoretical advantages/opportunities.

      We thank the referee for the time and important contributions to this paper. We believe the fact that we were able to record good data in children as young as two years old was, in itself, an experimental realisation of the ‘theoretical advantages’ of OPM-MEG. Our additional analyses, inspired by the reviewer’s comments, help to clarify the advantages of OPM-MEG over conventional technology. The reviewer’s insights have without doubt improved the paper.

      Reviewer #2 (Public Review):

      Summary:

      The authors introduce a new 192-channel OPM system that can be configured using different helmets to fit individuals from 2 to 34 years old. To demonstrate the veracity of the system, they conduct a sensorimotor task aimed at mapping developmental changes in beta oscillations across this age range. Many past studies have mapped the trajectory of beta (and gamma) oscillations in the sensorimotor cortices, but these studies have focused on older children and adolescents (e.g., 9-15 years old) and used motor tasks. Thus, given the study goals, the choice of a somatosensory task was surprising and not justified. The authors recorded a final sample of 27 children (2-13 years old) and 24 adults (21-34 years) and performed a time-frequency analysis to identify oscillatory activity. This revealed strong beta oscillations (decreases from baseline) following the somatosensory stimulation, which the authors imaged to discern generators in the sensorimotor cortices. They then computed the power difference between the 0.3-0.8 s period and the 1.0-1.5 s post-stimulation period and showed that the beta response became stronger with age (more negative relative to the stimulation period). Using these same time windows, they computed the beta burst probability and showed that this probability increased as a function of age. They also showed that the spectral composition of the bursts varied with age. Finally, they conducted a whole-brain connectivity analysis. The goals of the connectivity analysis were not as clear, as prior studies of sensorimotor development have not conducted such analyses and typically such whole-brain connectivity analyses are performed on resting-state data, whereas here the authors performed the analysis on task-based data. In sum, the authors demonstrate that they can image beta oscillations in young children using OPM and discern developmental effects.

      Thank you for this summary and for taking the time to review our manuscript.

      Strengths:

      Major strengths of the study include the novel OPM system and the unique participant population going down to 2-year-olds. The analyses are also innovative in many respects.

      Thank you – we also agree that the major strength is in the unique cohort.

      Weaknesses:

      Several weaknesses currently limit the impact of the study. 

      First, the choice of a somatosensory stimulation task over a motor task was not justified. The authors discuss the developmental motor literature throughout the introduction, but then present data from a somatosensory task, which is confusing. Of note, there is considerable literature on the development of somatosensory responses so the study could be framed with that.

      We completely understand the referee’s point, and we agree that the motivation for the somatosensory task was not made clear in our original manuscript.

      Our choice of task was motivated completely by our targeted cohort; whilst a motor task would have been our preference, it was generally felt that making two-year-olds comply with instructions to press a button would have been a significant challenge. In addition, there would likely have been differences in reaction times. By opting for a passive sensory stimulation we ensured compliance, and the same stimulus for all subjects. We have added text on this to our introduction as follows:

      “…Here, we combine OPM-MEG with a burst analysis based on a Hidden Markov Model (HMM) 10–12 to investigate beta dynamics. We scanned a cohort of children and adults across a wide age range (upwards from 2 years old). Because of this, we implemented a passive somatosensory task which can be completed by anyone, regardless of age…”

      We also state in our discussion:

      “…here we chose to use passive (sensory) stimulation. This helped ensure compliance with the task in subjects of all ages and prevented confounds of e.g. reaction time, force, speed and duration of movement which would be more likely in a motor task.7,8 However, there are many other systems to choose and whether the findings here regarding beta bursts and the changes with age also extend to other brain networks remains an open question.…”

      Regarding the neurodevelopmental literature – we are aware of the literature on somatosensory evoked responses – particularly median nerve stimulation – but we can find little on the neurodevelopmental trajectory of somatosensory induced beta oscillations (the topic of our paper). We have edited our introduction as follows:

      “…All these studies probed beta responses to movement execution; in the case of tactile stimulation (i.e. sensory stimulation without movement) both task induced beta power loss, and the post stimulus rebound have been consistently observed in adults9,13–18. Further, beta amplitude in sensory cortex has been related to attentional processes19 and is broadly thought to carry top down influence on primary areas20. However, there is less literature on how beta modulation changes with age during purely sensory tasks.…”

      We would be keen for the reviewer to point to any specific papers in the literature that we may have missed.

      Second, the primary somatosensory response actually occurs well before the time window of interest in all of the key analyses. There is an established literature showing mechanical stimulation activates the somatosensory cortex within the first 100 ms following stimulation, with the M50 being the most robust response. The authors focus on a beta decrease (desynchronization) from 0.3-0.8 s which is obviously much later, despite the primary somatosensory response being clear in some of their spectrograms (e.g., Figure 3 in older children and adults). This response appears to exhibit a robust developmental effect in these spectrograms so it is unclear why the authors did not examine it. This raises a second point; to my knowledge, the beta decrease following stimulation has not been widely studied and its function is unknown. The maps in Figure 3 suggest that the response is anterior to the somatosensory cortex and perhaps even anterior to the motor cortex. Since the goal of the study is to demonstrate the developmental trajectory of well-known neural responses using an OPM system, should the authors not focus on the best-understood responses (i.e., the primary somatosensory response that occurs from 0.0-0.3 s)?

      We understand the reviewer’s point. The original aim of our manuscript was to investigate the neurodevelopmental trajectory of beta oscillations, not the evoked response. In fact, the evoked response in this paradigm is complicated by the fact that there are three stimuli in a very short (<500 ms) time window. For this reason, we prefer the focus of our paper to remain on oscillations.

      Nevertheless, we agree that not including the evoked responses was a missed opportunity.  We have now added evoked responses to our analysis pipeline and manuscript. As surmised by the reviewer, the M50 shows neurodevelopmental changes (an increase with age). Our methods section has been updated accordingly and Figure 3 has been modified. The figure and caption are copied below for the convenience of the reviewer.

      Author response image 3.

      Beta band modulation with age: (A) Brain plots show slices through the left motor cortex, with a pseudo-T-statistical map of beta modulation (blue/green) overlaid on the standard brain. Peak MNI coordinates are indicated for each subgroup. Time frequency spectrograms show modulation of the amplitude of neural oscillations (fractional change in spectral amplitude relative to the baseline measured in the 2.5-3 s window). Vertical lines indicate the time of the first braille stimulus. In all cases results were extracted from the location of peak beta desynchronisation (in the left sensorimotor cortex). Note the clear beta amplitude reduction during stimulation. The inset line plots show the 4-40 Hz trial averaged phase-locked evoked response, with the expected prominent deflections around 20 and 50 ms. (B) Maximum difference in beta-band amplitude (0.3-0.8 s window vs 1-1.5 s window) plotted as a function of age (i.e., each data point shows a different participant; triangles represent children, circles represent adults). Note significant correlation (R² = 0.29, p = 0.00004*). (C) Amplitude of the P50 component of the evoked response plotted against age. There was no significant correlation (R² = 0.04, p = 0.14). All data here relate to the index finger stimulation; similar results are available for the little finger stimulation in Supplementary Information Section 1.

      Regarding the developmental effects, the authors appear to compute a modulation index that contrasts the peak beta window (.3 to .8) to a later 1.0-1.5 s window where a rebound is present in older adults. This is problematic for several reasons. First, it prevents the origin of the developmental effect from being discerned, as a difference in the beta decrease following stimulation is confounded with the beta rebound that occurs later. A developmental effect in either of these responses could be driving the effect. From Figure 3, it visually appears that the much later rebound response is driving the developmental effect and not the beta decrease that is the primary focus of the study. Second, these time windows are a concern because a different time window was used to derive the peak voxel used in these analyses. From the methods, it appears the image was derived using the .3-.8 window versus a baseline of 2.5-3.0 s. How do the authors know that the peak would be the same in this other time window (0.3-0.8 vs. 1.0-1.5)? Given the confound mentioned above, I would recommend that the authors contrast each of their windows (0.3-0.8 and 1.0-1.5) with the 2.5-3.0 window to compute independent modulation indices. This would enable them to identify which of the two windows (beta decrease from 0.3-0.8 s or the increase from 1.0-1.5 s) exhibited a developmental effect. Also, for clarity, the authors should write out the equation that they used to compute the modulation index. The direction of the difference (positive vs. negative) is not always clear.

      We completely understand the referee’s point; referee 1 made a similar point. In fact, there are two limitations of our paradigm regarding the measurement of PMBR versus the task-induced beta decrease:

      Firstly, sensory tasks generally do not induce as strong a PMBR as motor tasks and with this in mind a stronger rebound response could have been elicited using a button press. However, as described above it was our intention to scan children down to age 2 and we were sceptical that the youngest children would carry out a button press as instructed.

      The second limitation relates to trial length. Multiple studies have shown that the PMBR can last over ~10 s7,8. Indeed, Pfurtscheller et al. argued in 1999 that it was necessary to leave 10 s between movements to allow the PMBR to return to a true baseline9. Here, we wanted to keep recordings relatively short for the younger participants, and so we adopted a short trial duration. However, a consequence of this short trial length is that it becomes impossible to access the PMBR directly because the PMBR of the nth trial is still ongoing when the (n+1)th trial begins. Because of this, there is no genuine rest period, and so the stimulus induced beta decrease and subsequent rebound cannot be disentangled. This limitation has now been made clear in our discussion as follows:

      “…this was the first study of its kind using OPM-MEG, and consequently aspects of the study design could have been improved. Firstly, the task was designed for children; it was kept short while maximising the number of trials (to maximise signal to noise ratio). However, the classical view of beta modulation includes a PMBR which takes ~10 s to reach baseline following task cessation7–9. Our short trial duration therefore doesn’t allow the rebound to return to baseline between trials, and so conflates PMBR with rest. Consequently, we cannot differentiate the neural generators of the task induced beta power decrease and the PMBR; whilst this helped ensure a short, child friendly task, future studies should aim to use longer rest windows to independently assess which of the two processes is driving age related changes…”

      To clarify our method of calculating the modulation index, we have added the following statement to the methods:

      “The beta modulation index was calculated using the equation , where , and are the average Hilbert-envelope-derived amplitudes in the stimulus (0.3-0.8s), post-stimulus (1-1.5s) and baseline (2.5-3s) windows, respectively.”
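
      As an illustration of the calculation described in the quoted statement, the following is a minimal sketch only. It assumes that the index takes the form (post-stimulus amplitude minus stimulus amplitude) divided by baseline amplitude; this form is an assumption chosen to be consistent with the three windows named above and is not confirmed by the text. The function names `window_mean` and `beta_modulation_index` are hypothetical, and the envelope is taken as an already computed list of amplitudes.

```python
from statistics import mean


def window_mean(times, envelope, t_start, t_end):
    """Mean envelope amplitude over samples with t_start <= t < t_end."""
    values = [a for t, a in zip(times, envelope) if t_start <= t < t_end]
    if not values:
        raise ValueError("no samples fall inside the requested window")
    return mean(values)


def beta_modulation_index(times, envelope,
                          stim=(0.3, 0.8), post=(1.0, 1.5), base=(2.5, 3.0)):
    """Illustrative modulation index from a beta-band amplitude envelope.

    ASSUMPTION: index = (A_post - A_stim) / A_base. The exact published
    equation is not given here, so this form is hypothetical.
    """
    a_stim = window_mean(times, envelope, *stim)   # stimulus window
    a_post = window_mean(times, envelope, *post)   # post-stimulus window
    a_base = window_mean(times, envelope, *base)   # baseline window
    return (a_post - a_stim) / a_base
```

      Under this assumed sign convention, a larger positive value corresponds to a deeper amplitude reduction during stimulation, a stronger rebound afterwards, or both, which is why the two contributions cannot be separated from the index alone.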

      Another complication of using a somatosensory task is that the literature on bursting is much more limited and it is unclear what the expectations would be. Overall, the burst probability appears to be relatively flat across the trial, except that there is a sharp decrease during the beta decrease (.3-.8 s). This matches the conventional trial-averaging analysis, which is good to see. However, how the bursting observed here relates to the motor literature and the PMBR versus beta ERD is unclear.

      Again, we agree completely; a motor task would have better framed the study in the context of existing burst literature – but as mentioned above, making 2-year-olds comply with the instructions for a motor task would have been difficult. Interestingly in a recent paper, Rayson et al. used EEG to investigate burst activity in infants (9 and 12 months) and adults during observed movement execution, with results showing stimulus induced decrease in beta burst rate at all ages, with the largest effects in adults21. This paper was not yet published when we submitted our article but does help us to frame our burst results since there is strong agreement between their study and ours. We now mention this study in both our introduction and discussion. 

      Another weakness is that all participants completed 42 trials, but 19% of the trials were excluded in children and 9% were excluded in adults. The number of trials is proportional to the signal-to-noise ratio. Thus, the developmental differences observed in response amplitude could reflect differences in the number of trials that went into the final analyses.

      This is an important observation and we thank the reviewer for raising the issue. We have now re-analysed all of our data, removing trials in the adults such that the overall number of trials was the same as for the children. All effects with age remained significant. We chose to keep the Figures in the main manuscript with all good trials (as previously) and present the additional analyses (with matched trial numbers) in supplementary information. However, if the reviewer feels strongly, we could do it the other way around (there is very little difference between the results).
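
      The trial-matching step described above can be sketched as follows. This is a hedged illustration rather than the procedure actually used: it assumes trials are removed at random from the larger set, and the function name `match_trial_counts`, the `seed` parameter, and the example trial counts are all hypothetical.

```python
import random


def match_trial_counts(trials_a, trials_b, seed=0):
    """Subsample the longer trial list so both lists have equal length.

    ASSUMPTION: trials to drop are chosen uniformly at random; the
    original temporal order of the retained trials is preserved.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    n_keep = min(len(trials_a), len(trials_b))

    def subsample(trials):
        if len(trials) == n_keep:
            return list(trials)
        keep = sorted(rng.sample(range(len(trials)), n_keep))
        return [trials[i] for i in keep]

    return subsample(trials_a), subsample(trials_b)


# Hypothetical example: 34 usable trials in a child, 38 in an adult.
child_trials = list(range(34))
adult_trials = list(range(38))
child_matched, adult_matched = match_trial_counts(child_trials, adult_trials)
```

      Fixing the random seed makes the matched analysis repeatable; repeating the subsampling with several seeds would be one way to check that the result does not depend on which trials happen to be dropped.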

      Reviewer #3 (Public Review):

      This study demonstrated the application of OPM-MEG in neurodevelopment studies of somatosensory beta oscillations and connections with children as young as 2 years old. It provides a new functional neuroimaging method that has a high spatial-temporal resolution as well as being wearable, which makes it a useful new tool for studies in young children. They have constructed a 192-channel wearable OPM-MEG system that includes field compensation coils which allow free head movement scanning with a relatively high ratio of usable trials. Beta band oscillations during somatosensory tasks are well localized and the modulation with age is found in the amplitude, connectivity, and panspectral burst probability. It is demonstrated that the wearable OPM-MEG could be used in children as a quite practical and easy-to-deploy neuroimaging method with performance as good as conventional MEG. With both good spatial (several millimeters) and temporal (milliseconds) resolution, it provides a novel and powerful technology for neurodevelopment research and clinical applications not limited to somatosensory areas.

      We thank the reviewer for their summary, and their time in reviewing our manuscript.

      The conclusions of this paper are mostly well supported by data acquired under the proper method. However, some aspects of data analysis need to be improved and extended.

      (1) The colour bars selected for the pseudo-T-statistic pictures of beta modulation in Figures 2 and 3, which are blue/black and red/black, are not easily distinguished from the anatomical images which are grey-scale. A colour bar without black/white would make these figures better. The peak point locations are also suggested to be marked in Figure 2 and averaged locations in Figure 3 with an error bar.

      Thank you for this comment which we certainly agree with. The colour scheme used has now been changed to avoid black. We have also added peak locations. 

      (2) The data points in plots are not constant across figures. In Figures 3 and 5, they are classified into triangles and circles for children and adults, but all are circles in Figures 4 and 6.

      Thank you! We apologise for the confusion. Data points are now consistent across plots.

      (3) Although MEG is much less susceptible to conductivity inhomogeneity of the head than EEG, the forward modelling may still be impacted by the small head profile. Add more information about source localization accuracy and stability across ages or head size.

      This is an excellent point. We have added to our discussion relating to the accuracy of the forward model. 

      “…We failed to see a significant difference in the spatial location of the cortical representations of the index and little finger; there are three potential reasons for this. First, the system was not designed to look for such a difference – sensors were sparsely distributed to achieve whole head coverage (rather than packed over sensory cortex to achieve the best spatial resolution in one area22). Second, our “pseudo-MRI” approach to head modelling (see Methods) is less accurate than acquisition of participant-specific MRIs, and so may mask subtle spatial differences. Third, we used a relatively straightforward technique for modelling magnetic fields generated by the brain (a single shell forward model). Although MEG is much less susceptible to conductivity inhomogeneity of the head than EEG, the forward model may still be impacted by the small head profile. This may diminish spatial resolution and future studies might look to implement more complex models based on e.g. finite element modelling23. Finally, previous work24 suggested that, for a motor paradigm in adults, only the beta rebound, and not the power reduction during stimulation, mapped motortopically. This may also be the case for purely sensory stimulation. Nevertheless, it remains the case that by placing sensors closer to the scalp, OPM-MEG should offer improved spatial resolution in children and adults; this should be the topic of future work…”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Major items to further test include the differing number of trials, the windowing issue, and the focus on motor findings in the intro and discussion. First, I would recommend the authors adjust the number of trials in adults to equate them between groups; this will make their developmental effects easier to interpret.  

      Thank you for raising this important point. This has now been done and appears in our supplementary information as discussed above.

      Second, to discern which responses are exhibiting developmental effects, the authors need to contrast the 0.3-0.8 window with the later window (2.5-3.0), not the window that appears to have the PMBR-like response. This artificially accentuates the response. I also think they should image the 1.0-1.5 vs 2.5-3.0s window to determine whether the response in this time window is in the same location as the decrease and then contrast this for beta differences. 

      We completely understand this point, which relates to separating the reduction in beta amplitude during stimulation and the rebound post stimulation. However, as explained above, doing so unambiguously would require the use of much longer trials. Here we were only able to measure stimulus induced beta modulation (distinct from the separate contributions of the task induced beta power reduction and rebound). It may be that future studies, with >10 s trial length, could probe the role of the PMBR, but such studies require long paradigms which are challenging to implement with children.

      Third, changing the framing of the study to highlight the somatosensory developmental literature would also be an improvement.

      We have added to our introduction as stated in the responses above.

      Finally, the connectivity analysis on data from a somatosensory task did not make sense given the focus of the study and should be removed in my opinion. It is very difficult to interpret given past studies used resting state data and one would expect the networks to dynamically change during different parts of the current task (i.e., stimulation versus baseline).

      We appreciate the point regarding connectivity. However, it was our intention to examine the developmental trajectory of beta oscillations, and a major role of beta oscillations is in mediating connectivity. It is true that most studies are conducted in the resting state (or more recently – particularly in children – during movie watching). The fact that we had a sensory task running is a confound; nevertheless, the connectivity we derived in adults bears a marked similarity to that from previous papers (e.g. 25) and we do see significant changes with age. We therefore believe this to be an important addition to the paper and we would prefer to keep it.

      References

      (1) Holmes, N., Bowtell, R., Brookes, M. J. & Taulu, S. An Iterative Implementation of the Signal Space Separation Method for Magnetoencephalography Systems with Low Channel Counts. Sensors 23, 6537 (2023).

      (2) Boto, E. et al. Moving magnetoencephalography towards real-world applications with a wearable system. Nature (2018) doi:10.1038/nature26147.

      (3) Holmes, M. et al. A bi-planar coil system for nulling background magnetic fields in scalp mounted magnetoencephalography. NeuroImage 181, 760–774 (2018).

      (4) Seymour, R. A. et al. Using OPMs to measure neural activity in standing, mobile participants. NeuroImage 244, 118604 (2021).

      (5) Rea, M. et al. A 90-channel triaxial magnetoencephalography system using optically pumped magnetometers. Annals of the New York Academy of Sciences 1517, https://doi.org/10.1111/nyas.14890 (2022).

      (6) Holmes, N. et al. Enabling ambulatory movement in wearable magnetoencephalography with matrix coil active magnetic shielding. NeuroImage 274, 120157 (2023).

      (7) Pakenham, D. O. et al. Post-stimulus beta responses are modulated by task duration. NeuroImage 206, 116288 (2020).

      (8) Fry, A. et al. Modulation of post-movement beta rebound by contraction force and rate of force development. Human Brain Mapping 37, 2493–2511 (2016).

      (9) Pfurtscheller, G. & Lopes da Silva, F. H. Event-related EEG/MEG synchronization and desynchronization: Basic principles. Clin Neurophysiol 110, 1842–1857 (1999).

      (10) Seedat, Z. A. et al. The role of transient spectral ‘bursts’ in functional connectivity: A magnetoencephalography study. NeuroImage 209, 116537 (2020).

      (11) Baker, A. P. et al. Fast transient networks in spontaneous human brain activity. eLife 2014, 1867 (2014).

      (12) Vidaurre, D. et al. Spectrally resolved fast transient brain states in electrophysiological data. NeuroImage 126, 81–95 (2016).

      (13) Gaetz, W. & Cheyne, D. Localization of sensorimotor cortical rhythms induced by tactile stimulation using spatially filtered MEG. NeuroImage 30, 899–908 (2006).

      (14) Cheyne, D. et al. Neuromagnetic imaging of cortical oscillations accompanying tactile stimulation. Cognitive Brain Research 17, 599–611 (2003).

      (15) van Ede, F., Jensen, O. & Maris, E. Tactile expectation modulates pre-stimulus β-band oscillations in human sensorimotor cortex. NeuroImage 51, 867–876 (2010).

      (16) Salenius, S., Schnitzler, A., Salmelin, R., Jousmäki, V. & Hari, R. Modulation of Human Cortical Rolandic Rhythms during Natural Sensorimotor Tasks. NeuroImage 5, 221–228 (1997).

      (17) Cheyne, D. O. MEG studies of sensorimotor rhythms: A review. Experimental Neurology 245, 27–39 (2013).

      (18) Kilavik, B. E., Zaepffel, M., Brovelli, A., MacKay, W. A. & Riehle, A. The ups and downs of beta oscillations in sensorimotor cortex. Experimental Neurology 245, 15–26 (2013).

      (19) Bauer, M., Oostenveld, R., Peeters, M. & Fries, P. Tactile Spatial Attention Enhances Gamma-Band Activity in Somatosensory Cortex and Reduces Low-Frequency Activity in Parieto-Occipital Areas. J. Neurosci. 26, 490–501 (2006).

      (20) Barone, J. & Rossiter, H. E. Understanding the Role of Sensorimotor Beta Oscillations. Frontiers in Systems Neuroscience 15, (2021).

      (21) Rayson, H. et al. Bursting with Potential: How Sensorimotor Beta Bursts Develop from Infancy to Adulthood. J Neurosci 43, 8487–8503 (2023).

      (22) Hill, R. M. et al. Optimising the Sensitivity of Optically-Pumped Magnetometer Magnetoencephalography to Gamma Band Electrophysiological Activity. Imaging Neuroscience (2024) doi:10.1162/imag_a_00112.

      (23) Stenroos, M., Hunold, A. & Haueisen, J. Comparison of three-shell and simplified volume conductor models in magnetoencephalography. NeuroImage 94, 337–348 (2014).

      (24) Barratt, E. L., Francis, S. T., Morris, P. G. & Brookes, M. J. Mapping the topological organisation of beta oscillations in motor cortex using MEG. NeuroImage 181, 831–844 (2018).

      (25) Rier, L. et al. Test-Retest Reliability of the Human Connectome: An OPM-MEG study. Imaging Neuroscience (2023) doi:10.1162/imag_a_00020.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Protein conformational changes are often critical to protein function, but obtaining structural information about conformational ensembles is a challenge. Over a number of years, the authors of the current manuscript have developed and improved an algorithm, qFit protein, that models multiple conformations into high resolution electron density maps in an automated way. The current manuscript describes the latest improvements to the program, and analyzes the performance of qFit protein in a number of test cases, including classical statistical metrics of data fit like Rfree and the gap between Rwork and Rfree, model geometry, and global and case-by-case assessment of qFit performance at different data resolution cutoffs. The authors have also updated qFit to handle cryo-EM datasets, although the analysis of its performance is more limited due to a limited number of high-resolution test cases and less standardization of deposited/processed data.

      Strengths:

      The strengths of the manuscript are the careful and extensive analysis of qFit's performance over a variety of metrics and a diversity of test cases, as well as the careful discussion of the limitations of qFit. This manuscript also serves as a very useful guide for users in evaluating if and when qFit should be applied during structural refinement.

      Reviewer #2 (Public Review):

      Summary

      The manuscript by Wankowicz et al. describes updates to qFit, an algorithm for the characterization of conformational heterogeneity of protein molecules based on X-ray diffraction or cryo-EM data. The work provides a clear description of the algorithm used by qFit. The authors then proceed to validate the performance of qFit by comparing it to deposited X-ray entries in the PDB in the 1.2-1.5 Å resolution range as quantified by Rfree, Rwork-Rfree, detailed examination of the conformations introduced by qFit, and performance on stereochemical measures (MolProbity scores). To examine the effect of experimental resolution of X-ray diffraction data, they start from an ultra high-resolution structure (SARS-CoV2 Nsp3 macrodomain) to determine how the loss of resolution (introduced artificially) degrades the ability of qFit to correctly infer the nature and presence of alternate conformations. The authors observe a gradual loss of ability to correctly infer alternate conformations as resolution degrades past 2 Å. The authors repeat this analysis for a larger set of entries in a more automated fashion and again observe that qFit works well for structures with resolutions better than 2 Å, with a rapid loss of accuracy at lower resolution. Finally, the authors examine the performance of qFit on cryo-EM data. Despite a few prominent examples, the authors find only a handful (8) of datasets for which they can confirm a resolution better than 2.0 Å. The performance of qFit on these maps is encouraging and will be of much interest because cryo-EM maps will, presumably, continue to improve and because of the rapid increase in the availability of such data for many supramolecular biological assemblies. As the authors note, practices in cryo-EM analysis are far from uniform, hampering the development and assessment of tools like qFit.

      Strengths

      qFit improves the quality of refined structures at resolutions better than 2.0 Å, in terms of reflecting true conformational heterogeneity and geometry. The algorithm is well designed and does not introduce spurious or unnecessary conformational heterogeneity. I was able to install and run the program without a problem within a computing cluster environment. The paper is well written and the validation thorough.

      I found the section on cryo-EM particularly enlightening, both because it demonstrates the potential for discovery of conformational heterogeneity from such data by qFit, and because it clearly explains the hurdles towards this becoming common practice, including lack of uniformity in reporting resolution, and differences in map and solvent treatment.

      Weaknesses

      The authors begin the results section by claiming that they made "substantial improvement" relative to the previous iteration of qFit, "both algorithmically (e.g., scoring is improved by BIC, sampling of B factors is now included) and computationally (improving the efficiency and reliability of the code)" (bottom of page 3). However, the paper does not provide a comparison to previous iterations of the software or quantitation of the effects of these specific improvements, such as whether scoring is improved by the BIC, how the application of BIC has changed since the previous paper, whether sampling of B factors helps, and whether the code is faster. It would help the reader to understand what, if any, the significance of each of these improvements was.

      Indeed, it is difficult (embarrassingly) to benchmark against our past work due to the dependencies on different Python packages and the lack of software engineering. With the infrastructure we’ve laid down with this paper, made possible by an EOSS grant from CZI, that will not be a problem going forward. Not only is the code more reliable and standardized, but we have developed several scientific test sets that can be used as a basis for broad comparisons to judge whether improvements are substantial. We’ve also changed “substantial improvement” to “several modifications” to indicate the lack of comparison to past versions.

      The exclusion of structures containing ligands and multichain protein models in the validation of qFit was puzzling since both are very common in the PDB. This may convey the impression that qFit cannot handle such use cases. (Although it seems that qFit has an algorithm dedicated to modeling ligand heterogeneity and seems to be able to handle multiple chains). The paper would be more effective if it explained how a user of the software would handle scenarios with ligands and multiple chains, and why these would be excluded from analysis here.

      qFit can indeed handle both. We left out multiple chains for simplicity in constructing a dataset enriched for small proteins while still covering diversity to speed the ability to rapidly iterate and test our approaches. Improvements to qFit ligand handling will be discussed in a forthcoming work as we face similar technical debt to what we saw in proteins and are undergoing a process of introducing “several modifications” that we hope will lead to “substantial improvement” - but at the very least will accelerate further development.

      It would be helpful to add some guidance on how/whether qFit models can be further refined afterwards in Coot, Phenix, ..., or whether these models are strictly intended as the terminal step in refinement.

      We added to the abstract:

      “Importantly, unlike ensemble models, the multiconformer models produced by qFit can be manually modified in most major model building software (e.g. Coot) and fit can be further improved by refinement using standard pipelines (e.g. Phenix, Refmac, Buster).”

      and introduction:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot12, unlike ensemble methods that generate multiple complete protein copies (Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      and results:

      “This model can then be examined and edited in Coot12 or other visualization software, and further refined using software such as phenix.refine, refmac, or buster as the modeler sees fit.”

      and discussion:

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”
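
      As an illustration of what compatibility with the altloc column means in practice, the following minimal Python sketch reads the alternate-location label (column 17) and the occupancy (columns 55-60) from fixed-width PDB ATOM/HETATM records. The function names and the two serine records are hypothetical and are not part of qFit.

```python
from collections import defaultdict


def make_atom_line(serial, name, altloc, resname, chain, resseq, x, y, z, occ, b, element):
    """Hypothetical helper: format one fixed-width PDB ATOM record."""
    return ("ATOM  %5d %-4s%1s%3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s"
            % (serial, name, altloc, resname, chain, resseq, x, y, z, occ, b, element))


def altloc_occupancies(pdb_lines):
    """Map (chain, residue number, atom name) -> {altloc label: occupancy}."""
    sites = defaultdict(dict)
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atom_name = line[12:16].strip()      # columns 13-16
        altloc = line[16].strip() or "-"     # column 17; blank means no alternate location
        chain = line[21]                     # column 22
        res_seq = int(line[22:26])           # columns 23-26
        occupancy = float(line[54:60])       # columns 55-60
        sites[(chain, res_seq, atom_name)][altloc] = occupancy
    return dict(sites)


# Made-up example: one serine OG atom modeled in two conformers, A and B.
example_lines = [
    make_atom_line(6, " OG ", "A", "SER", "A", 10, 11.0, 12.0, 13.0, 0.60, 15.0, "O"),
    make_atom_line(7, " OG ", "B", "SER", "A", 10, 11.5, 12.5, 13.5, 0.40, 18.0, "O"),
]
example_sites = altloc_occupancies(example_lines)
```

      Any program that reads and writes these columns consistently can, in principle, round-trip a multiconformer model.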

      Appraisal & Discussion

      Overall, the authors convincingly demonstrate that qFit provides a reliable means to detect and model conformational heterogeneity within high-resolution X-ray diffraction datasets and (based on a smaller sample) in cryo-EM density maps. This represents the state of the art in the field and will be of interest to any structural biologist or biochemist seeking to attain an understanding of the structural basis of the function of their system of interest, including potential allosteric mechanisms, an area where there are still few good solutions. That is, I expect qFit to find widespread use.

      Reviewer #3 (Public Review):

      Summary:

      The authors address a very important issue of going beyond a single-copy model obtained by the two principal experimental methods of structural biology, macromolecular crystallography and cryo electron microscopy (cryo-EM). Such a multiconformer model is based on the fact that experimental data from both these methods represent a space- and time-average of a huge number of the molecules in a sample, or even in several samples, and that the respective distributions can be multimodal. Different from structure prediction methods, this approach is strongly based on high-resolution experimental information and requires validated single-copy high-quality models as input. Overall, the results support the authors' conclusions.

      In fact, the method addresses two problems which could be considered separately:

      - An automation of construction of multiple conformations when they can be identified visually;

      - A determination of multiple conformations when their visual identification is difficult or impossible.

      We often think about this problem similarly to the reviewer. However, in building qFit, we do not want to separate these problems - but rather use the first category (obvious visual identification) to build an approach that can accomplish part of the second category (difficult to visualize) without building “impossible”/nonexistent conformations - with a consistent approach/bias.

      The first one is a known problem, when missing alternative conformations may cost a few percent in R-factors. While these conformations are relatively easy to detect and build manually, the current procedure may save significant time being quite efficient, as the test results show.

      We agree with the reviewer's assessment here. The “floor” in terms of impact is automating a tedious part of high resolution model building and improving model quality.

      The second problem is important from the physical point of view and has been addressed first by Burling & Brunger (1994; https://doi.org/10.1002/ijch.199400022). The new procedure deals with a second-order variation in the R-factors, of about 1% or less, like placing riding hydrogen atoms, modeling density deformation or variation of the bulk solvent. In such situations, it is hard to justify model improvement. Keeping Rfree values or their marginal decreasing can be considered as a sign that the model is not overfitted data but hardly as a strong argument in favor of the model.

      We agree with the overall sentiment of this comment. What is a significant variation in R-free is an important question that we have looked at previously (http://dx.doi.org/10.1101/448795) and others have suggested an R-sleep for further cross validation (https://pubmed.ncbi.nlm.nih.gov/17704561/). For these reasons it is important to get at the significance of the changes to model types from large and diverse test sets, as we have here and in other works, and from careful examination of the biological significance of alternative conformations with experiments designed to test their importance in mechanism.

      In general, overall targets are less appropriate for this kind of problem and local characteristics may be better indicators. Improvement of the model geometry is a good choice. Indeed, Cruickshank (1956; https://doi.org/10.1107/S0365110X56002059) already showed that averaged density images may lead to a shortening of covalent bonds when interpreting such maps by a single model. However, a total absence of geometric outliers is not necessarily required for the structures solved at a high resolution where diffraction data should have more freedom to place the atoms where the experiments "see" them.

      Again, we agree—geometric outliers should not be completely absent, but it is comforting when they and model/experiment agreement both improve.

      The key local characteristic for multi conformer models is a closeness of the model map to the experimental one. Actually, the procedure uses a kind of such measure, the Bayesian information criterion (BIC). Unfortunately, there is no information about how sharply it identifies the best model or how much it changes between the initial and final models; overall, there is no feeling for its values. The Q-score (page 17) can be a tool for the first problem where the multiple conformations are clearly separated and not for the second problem where the contributions from neighboring conformations are merged. In addition to BIC or to even more conventional target functions such as LS or local map correlation, the extreme and mean values of the local difference maps may help to validate the models.

      We agree with the reviewer that the problem of “best” model determination is poorly posed here. We have been thinking a lot about this in the context of Bayesian methods (see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278553/); however, a major stumbling block is in how variable representations of alternative conformations (and compositions) are handled. The answers are more (but by no means simply) straightforward for ensemble representations where the entire system is constantly represented but with multiple copies.
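
      To give a feel for how BIC values behave when a conformer is added, the sketch below uses the textbook Gaussian-residual form, BIC = n ln(RSS/n) + k ln(n). This form is an assumption and is not necessarily the exact expression used in qFit. The voxel count, parameter counts, and residual sums of squares are made-up numbers.

```python
import math


def bic_gaussian(rss, n_points, n_params):
    """Textbook BIC for a least-squares fit with Gaussian residuals (lower is better)."""
    if rss <= 0 or n_points <= 0:
        raise ValueError("rss and n_points must be positive")
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)


N_VOXELS = 200  # hypothetical number of density values around one residue

# Case 1: a second conformer barely improves the fit (RSS 12.0 -> 11.8).
bic_one = bic_gaussian(12.0, N_VOXELS, n_params=1)
bic_two_marginal = bic_gaussian(11.8, N_VOXELS, n_params=2)

# Case 2: a second conformer clearly improves the fit (RSS 12.0 -> 8.0).
bic_two_clear = bic_gaussian(8.0, N_VOXELS, n_params=2)

# The difference in BIC indicates how sharply one model is preferred.
delta_marginal = bic_two_marginal - bic_one   # positive: the simpler model wins
delta_clear = bic_two_clear - bic_one         # negative: the two-conformer model wins
```

      Reporting such differences between the initial and final models would be one way to show how sharply the criterion discriminates.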

      This method with its results is a strong argument for a need in experimental data and information they contain, differently from a pure structure prediction. At the same time, absence of strong density-based proofs may limit its impact.

      We agree - indeed we think it will be difficult to further improve structure prediction methods without much more interaction with the experimental data.

      Strengths:

      Addressing an important problem and automatization of model construction for alternative conformations using high-resolution experimental data.

      Weaknesses:

      An insufficient validation of the models when no discrete alternative conformations are visible and essentially missing local real-space validation indicators.

      While not perfect real space indicators, local real-space validation is implicit in the MIQP selection step and explicit when we do employ Q-score metrics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A point of clarification: I don't understand why waters seem to be handled differently for cryo-EM and crystallography datasets. I am interested in the statement on page 19 that the Molprobity Clashscore gets worse for cryo-EM datasets, primarily due to clashes with waters. But the qFit algorithm includes a round of refinement to optimize placement of ordered waters, and the clashscore improves for the qFit refinement in crystallography test cases. Why/how is this different for cryo-EM?

      We agree that this was not an appropriate point. We believe that the high clash score is coming from side chains being incorrectly modeled. We have updated this in the manuscript and it will be a focus of future improvements.

      Reviewer #2 (Recommendations For The Authors):

      - It would be instructive to the reader to explain how qFit handles the chromophore in the PYP (1OTA) example. To this end, it would be helpful to include deposition of the multiconformer model of PYP. This might also be a suitable occasion for discussion of potential hurdles in the deposition of multiconformer models in the PDB (if any!). Such concerns may be real concerns causing hesitation among potential users.

      Thank you for this comment. qFit does not alter the position or connectivity of any HETATM records (like the chromophore in this structure). Handling covalent modifications like this is an area of future development.

      Regarding deposition, we have noted above that the discussion now includes:

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”

      Finally, we have placed all PDBs in a Zenodo deposition (XXX) and have included that language in the manuscript. It is currently under a separate data availability section (page XXX). We will defer to the editor as to the best header for it to go under.

      - It may be advisable to take the description of true/false pos/negatives out of the caption of Figure 4, and include it in a box or so, since these terms are important in the main text too, and the caption becomes very cluttered.

      We think adding the description of true/false pos/negatives to the Figure panel would make it very cluttered and wordy. We would like to retain this description within the caption. We have also briefly described each in the main text.

      - page 21, line 4: some issue with citation formatting.

      We have updated these citations.

      - page 25, second paragraph: cardinality is the number of members of a set. Perhaps "minimal occupancy" is more appropriate.

      Thank you for pointing this out. This was a mistake and should have been called the occupancy threshold.

      - page 26: it's - its

      Thank you, we have made this change. 

      - Font sizes in Supplementary Figures 5-7 are too small to be readable.

      We agree and will make this change. 

      Reviewer #3 (Recommendations For The Authors):

      General remarks

      (1) As I understand, the procedure starts from shifting residues one by one (page 4; A.1). Then, geometry reconstruction (e.g., B1) may be difficult in some cases joining back the shifted residues. It seems that such backbone perturbation can be done more efficiently by shifting groups of residues ("potential coupled motions") as mentioned at the bottom of page 9. Did I miss its description?

      We would describe the algorithm as sampling (which includes minimal shifts) in the backbone residues to ensure we can link neighboring residues. We agree that future iterations of qFit should include more effective backbone sampling by exploring motion along the Cβ-Cα, C-N, and (Cβ-Cα × C-N) bonds and exploring correlated backbone movements.

      (2) While the paper is well split into clear parts, some of them do not seem to be in their optimal place and could better be moved to "Methods" (the detailed "Overview of the qFit protein algorithm" as a whole) or to a "Data" section, which is currently missing (the first two paragraphs of "qFit improves overall fit...", page 8, "Generating the qFit test set", page 22, and "Generating synthetic data ..." on page 26; description of the test data set). To my personal taste, the description of tests with simulated data (page 15) would be better placed before that of tests with real data.

      Thank you for this comment, but we stand by our original decision to keep the general flow of the paper as it was submitted.

      (3) I wonder if the term "quadratic programming" (e.g., A3, page 5) is appropriate. It supposes optimization of a quadratic function of the independent parameters and not of "some" parameters. This is like the crystallographic LS which is not a quadratic function of atomic coordinates, and I think this is a similar case here. Whatever the answer on this remark is, an example of the function and its parameters is certainly missed.

      We think that the term quadratic programming is appropriate. We fit a function with a loss function (observed density - calculated density), while satisfying the independent parameters. We fit the coefficients minimizing a quadratic loss. We agree that the quadratic function is missing from the paper, and we have now included it in the Methods section.
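
      The kind of problem described here can be sketched as follows: given candidate conformer densities sampled on the same voxels, find weights that minimize the squared difference between observed and weighted calculated density. The sketch assumes the constraints are non-negative weights that sum to at most one, and it substitutes a simple projected-gradient loop for a real QP solver. The function names and the toy densities are hypothetical.

```python
def project_onto_occupancy_set(w):
    """Euclidean projection onto {w_i >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in w]
    if sum(clipped) <= 1.0:
        return clipped
    # Otherwise project onto the simplex sum(w) == 1 (sort-based method).
    ordered = sorted(w, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, value in enumerate(ordered, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / j
        if value - candidate > 0:
            theta = candidate
    return [max(x - theta, 0.0) for x in w]


def fit_occupancies(candidates, observed, n_iter=2000):
    """Minimize sum_k (sum_i w_i * candidates[i][k] - observed[k])**2 over the occupancy set."""
    n_conf, n_vox = len(candidates), len(observed)
    # Step size from an upper bound on the gradient's Lipschitz constant
    # (assumes at least one candidate density is non-zero).
    step = 1.0 / (2.0 * sum(v * v for c in candidates for v in c))
    w = [0.0] * n_conf
    for _ in range(n_iter):
        residual = [sum(w[i] * candidates[i][k] for i in range(n_conf)) - observed[k]
                    for k in range(n_vox)]
        grad = [2.0 * sum(candidates[i][k] * residual[k] for k in range(n_vox))
                for i in range(n_conf)]
        w = project_onto_occupancy_set([w[i] - step * grad[i] for i in range(n_conf)])
    return w


# Toy data: the "observed" density is 0.7 * conformer 0 + 0.3 * conformer 1.
toy_candidates = [
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.5, 1.0],
]
toy_observed = [0.7, 0.5, 0.3, 0.0]
toy_weights = fit_occupancies(toy_candidates, toy_observed)  # close to 0.7, 0.3, 0.0
```

      The objective is quadratic in the weights even though the density itself is not a quadratic function of atomic coordinates, because the candidate conformers are held fixed while the weights are fit.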

      Technical remarks to be answered by the authors :

      (1) Page 1, Abstract, line 3. The ensemble modeling is not the only existing frontier, and saying "one of the frontiers" may be better. Also, this phrase gives a confusing impression that the authors aim to predict the ensemble models while they do it with experimental data.

      We agree with this statement and have re-worded the abstract to reflect this.

      (2) Page 2. Burling & Brunger (1994) should be cited as predecessors. On the contrary, an excellent paper by Pearce & Gros (2021) is not relevant here.

      We agree that we should mention the Burling & Brunger paper. However, the Pearce & Gros (2021) paper should not be removed, as it is not discussing the method of ensemble refinement.

      (3) Page 2, bottom. "Further, when compared to ..." The preference to such approach sounds too much affirmative.

      We have amended this sentence to state:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot(Emsley et al. 2010) unlike ensemble methods that generate multiple complete protein copies(Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      The point we were trying to make in this sentence was that ensemble-based models are much harder to manually manipulate in Coot or other similar software compared to multiconformer models. We think that the new version of this sentence states this point more clearly.

      (4) Page 2, last paragraph. I do not see an obvious relation of references 15-17 to the phrase they are associated with.

      We disagree with this statement, and think that these references are appropriate.

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot12 unlike ensemble methods that generate multiple complete protein copies(Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      (5) Page 3, paragraph 2. Cryo-EM maps should be also "high-resolution"; it does not read like this from the phrase.

      We agree that high-resolution should be added, and the sentence now states:

      “However, many factors make manually creating multiconformer models difficult and time-consuming. Interpreting weak density is complicated by noise arising from many sources, including crystal imperfections, radiation damage, and poor modeling in X-ray crystallography, and errors in particle alignment and classification, poor modeling of beam induced motion, and imperfect Detector Quantum Efficiency (DQE) in high-resolution cryo-EM.”

      (6) Page 3, last paragraph before "results". The words "... in both individual cases and large structural bioinformatic projects" do not have much meaning, except introducing a self-reference. Also, repeating "better than 2 A" looks not necessary.

      We agree that this was unnecessary and have simplified the last sentence to state:

      “With the improvements in model quality outlined here, qFit can now be increasingly used for finalizing high-resolution models to derive ensemble-function insights.”

      (7) Page 3. "Results". Could "experimental" be replaced by a synonym, like "trial", to avoid confusing with the meaning "using experimental data"?

      We have replaced experimental with exploratory to describe the use of qFit on CryoEM data. The statement now reads:

      “For cryo-EM modeling applications, equivalent metrics of map and model quality are still developing, rendering the use of qFit for cryo-EM more exploratory.”

      (8) Page 4, A.1. Should it be "steps +/- 0.1" and "coordinate" be "coordinate axis"? One can modify coordinates and not shift them. I do not understand how, with the given steps, the authors calculated the number of combinations ("from 9 to 81"). Could a long "Alternatively, ...absent" be reduced simply to "Otherwise"?

      We have simplified and clarified the sentence on the sampling of backbone coordinates to state:

      “If anisotropic B-factors are absent, the translation of coordinates occurs in the X, Y, and Z directions. Each translation takes place in steps of 0.1 along each coordinate axis, extending to 0.3 Å, resulting in 9 (if isotropic) or 81 (if anisotropic) distinct backbone conformations for further analysis.”

      (9) Page 6, B.1, line 2. Word "linearly" is meaningless here.

      We have modified this to read:

      “Moving from N- to C- terminus along the protein,”

      (10) Page 9, line 2. It should be explained which data set is considered as the test set to calculate Rfree.

      We think this is clear and would be repetitive if we duplicated it.

      (11) Page 9, line 7. It should be "a valuable metric" and not "an"

      We agree and have updated the sentence to read:

      “Rfree is a valuable metric for monitoring overfitting, which is an important concern when increasing model parameters as is done in multiconformer modeling.”

      (12) Page 10, paragraph 3. "... as a string (Methods)". I did not find any other mention of this term "string", including in "Methods" where it supposed to be explained. Either this should be explained (and an example is given?), or be avoided.

      We agree that string is not necessary (discussing the programmatic datatype). We have removed this from the sentence. It now reads:

      “To quantify how often qFit models new rotameric states, we analyzed the qFit models with phenix.rotalyze, which outputs the rotamer state for each conformer (Methods).”

      (13) Page10, lines 3-4 from bottom. Are these two alternative conformations justified?

      We are unsure what this is referring to.

      (14) Page 12, Fig. 2A. In comparison with Supplement Fig 2C, the direction of axes is changed. Could they be similar in both Figures?

      We have updated Supplementary Figure 2C to have the same direction of axes as Figure 2A.

      (15) Page 15, section's title. Choose a single verb in "demonstrate indicate".

      We have amended the title of this section to be:

      “Simulated data demonstrate qFit is appropriate for high-resolution data.”

      (16) Page 15, paragraph 2. "Structure factors from 0.8 to 3.0 A resolution" does not mean what the author wanted apparently to tell: "(complete?) data sets with the high-resolution limit which varied from 0.8 to 3.0 A ...". Also, a phrase of "random noise increasing" is not illustrated by Figs.5 as it is referred to.

      We have edited this sentence to now read:

      “To create the dataset for resolution dependence, we used the ground truth 7KR0 model, including all alternative conformations, and generated artificial structure factors with a high resolution limit ranging from  0.8 to 3.0 Å resolution (in increments of 0.1 Å).”

      (17) Page 15, last paragraph is written in a rather formal and confusing way while a clearer description is given in the figure legend and repeated once more in Methods. I would suggest to remove this paragraph.

      We agree that this is confusing. Instead of creating a true positive/false positive/true negative/false negative matrix, we have just called things as they are: multiconformer or single conformer, and match or no match. We have edited the language in the manuscript and figure legends to reflect these changes.

      (18) Page 16. Last two paragraphs start talking about a new story and it would help to separate them somehow from the previous ones (sub-title?).

      We agree that this could use a subtitle. We have included the following subtitle above this section:

      “Simulated multiconformer data illustrate the convergence of qFit.”

      (19) Page 20. "or static" and "we determined that" seem to be not necessary.

      We have removed static and only used single conformer models. However, as one of the main conclusions of this paper is determining that qFit can pick up on alternative conformers that were modeled manually, we have decided to keep the “we determined that”.

      (20) Page 21, first paragraph. "Data" are plural; it should be "show" and "require"

      We have made these edits. The sentence now reads:

      “However, our data here show that not only does qFit need a high-resolution map to be able to detect signal from noise, it also requires a very well-modeled structure as input.”

      (21) Page 21, References should be indicated as [41-45], [35,46-48], [55-57]. A similar remark to [58-63] at page 22.

      We have fixed the reference layout to reflect this change.

      (22) Page 21, last paragraph. "Further reduce R-factors" (moreover repeated twice) is not correct neither by "further", since here it is rather marginal, nor as a goal; the variations of R-factors are not much significant. A more general statement like "improving fit to experimental data" (keeping in mind density maps) may be safer.

      We agree with the duplicative nature of these statements. We have amended the sentence to now read:

      “Automated detection and refinement of partial-occupancy waters should help improve fit to experimental data, further reduce Rfree15, and provide additional insights into hydrogen-bond patterns and the influence of solvent on alternative conformations.”

      (23) Page 22. Sub-sections of "Methods" are given in a little bit random order; "Parallelization of large maps" in the middle of the text is an example. Put them in a better order may help.

      We have moved some sections of the Methods around and made better headings by using an underscore to highlight the subsections (Generating and running the qFit test set, qFit improved features, Analysis metrics, Generating synthetic data for resolution dependence).

      (24) Page 24. Non-convex solution is a strange term. There exist non-convex problems and functions and not solutions.

      We agree and we have changed the language to reflect that we present the algorithm with non-convex problems which it cannot solve.

      (25) Page 26, "Metrics". It is worthy to describe explicitly the metrics and not (only) the references to the scripts.

      For all metrics, we describe a sentence or two on what each metric describes. As these metrics are well known in the structural biology field, we do not feel that we need to elaborate on them more.

      (26) Page 26. Multiplying B by occupancy does not have much sense. A better option would be to refer to the density value in the atomic center as occ*(4*pi/B)^1.5 which gives a relation between these two entities.

      We agree and have updated the B-factor figures and metrics to reflect this.
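
      The reviewer's expression can be evaluated directly. The short sketch below computes occ*(4*pi/B)^1.5 as written in the comment above; the function name and the example values are hypothetical. It also shows why occupancy and B-factor are hard to separate: halving the occupancy gives the same peak value as raising B by a factor of 2^(2/3).

```python
import math


def peak_density(occupancy, b_factor):
    """Relative density at the atomic center: occ * (4*pi/B)**1.5."""
    if b_factor <= 0:
        raise ValueError("b_factor must be positive")
    return occupancy * (4.0 * math.pi / b_factor) ** 1.5


# Two made-up atoms with the same peak value but different parameters.
half_occupied = peak_density(0.5, 10.0)
fully_occupied_broader = peak_density(1.0, 10.0 * 2.0 ** (2.0 / 3.0))
```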

      (27) Page 40, suppl. Fig. 5. Due to the color choice, it is difficult to distinguish the green and blue curves in the diagram.

      We have amended this by switching the colors of the curves.

      (28) Page 42, Suppl. Fig. 7. (A) How the width of shaded regions is defined? (B) What the blue regions stand for? Input Rfree range goes up to 0.26 and not to 0.25; there is a point at the right bound. (C) Bounds for the "orange" occupancy are inversed in the legend.

      (A) The width of the shaded region denotes the standard deviations among the values at every resolution. We have made this clearer in the caption.

      (B) The blue region denotes the 95% confidence interval for the regression estimate. We have made this clearer in the caption.

      (C) This has now been fixed.

      The maximum R-free value is 0.2543, which we rounded down to 0.25.

      (29) Page 43. Letters E-H in the legend are erroneously substituted by B-E.

      We apologize for this mistake. It is now corrected.

    1. Abstract: Most available reference genomes lack the sequence map of sex-limited chromosomes, which leaves the assemblies incomplete. Recent advances in long-read sequencing and population sequencing raise the opportunity to assemble sex-limited chromosomes without the traditional complicated experimental efforts. We introduce a computational method that shows high efficiency in sorting and assembling long reads sequenced from sex-limited chromosomes. It will lead to complete reference genomes and facilitate downstream research on sex-limited chromosomes. Competing Interest Statement: The authors have declared no competing interest.

      Reviewer 3. Arang Rhie

      Comments to Author:

      1. In the introduction, add recent marker based graph phasing algorithms in long-reads, such as hifiasm trio and verkko trio mode after the T2T-Y. They are different from trio-binning, which tries to phase the reads upfront. Graph based phasing is using markers to determine haplotype specific paths to traverse.

         a. T2T-Y chromosome should be referencing Rhie et al., Nature 2023. Verkko is a successor of the manual efforts taken in T2T-Y, which should be also noted in the introduction.

         b. Reference for sexPhase program is still missing. Also, some rephrasing of the sentence is needed, as the way it is currently written is easily misleading to be understood as sexPhase was part of the methods used in the assembly of the T2T-Y.

      2. There are other approaches for phasing genomes taken in plants, for example the poly ploid potato phasing using many siblings of the child by Mari et al. bioRxiv 2022.

      3. "But only one male and one female could suffer from sampling error" - this part is unclear. Please clarify.

      4. Reference for the mason_simulator, badread software is missing.

      5. Provide the accession (HG02982) for the "African human Y" in the main text.

      6. I appreciate that the authors compared assemblies to T2T-Y as I requested before. However, fundamentally, mapping to T2T-Y and comparing length of each sequence classes is comparing apples to oranges, particularly in the heterochromatic region and ampliconic region of the Y. It is known to have variable copy numbers and size differences between two individuals. Frequent inversions have been reported in the ampliconic regions across different Y haplogroup. The number, size, and distribution of the repeat arrays composing the heterochromatic region has been shown to vary among different Y haplogroups in Hallast et al., Nature 2023. This can be also seen in Fig. 3c; the overall depth of the flow sorting in the heterochromatic region is below 1 - indicating the Yqh is shorter than T2T-Y, as it is in Fig. 3b. To make the benchmark legit, the authors should compare SRY and the flow sorting method using samples from the same individual. HG02982 and HX1 are presumably having very different sequence compositions given the diverged population history (African vs. Asian). Comparing total length of the assembled region against a 3rd different Y haplogroup (HG002Y) makes things more complicated, especially on regions that are known to vary a lot. If the authors think flow sorting based method needs to be compared, it should be benchmarked on the same individual to make an apple-to-apple comparison. I do agree results from read sorting (i.e. portion of reads sequenced from non-Y chromosomes in SRY vs. flow-sorting) is an important finding. However, I'd still argue comparing assemblies from the two different Y haplogroups is a stretch. The authors could have performed the same assembly length comparison on the T2T-Y using results from their SRY sorted reads with Verkko of HG002 vs. Verkko assembly using trio-binned markers.

      7. In the section where assemblies are compared, the authors point to Table 1, which contains results from HG01109. HG01109 has never been mentioned before. I thought the authors were comparing assemblies from SRY sorted reads of HX1? I am not sure why the authors suddenly added a 3rd PUR genome with no context. Was this a mistake? Add results from HX1 to Table 1.

      8. Please add divider lines in Table 1 between All / Ampliconic / X-degenerate / X-transposed / PAR / Het / Others. It is hard to see which rows belong to which category.

      9. The last result section where authors compare results from Verkko, it is unclear how the verkko assembly was run. The authors say "default option", and later "in trio mode" in the methods. Did the authors collect parental reads from HG002 (HG003 and HG004)? How was "trio mode" performed? Did the authors used trio binning to sort the reads, then run Verkko? Or used the homopolymer compressed parental kmers and used that in the Rukki step of Verkko (and this should be benchmarked)? Was the HG002 trio assembly taken from Rautiainen et al. paper? Please clarify and add the missing parts to the main text and methods.

      10. Related to the above section, it is hard to see in Fig. 4a the "two approximately 1 Mb contigs aligning to the same region of the Y chromosome". An enlarged inset of the dotplot may be helpful. Also, add legends and scale to the X and Y axis of the dotplots.

      11. Note there is a mis-assembly reported on T2T-Y palindrome P5 (https://github.com/marbl/CHM13-issues/blob/main/v2.0_issues.bed), which the entire P5 should be inverted. I don't see this in the dotplots of Fig. 4.

      12. In the discussion, the authors are mentioning results from the 10 trios that have been removed from the previous results. Please add the 10 trio results to the main text if it was a mistake, or remove the irrelevant results from the Discussions and Supp. Tables.

      13. The authors discuss the suboptimal performance of SRY in the PAR is contributed by the restricted data types. I thought it was contributed by the lower density of the markers? The PAR parental marker density was very similar to that of autosomes, with stretches of runs of homozygosity, presumably to maintain enough homology for recombination. What was the marker density in the PAR? Was it below their 7 kmer / 1kb?

      14. The authors mentioned there are no ZW genomes available to test SRY. There is a Zebra finch trio (ZW, female, bTaeGut2) and a male sample (ZZ, male, bTaeGut1) available with HiFi of the child (bTaeGut2) and Illumina of all the genomes from the Vertebrate Genomes Project (Rhie et al., Nature, 2021). Perhaps the authors could apply SRY on this individual, and compare the W chromosome results to what has been released on https://www.genomeark.org/vgp-all/Taeniopygia_guttata.html.

      Re-review: The authors have addressed most of my concerns. The revised manuscript reads much better than before. Regarding my last comment and response from the authors about the W chromosome, I was hoping to see comparable coverage of the W chromosome to the reference, as a proof of principle that SRY could be applied to non-human, highly diverged genomes. The assembly looks very fragmented though. Was it only the similarity to the Z chromosome that caused the fragmentation? Are there no other factors contributing to the discontinuity of the W chromosome? A few minor comments below to the revised version:

      1. Please indicate which genome was compared in the legend of Supp. Table 5.

      2. When using et al notations, please use the last name. Mari et al should be Serra Mari et al., Mikko et al should be Rautiainen et al. Also, Serra Mari et al is now published in Genome Biology: https://doi.org/10.1186/s13059-023-03160-z. Please update the reference.

      3. There are a few grammar corrections to make.

    1. Dynamic functional connectivity (dFC) has become an important measure for understanding brain function and as a potential biomarker. However, various methodologies have been developed for assessing dFC, and it is unclear how the choice of method affects the results. In this work, we aimed to study the results variability of commonly-used dFC methods. We implemented seven dFC assessment methods in Python and used them to analyze fMRI data of 395 subjects from the Human Connectome Project. We measured the pairwise similarity of dFC results using several similarity metrics in terms of overall, temporal, spatial, and inter-subject similarity. Our results showed a range of weak to strong similarity between the results of different methods, indicating considerable overall variability. Surprisingly, the observed variability in dFC estimates was comparable to the expected natural variation over time, emphasizing the impact of methodological choices on the results. Our findings revealed three distinct groups of methods with significant inter-group variability, each exhibiting distinct assumptions and advantages. These findings highlight the need for multi-analysis approaches to capture the full range of dFC variation. They also emphasize the importance of distinguishing neural-driven dFC variations from physiological confounds, and developing validation frameworks under a known ground truth. To facilitate such investigations, we provide an open-source Python toolbox that enables multi-analysis dFC assessment. This study sheds light on the impact of dFC assessment analytical flexibility, emphasizing the need for careful method selection and validation, and promoting the use of multi-analysis approaches to enhance reliability and interpretability of dFC studies. Competing Interest Statement: The authors have declared no competing interest.

      Reviewer 2. Nicolas Farrugia

      Comments to Author: Summary of review. This paper fills a very important gap in the literature investigating time-varying functional connectivity (or dynamic functional connectivity, dFC), by measuring the analytical flexibility of seven different dFC methods. An impressive amount of work has gone into generating a set of convincing results, which essentially show that the main object of interest of dFC, the temporal variability of connectivity, cannot be measured with high consistency, as this variability is of the same order of magnitude as, or even higher than, the changes observed across different methods on the same data. In this very controversial field, it is remarkable that the authors have managed to put together a set of analyses that demonstrate this in a very clear and transparent way. The paper is very well written, the overall approach is based on a few assumptions that make it possible to compare methods (e.g. subsampling of temporal aspects of some methods, spatial subsampling), and the provided analysis is very complete. The most important results are condensed in a few figures in the main manuscript, which is enough to convey the main messages. The supplementary materials provide an exhaustive set of additional results, which are briefly discussed one by one. Most importantly, the authors have provided an open-source implementation of seven main dFC methods. This is very welcome for the community and for reproducibility, and is of course particularly suited for this kind of contribution. A few suggestions follow. Clarification questions and suggestions: 1- How was the uniform downsampling of 286 ROIs to 96 done? Uniform in which sense? According to the RSN? Were ROIs regrouped by spatial contiguity? I understand this was done in order to reduce computational complexity and to harmonize across methods, but the manuscript would benefit from an added sentence explaining what was done. 
2- Table A in figure 1 shows the important hyperparameters (HP) for each method, but the motivation for the choice of HP for each method is only explained in the discussion (end of page 11, "we adopted the hyperparameter values recommended by the original paper or consensus among the community for each method"). It would be better to explain it in the methods, and then only discuss why this can be a limitation in the discussion. 3- The github repository https://github.com/neurodatascience/dFC/tree/main does not reference the paper. 4- The github repository https://github.com/neurodatascience/dFC/tree/main is not documented enough. There are two very large added values in this repo: open implementation of methods, and analytical flexibility tools. The demo notebook shows how to use the analytical flexibility tools, but the methods implementation is not documented. I expect that many people will want to perform analyses using the methods as well as comparison analyses, so the documentation of individual methods should not be minimized. 5 - For the reader, it would be better to mention the availability of the code for reproducibility early in the manuscript (in the introduction). Currently, the toolbox is only introduced in the final paragraph of the discussion. It comes as a very nice surprise when reading the manuscript in full, but I think the manuscript would gain a lot of value if this paragraph was included earlier, and if the development of the toolbox was mentioned much earlier (i.e. in the abstract). 6 - We have published two papers on dFC that the authors may want to include, although these papers investigated cerebello-cerebral dFC using whole brain + cerebellum parcellations. The first paper used continuous HMM on healthy subjects and found correlations with impulsivity scores, while the second paper used network measures on sliding window dFC matrices in a clinical cohort (patients with alcohol use disorder). 
I am not sure why the authors did not find our papers in their literature search, but it may be good to include them. The authors need to update the final table in the supplementary materials as well as the citations in the main paper. Abdallah, M., Farrugia, N., Chirokoff, V., & Chanraud, S. (2020). Static and dynamic aspects of cerebro-cerebellar functional connectivity are associated with self-reported measures of impulsivity: A resting-state fMRI study. Network Neuroscience, 4(3), 891-909. Abdallah, M., Zahr, N. M., Saranathan, M., Honnorat, N., Farrugia, N., Pfefferbaum, A., Sullivan, E. & Chanraud, S. (2021). Altered cerebro-cerebellar dynamic functional connectivity in alcohol use disorder: a resting-state fMRI study. The Cerebellum, 20, 823-835. Note that in Abdallah et al. (2020), while we did not compare HMM results with other dFC methods, we did investigate the influence of HMM hyperparameters, and also performed internal cross-validation on our sample + null models of dFC.

      Minor comments 6 - "[..] what lies behind the of methods. Instead, they reveal three groups of methods, 720 variations in dynamic functional connectivity?. " -> an extra "." was added (end of page 10).

    1. Background: Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequence at the point of sequencing, typically involving use of resource-constrained devices. Existing benchmarks have largely focused on use of standardised databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity. Results: We benchmarked host removal pipelines on simulated Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained near-perfect precision and recall for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was again superior to most standard approaches, allowing them to be executed on a laptop device. Conclusions: Nanopore sequencing and a custom kraken human database with a diversity of genomes leads to superior host read removal from simulated metagenomic samples while being executable on a laptop. In addition, constructing a taxon-specific database provides excellent taxonomic read assignment while keeping runtime and memory low. We make all customised databases and pipelines freely available. Competing Interest Statement: The authors have declared no competing interest.

      Reviewer 2. Darrin Lemmer, M.S.

      Comments to Author: This paper describes a method for improving the accuracy and efficiency of extracting a pathogen of interest (M. tuberculosis in this instance, though the methods should work equally well for other pathogens) from a "clinical" metagenomic sample. The paper is well written and provides links to all source code and datasets used, which were well organized and easy to understand. The premise – that using a pangenome database improves classification -- seems pretty intuitive, but it is nice to see some benchmarking to prove it. For clarity I will arrange my comments by the three major steps of your methods: dataset generation, human read removal, and Mycobacterium read classification. 1. Dataset generation -- I appreciate that you used a real-world study (reference #8) to approximate the proportions of organisms in your sample, however I am disappointed that you generated exactly one dataset for benchmarking. Even if you use the exact same community composition, there is a level of randomness involved in generating sequencing reads, and therefore some variance. I would expect to see multiple generations and an averaging of the results in the tables, however with a sufficiently high read depth, the variance won't likely change your results much, so it would be nice, and more true to real sequencing data, to vary the number of reads generated (I didn't see where you specified to what read depth for each species you generated the reads for), as it is rare in the real world to always get this deep of coverage. Ideally it would also be nice to see datasets varying the proportions of MTBC in the sample to test the limits of detection, but that may be beyond the scope of this particular paper. 2. 
Human read removal -- The data provided do not really support the conclusion, as all methods benchmarked performed quite well and, particularly when using the long reads from the Nanopore simulated dataset, were fairly indistinguishable with the exception of HRRT. The short Illumina reads show a little more separation between the methods, probably due to the shorter sequences being able to align to multiple sequences in the reference databases; however, comparing kraken human to kraken HPRC still shows very little difference, thus not supporting the conclusion that the pangenome reference provides "superior" host removal. The run times and memory used do much more to separate the performance of the various methods, particularly with the goal of being able to run the analysis on a personal computer, where peak memory usage is important. The only methods that perform well within the memory constraints of a personal computer for both long reads and short reads are HRRT and the two kraken methods, with kraken being superior at recall, but again, kraken human and kraken HPRC are virtually indistinguishable, making it hard to justify the claim that the pangenome is superior. Also, it appears your run time and peak memory usage are again based on a single data point; these measurements should be performed multiple times and averaged. Finally, as an aside, I did find it interesting and disturbing that HRRT had such a high false negative rate compared to the other methods, given that this is the primary method used by NCBI for publishing in the SRA database, implying there are quite a few human reads remaining in SRA. 3. Mycobacterium read classification -- Here we do have some pretty good support for using a pangenome reference database, particularly compared to the kraken standard databases, though as mentioned previously, a single datapoint isn't really adequate, and I'd like to see both multiple datasets and multiple runs of each method. 
Additionally, given the purpose here is to improve the amount of MTB extracted from a metagenomic sample, these data should be taken the one extra step to show the coverage breadth and depth of the MTB genome provided by the reads classified as MTB, as a high number of reads doesn't mean much if they are all stacked at the same region of the genome. Given that these are simulated reads, which tend to have pretty even genome coverage, this may not show much, however it is still an important piece to show the value of your recommended method. One final comment is that it should be fairly easy to take this beyond a theoretical exercise, by running some actual real world datasets through the methods you are recommending to see how well they perform in actuality. For instance, reference #8, which you used as a basis for the composition of your simulated metagenomic sample, published their actual sequenced sputum samples. It would be easy to show if you can improve the amount of Mycobacterium extracted from their samples over the methods they used, thus showing value to those lower income/high TB burden regions where whole metagenome sequencing may be the best option they have.
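The distinction the reviewer draws between read count, breadth, and depth of coverage can be illustrated with a minimal sketch. It assumes alignments are available as 0-based, half-open `(start, end)` intervals on a single reference; the function name `coverage_stats`, the toy genome length, and the toy intervals are all hypothetical and are not taken from the paper's pipelines.

```python
def coverage_stats(intervals, genome_length):
    """Return (breadth, mean_depth, mean_depth_in_covered_region).

    intervals: iterable of 0-based, half-open (start, end) alignment spans.
    genome_length: length of the reference in bases.
    """
    depth = [0] * genome_length
    for start, end in intervals:
        # Clip each alignment to the reference before counting.
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1

    covered = sum(1 for d in depth if d > 0)   # bases with at least one read
    total_bases = sum(depth)                   # total aligned bases

    breadth = covered / genome_length          # fraction of genome covered
    mean_depth = total_bases / genome_length   # aligned bases / genome size
    depth_in_covered = total_bases / covered if covered else 0.0
    return breadth, mean_depth, depth_in_covered


# Hypothetical toy data: ten 100 bp reads on a 1,000 bp reference.
genome_length = 1000
stacked = [(0, 100)] * 10                                # all on one region
spread = [(i * 100, (i + 1) * 100) for i in range(10)]   # tiled end to end

print(coverage_stats(stacked, genome_length))  # (0.1, 1.0, 10.0)
print(coverage_stats(spread, genome_length))   # (1.0, 1.0, 1.0)
```

Both toy datasets contain the same number of reads and the same mean depth, but the stacked reads cover only a tenth of the reference. That is why read counts alone say little about how much of the genome is recoverable.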

      Re-review.

      This is a significantly stronger paper than originally submitted. I especially appreciate that multiple runs have now been done with more than one dataset, including a "real" dataset, and the analysis showing the breadth and depth of coverage of the retained Mtb reads, proving that you can still generally get a complete genome of a metagenomic sample with these methods. However kraken's low sensitivity when using the standard database definitely impacts the results, making a stronger argument for using a pangenome database (Kraken-Standard can identify the presence of Mtb, but if you want to do anything more with it, like AMR detection, you would need to use a pangenome database). I really think that this should be emphasized more, and perhaps some or all of the data in tables S9-S12 be brought into the main paper. It is maybe worth noting, that the significant drop in breadth, I would imagine, is a result of dividing the total size of the aligned reads by the size of the genome, implying a shallow coverage, but the reality is still high coverage in the areas that are covered, but lots of complete gaps in coverage. I did also like the switch to the somewhat more standard sensitivity/specificity metrics, though I do lament the actual FN/FP counts being relegated to the supplemental tables, as I thought these numbers valuable (or at least interesting) when comparing the results of the various pipelines, particularly with human read removal, where the various pipelines perform quite similarly.
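As a rough illustration of why the raw FN/FP counts can remain informative alongside sensitivity and specificity, the sketch below computes the standard metrics from a confusion matrix. It assumes that "positive" means a human read in the host-removal setting; the function name and all counts are hypothetical and are not taken from the paper's tables.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; NaN where a denominator is zero."""
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,  # recall
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
    }


# Hypothetical counts for two pipelines run on the same one million human reads.
pipeline_a = classification_metrics(tp=999_000, fp=50, tn=500_000, fn=1_000)
pipeline_b = classification_metrics(tp=999_900, fp=50, tn=500_000, fn=100)

# Sensitivities differ only in the fourth decimal place, while the number of
# human reads left behind (fn) differs tenfold: 1,000 versus 100.
print(round(pipeline_a["sensitivity"], 4), round(pipeline_b["sensitivity"], 4))
```

When pipelines perform this similarly, the rounded rates look nearly identical, so reporting the underlying counts makes the practical difference easier to see.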

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study examined the mechanisms behind the nuclear transport of capsid proteins of various flaviviruses. The study used mass spectrometry to identify the interaction partners of JEV capsid protein and found Importin 7 as the top hit. After validating this interaction with IP-western blotting, the authors used IPO7 knock-out cells to show that the nuclear accumulation of capsid is dependent on IPO7. Moreover, they also observed a nearly 10-fold reduction in the titre of virus produced from knock-out cells, without a reduction in virus replication or particle assembly.

      The study needs improvements to bring it to publication standards. Some overarching problems include: all capsid localization studies being done with GFP-tagged capsid rather than wild-type capsid produced during authentic infection; the lack of quantitation of most of the localization data; not showing capsid localization from infection experiments in knock-out cells; and no in-depth analysis of the potential mechanisms behind the observed reduction in titre in knock-out cells.

      Thank you for your constructive comments. We have sincerely answered all of them, as shown below. We hope you are satisfied with our additional data and the revised manuscript.

      The major comments are

      Fig 1B: Please add quantitation and statistical analyses of the ratio of nuclear and cytoplasmic capsid protein of all different capsids used. Also include western blot to prove that there is no cleavage between Capsid and GFP and the green signal indeed comes from the fusion protein. Ideally you should use capsid alone instead of a fusion protein for at least selected few constructs to prove that the Capsid-GFP behaves identical to Capsid alone.

      Following the reviewer’s comments, we have added quantification and statistical data in Figure 1D. We have added CBB data and western blot data in Figures 1B and S1. Because recombinant proteins of low molecular weights were artificially translocated into the nucleus through diffusion, less than 20 kDa proteins are typically used as GFP or GST fusion proteins for the IJ and PM experiments. Instead of IJ and PM experiments, we have added data on the translocation of the non-tagged core using IFA and its statistical data in Figure 1A. Although in vitro data on the translocation of capsid protein differ somewhat from IFA data, the data on nuclear translocation of core proteins are consistent across different experiments.

      Fig 1C: It is unclear from the figure legends the WT JEV capsid means GFP-Capsid or Capsid alone. You should clearly state the GFP part if the construct includes GFP. Quantitation and statistics are missing and the information on how many independent experiments were performed is also not included in the figure legend.

      Following the reviewer’s suggestion, we have described that the JEV proteins fused GFP as follows: “AcGFP-JEVCoreWT or AcGFP-JEVCoreGP/AA” (Line. 771). We added quantification and statistical analysis as shown in Figure 1E. IJ and PM experiments were performed three times independently and described in the legend of Figure 1 in the revised manuscript (Lines 773–774).

      Fig 2B: Quantitation and statistics are missing. Ideally, the data need to be reproduced with Capsid alone instead of Capsid-GFP. A positive control is needed for the activity of Bimax to prove that the drug was working in the assay.

      We have added quantitative and statistical data in the revised Figure 2B. As mentioned above, capsid alone is potentially translocated into the nucleus artificially using the IJ and PM assay. Bimax binds to importin alpha but not importin beta, specifically inhibiting the importin alpha/beta pathway. The RanGTP mutant binds to the importin beta family, including importin beta 1, and widely inhibits importin beta-dependent nuclear import. These inhibitors are well-characterized and recognized in the field. We cited the following reference: Tsujii et al., JBC, 2015.

      Fig 2C: How do you reconcile the IP mass spectrometry data that Importin b1 is the second strongest hit with the lack of IP interaction you observed in fig 2C?

      As shown in Figure 2C, importin b1 does not interact with the JEV core. Importin b1 is the most abundant member of the importin beta family. Thus, it might be a non-specific interaction between importin b1 and the JEV core. Therefore, we excluded importin b1 from further analyses. We added a sentence to explain why importin b1 was excluded on Line 145.

      Fig 3C: How many independent confirmations of this experiment was performed?

      All IJ and PM experiments were performed thrice independently. We described this in the legend of Figure 3 in the revised manuscript (Line; 794).

      Fig 4A and B: Add quantitation for the western blot. 4A-D Include data on the number of biological repetitions. 4C-D: Add quantitation and statistical analyses of the ratio of nuclear and cytoplasmic capsid protein.

      We have added quantification data, as shown in Figures 4A and 4B. All experimental results shown in Figures 4A, 4B, 4C, and 4D were performed thrice independently, as described in the legend of Figure 4 of the revised manuscript (Lines; 810-812).

      Fig 5B. This data should be shown in the context of infection with untagged Capsid at least for 1-2 viruses. This is a serious drawback of the present study as there is no clear evidence presented that the native capsid protein in an infection context depend on importin 7 for nuclear accumulation and behave similar to the GFP-Capsid constructs being used.

      Following the reviewer’s concerns, we used an un-tagged JEV and DENV core to examine core translocation in WT or IPO7KO Huh7 cells. As shown in Figures 5C and 5D and their quantitative data, nuclear translocation of JEV and DENV core protein was inhibited in IPO7KO Huh7 cells. We tested the translocation of core protein upon infection with DENV as shown in Figure 5F. Although we could not examine ZIKV infection because we could not find appropriate antibodies against the ZIKV core, these data are consistent in that nuclear translocation of flavivirus core protein largely depends on IPO7.

      Fig 5 A-D: Two repetitions are insufficient; a minimum of three biological repeats and statistical analysis need to be included. 5E-F: You cannot do statistics on two repeats, need minimum of three repeats to perform statistical analysis. 5G-H: I presume three repetitions based on the data points shown, this should be clearly stated in the figure legend.

      We repeated three independent experiments, shown in Figures 5A and 5C-5F, and indicated them on Lines 823. We have added statistical data in Figures 5B-5F. We have corrected the statement of biological repeats in Figures 6A and 6B (Lines; 843-844).

      Fig 5E-G: Taking the data of 5E and 5G together, it seems Importin 7 functions at the level of particle release and not particle assembly or maturation. Have you checked the specific infectivity of the particles released from knock-out cells to determine the reason behind the reduction in virus titre? You could look at prM maturation by furin cleavage to check if this is altered in the IPO7 knock-out cells.

      We determined the ratio of infectious titer per 10^3 copies of viral RNA in Figure 6F. The proportion of infectious viruses targeting extracellular JEV RNA was decreased in IPO7KO cells. Simultaneously, no difference was observed in the proportion of infectious viruses targeting intracellular JEV RNA between WT and IPO7KO cells. Although we could not find appropriate antibodies against the JEV core, we checked prM expression using the DENV virus. The expression of prM was slightly increased in JEV-infected IPO7-KO Huh7 cells (Figure S3D). This result suggests that the efficiency of prM cleavage by furin was partially involved in the impairment of infectious virus release in IPO7KO Huh7 cells.
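The normalisation described in this response, infectious titer per 10^3 copies of viral RNA, amounts to simple arithmetic. The sketch below is only an illustration: the function name, the units (infectious units per mL and RNA copies per mL), and the numbers are hypothetical and are not values from the manuscript.

```python
def infectious_units_per_1000_copies(titer, rna_copies):
    """Infectious titer normalised to 10^3 viral RNA copies.

    titer and rna_copies must refer to the same sample volume
    (assumed here: infectious units per mL and RNA copies per mL).
    """
    if rna_copies <= 0:
        raise ValueError("rna_copies must be positive")
    return titer * 1_000 / rna_copies


# Hypothetical values: equal RNA output but a tenfold lower titer from KO cells.
wt = infectious_units_per_1000_copies(titer=5_000, rna_copies=1_000_000)
ko = infectious_units_per_1000_copies(titer=500, rna_copies=1_000_000)
print(wt, ko)  # 5.0 0.5
```

A lower ratio at equal RNA copy number indicates fewer infectious particles per genome released, which is the quantity the reviewer asked about when requesting specific infectivity.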

      Fig 5H: Have you checked if the observation regarding intracellular RNA levels in 5F is applicable to these viruses as well.

      We checked the intracellular RNA levels of DENV- and ZIKV-infected cells. In contrast to JEV, intracellular ZIKV or DENV RNA showed no difference in IPO7-KO Huh7 cells (Figure 6H). We discuss this in the Discussion section (Lines; 269-271).

      Fig 6: The figure legend "Data are representative of two (A, B) independent experiments and are presented as the mean {plus minus} SD of three independent experiments (C)" is confusing. The sentence should be reworded to state the repetitions separately for independent experiments. Fig 6C should show original titres and not percentages.

      We have corrected the figure legends according to the reviewer’s comments. We have shown the original titers in Figures 6C and 6E.

      Fig 7B: This experiment should be performed in IPO7 knock out cells to confirm that the observed reduction of core mutant is mainly contributed from its lack of interaction with IPO7 and not from any other confounding factors.

      Following the reviewer’s suggestion, we performed SRIP experiments for GP/AA mutation using IPO7KO Huh7 cells. As shown in Figure 7C, the SRIPs harboring WT core were impaired in IPO7KO Huh7 cells; no difference was observed in the SRIPs harboring GP/AA mutations in WT and IPO7KO cells. These results suggest that IPO7-dependent nuclear translocation of core protein is important for the viral release.

      Reviewer #1 (Significance (Required)): While the authors could convincingly demonstrate the interaction between capsid and IPO7, how that interaction results in the observed reduction in viral titre is largely unexplored. As all the localization data used a GFP-tagged capsid outside an infection context, this reviewer is not confident that all the reported observations will hold in an infection setting. This needs to be urgently addressed to raise confidence in the observation. The current data are insufficient to confidently attribute the change in titre to the interaction between capsid and IPO7 and the capsid localization to the nucleus. Knocking out IPO7 could have pleiotropic effects independent of capsid nuclear accumulation that could lead to the observed titre reduction. This needs to be addressed further before linking these two phenotypes. Certain key experiments needed to address these questions are currently missing. While the interaction of capsid with IPO7 is certainly intriguing, the implications of this interaction for virus biology need further investigation before clear conclusions can be drawn regarding this observation.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this study Itoh and colleagues investigate the mechanism, role and impact of the nuclear localization of the flavivirus core protein. The import of the core protein has long been observed and investigated and herein the authors use some novel approaches to identify potential cellular binding partners that facilitate nuclear import. Via proteomics and biochemical approaches they determine that importin-7 plays a crucial role in the import of the core protein that appears to be conserved across Flavivirus members. In general the findings and conclusions are sound but there are some significant omissions and caveats that warrant further investigation.

      Major comments: - one of the major caveats of the study is that the flavivirus NS5 protein also translocates to the nucleus in an Importin-alpha/beta dependent manner. Therefore how can the authors discount any impact of preventing NS5 import, in addition to core, on virus and SRIP replication and production. Some discussion, if not additional experiments are required here ie. NS5 localization in the KO cells during virus infection

      We examined the localization of NS5 using IPO7KO Huh7 cells. As shown in Figure S2D and S2E, we confirmed that IPO7 was not involved in the nuclear localization of NS5.

      • the localization is predominantly nucleolus rather than nucleoplasm when compared to the SV40 NLS. What are the sequence differences between the flavivirus proteins that potentially could account for this? A protein known to localize solely to the cytoplasm should also be used, e.g. NS1 or NS3.

      The JEV core does not contain a consensus nucleolar localization signal. Nuclear localization of NS5 depended on importin-α similar to the SV40 NLS, while flavivirus core proteins were independent of importin-α. Gly42 and Pro43 are critical amino acids for the nuclear localization of the core protein, as shown in Figures 1C and 1D. The Gly42 to Pro43 of core proteins were well-conserved in the core proteins of the Flaviviridae family.

      • controls for Figure 2? Ie. a protein known to be inhibited by Bimax but not the RanGTP mutant and vice versa.

      Bimax binds to importin alpha but not importin beta and specifically inhibits the importin alpha/beta pathway. The RanGTP mutant binds to the importin beta family, including importin beta 1, and widely inhibits importin beta-dependent nuclear import. These inhibitors are well-characterized and recognized in the field. Therefore, we have cited the following references: Tsujii et al., JBC, 2015.

      • Fig 5. Difference with WNV and DENV in nucleoplasm localization but also WNV still appeared to have Core in the nucleus in the KO cells

      We agree with the reviewer’s comment about differences in nuclear localization among the viruses using the IJ assay. We have added new data to examine the localization of the DENV core after DENV infection. Nucleolar localization of the DENV core following DENV infection was observed, as shown in Figure 5F. Therefore, differences in nucleoplasm or nucleolar localization among different viruses shown in Figure 1C and Figure 5B might be artifacts of recombinant proteins. One possibility is that the localization of core proteins using IJ assay was detected by anti-GFP antibodies. Although purified GFP-core proteins, as shown in Figure 1B and S1, were observed as a single band of fusion proteins, core proteins of WNV and DENV might be cleaved during IJ experiments, and GFP alone might be detected at nucleoplasm, as shown in Figure 5B. Because our study focused on the nuclear translocation of flavivirus core proteins, the detailed localization of each core protein in the nucleus will be studied in the future.

      • Fig 5C still has substantial JEV and DENV core but not WNV and ZIKV. Why is the DENV and WNV localization pattern different to Fig 5B?

      We appreciate the reviewer’s suggestion; we re-checked all our data presented in Figure 5B and other data shown in Figure 5B. We quantified the ratio of nuclear localization as shown on the right of Figure 5B. Our quantification data showed that the nuclear transport of all core proteins used in this study was dependent on IPO7. In contrast, Figure 5A shows that nuclear translocation of WNV core protein is partially dependent on IPO7. This discrepancy might be explained if nuclear translocation of WNV core protein is regulated by several nuclear carriers. We described this in the Discussion section (Line; 250-254).

      • Fig 5F, does the KO also restrict NS5 from entering the nucleus and could this then results in increase polymerase activity confined to the cytoplasm resulting in more viral RNA?

      Following the reviewer’s suggestion, we examined NS5 localization during viral infection and plasmid transfection, as shown in Figure S2D and S2E. Previous data showed that the nuclear localization of NS5 depended on importin-α. Our data are consistent with previous reports in that IPO7 was not involved in the nuclear localization of NS5. In contrast to JEV, we also confirmed that intracellular ZIKV or DENV RNA showed no difference between WT and IPO7-KO Huh7 cells (Figure 6H). As described in the discussion, other factors, such as antiviral factors, might be involved in IPO7-mediated nuclear transport in JEV-infected cells (Line; 269-271).

      • Why was WNV infection not performed in Fig 5H? What were the viral titres compared to for the relative % values?

      Because our institution does not have a BSL3 facility, we could not use WNV. Following the reviewer’s comment, we showed viral titers in Figure 6G.

      • Fig 6B, still a significant amount of core present in the nucleolus. Also WT cells have (almost?) no cytoplasmic staining for core where this could be clearly observed in the WT cells in Fig 5D. Why the difference?

      Plasmid transfection of AcGFP-Core WT showed that almost all core proteins were located in the nucleus. We assumed that AcGFP might influence nuclear exports of core proteins or the efficiency of nuclear transports as shown in other data of in vitro experiments. However, our finding that IPO7 was involved in the nuclear transport of core proteins is consistent.

      • In Fig 7B, D and E, when were the SRIPs collected and what was the time period after subsequent infection?

      Following the reviewer’s comments, we have added more details on SRIP experiments in Materials & Methods (Line; 521-523).

      • In Fig 7C was the luciferase measured from the initial transfection and how did it correlate with RNA production? A 15-fold increase in replicon RNA actually seems quite low over a 48h period

      Because large amounts of in vitro-transcribed replicon RNA were injected into cells in this experiment, we observed that significant amounts of luciferase values were detected after 4 h. However, the 15-fold enhancement in luciferase value was consistent with previous reports (PMID: 30413742, PMID: 17024179). We have added references in the revised manuscript.

      • quantitation is required throughout all of the experimental IFA data provided

      Following reviewer comments, we have quantified all IFA data and showed their results.

      Reviewer #2 (Significance (Required)):

      The nuclear translocation of flavivirus proteins has long been studied, and it has been observed that the core, NS5 (RNA polymerase) and potentially the NS3 (helicase/protease) proteins all translocate to the nucleus. Importin alpha and beta have been shown to facilitate this process. The authors aim to extend this by identifying importin-7 as a major cellular factor enabling nuclear translocation. Overall the experiments have been performed well, but there is a lack of quantitation for many of the results and suitable controls are required.

      I am a researcher in the field of flavivirus replication

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In the presented study the authors identified and mechanistically investigated how Flaviviruses including Japanese encephalitis virus (JEV), Dengue virus (DENV), and Zika virus (ZIKV) commonly use importin-7 (IPO7), an importin-β family protein, as a cellular carrier protein to facilitate nuclear core protein translocation. The authors evaluated how the production of infectious viruses is regulated by IPO7 using cellular infection models including IPO7-deficient knockout cells. In the submitted manuscript, the authors provide evidence that IPO7 facilitates viral core protein import into the nucleus of infected cells, which is essential for effective Flavivirus replication. Taken together, the study is interesting to a broader readership with interest in molecular virology, and its findings are informative for potential future targeting of IPO7 to affect flavivirus replication using small molecule drugs. The manuscript is well-written and easy to follow, the methods are appropriate, the structure is logical, and statistical analysis is adequate.

      Major comments:

      • It is unclear why the authors specifically used Ala substitution at Gly42 and Pro43 to obtain the abolishment of nuclear core protein localization. It would be helpful to put this into more context and explain the approach.

      Mutations of Gly42 and Pro43 to Ala were previously reported and characterized by the same research group (PMID: 15731239). Following the reviewer’s comment, we have added more details of GP mutations in the text (Lines 66–70).

      • In Figure 4, the authors claim that the binding between IPO7 and RPS7 is disrupted upon the addition of RanGTPQ69L. This is not clearly evident from the pulldown experiment and should be proven experimentally with additional experiments (e.g. by using an imaging approach) to underline the statement that the binding mode of IPO7 to the JEV core protein is similar to that of RPS7. Loading controls for pulldown blots should be added.

      As described in response to the comment by reviewer#2 regarding Figure 2, the RanGTPQ69L mutant inhibits the interaction between the importin beta family, including IPO7 and its substrates, by directly binding to importin beta proteins. For the benefit of readers without knowledge of the typical Ran-dependent nuclear transport mechanism, we have described its effects with several cited references (Dickmanns et al., 1996; Tachibana et al., 2000). We referred to a study that showed that IPO7 transports RPL proteins, including RPS7 (Jäkel and Görlich, 1998). The data in Figures 4A and 4B demonstrate that adding RanGTPQ69L remarkably reduces the binding of IPO7 to the Core proteins and that the effect is more robust than that for RPS7. We believe that these results are experimentally valid, indicating that nuclear transport of Core proteins by IPO7 is achieved through a typical Ran-dependent pathway.

      • Most methods used are presented logically but require some more details so that they can be reproduced. In particular, the difference between Figure 4 E and 4H is confusing. What is the difference? Is 4E showing intracellular viral titers and 4H infectious viral titers in the supernatant of cells? Clarification needed. Put relevance of these experiments in context of the hypothesis.

      We apologize for the confusion regarding the data in Figures 5E and 5H (we assume). These data were derived from the same experiments, except for the time-course data presented in Figure 5E. We have removed Figure 5E to simplify our results.

      • Identical phenotypes induced by IPO7 knockout in a number of HuH7 clones are shown in Figures 6A to 6C. This data does not add to the overall understanding and should be moved to supplementary figures. Why are 293T cells used in experiments shown in Figure 6D and 6E? What is the relevance of kidney cells to Flavivirus infections?

      Following the reviewer’s comments, we have moved Figure 6 to supplementary figures. We used 293T cells because they support efficient JEV propagation and efficient gene knockout. We wanted to demonstrate that our data are not Huh7-dependent through experiments in 293T cells.

      • Prior studies are referenced appropriately, however, in a recent study it was demonstrated that IPO7 is stabilized upon Epstein-Barr Virus infection and that IPO7 presence is required for the survival of host cells (Yang YC, Front Microbiol. 2021 Feb 16;12:643327. doi: 10.3389/fmicb.2021.643327).

      We deeply appreciate the publications in these fields. Following the reviewer’s comment, we have cited these references.

      This important study about the physiological relevance of IPO7 during viral infections has not been cited by Itoh and colleagues in the presented study. However, the results of the uncited study are very relevant to the provided manuscript, since Itoh and colleagues are using IPO7 knockout cells to investigate its function in Flavivirus core protein nuclear import. Hence, the authors should perform cell survival and cellular fitness experiments to demonstrate that observed phenomena of reduced viral replication and virus export in IPO7 knockout cells are independent of compromised cellular fitness due to IPO7 deficiency.

      We compared the cellular fitness of WT and IPO7KO Huh7 cells using PI (Propidium Iodide) staining and flow cytometry. As shown in Figure S2F, no differences were observed in cell viability between WT and IPO7KO Huh7 cells. This suggests that the reduced viral titers in IPO7KO Huh7 cells are not attributable to compromised cellular fitness.

      Minor comments:

      • Describing Figure 3B, the authors state that they focused on IPO7 among the core binding proteins belonging to the importin-b family, because IPO7 "was identified the most peptides" in the mass spectrometry approach. This requires a more detailed explanation. Also, an explanation of why HEK293T cells were used for this approach and not HuH7 cells, as used predominately in most parts of the study, would provide more clarity to the reader.

      We focused on IPO7 because it had the highest number of detected peptides, and we found that the second most detected peptide, IPOB1, did not bind to JEV core proteins as shown in Figure 2C. Therefore, we included the lack of interaction between IPO7 and IPOB1 as part of the rationale.

      • In Figures 4E and 4F, colour coding is missing.

      We have indicated color coding in this data. Thank you for your comments.

      Reviewer #3 (Significance (Required)):

      The provided manuscript 'Importin-7-dependent nuclear localization of the Flavivirus core protein is required for infectious virus production' by Itoh and colleagues investigates a topic with important scientific relevance. The presented study builds on previous findings by the authors where they have demonstrated that Flavivirus core protein nuclear localization is actually conserved among Flaviviridae and represents a potential target for broad-range antiviral small molecule drugs (Tokunaga et al., Virology, 2020 Feb;541:41-51). However, our understanding of Flavivirus core protein nuclear localization during viral replication and how the processes could potentially be targeted using novel therapeutic drugs remains elusive. Here, the provided manuscript addresses a mechanistic investigation of how the Flavivirus core protein is actually translocated from the cytoplasm to the nucleus of infected cells. The study is informative particularly for virologists with expertise in Flavivirus replication.

      However, from my point of view as a virologist investigating host-pathogen interactions with a strong interest in clinical translational, the manuscript requires a more careful evaluation and interpretation of some results of key experiments. In addition, some of the results need to be more precisely described for clearer understanding by a broader readership.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Summary: In the manuscript entitled "Importin-7-dependent nuclear localization of the Flavivirus core protein is required for infectious virus production", by combining proteomics, CRISPR/Cas9 gene KO, CLSM and standard virology techniques, Yumi Itoh and colleagues report novel data concerning the involvement of IPO7 in the nuclear and nucleolar localization of Flaviviridae core and in viral particle release. Surprisingly, IMPa/b1 inhibition via Bimax2 does not affect core nuclear transport, whereas both RanQ69L and WGA did so. The authors try to identify the cellular transporters involved in core nuclear import, and to this end performed an MS analysis of JEV core interactors, which yielded IPO7 as the most likely candidate. After confirming the result by Co-IP, the authors go on to show that most core proteins require IPO7 for nuclear delivery using Huh7 and HEK293T IPO7-KO cells, with the exception of WNV core, which was able to partially enter the nucleus. In such cells, upon infection, extracellular (but not intracellular) viral titers were strongly reduced, a phenotype which was observed with a JEV core mutant bearing the Gly42 and Pro43 to Ala substitutions in a previous study.

      Major comments: - The major conclusions of the study are:

      1. IPO7 is the main driver of core nuclear transport.

      2. Core nuclear localization is somehow important for viral particle release.

      Both conclusions are well supported by experimental evidence.

      Methods are clear and precise, the study appears to have been produced with high quality standards, and so is the presentation of the results. A few controls however should be added to increase the reliability of the results presented here (see below)

      Since the authors attempt to link the phenotype observed on virus release upon IPO7 KO to defects on core nuclear import by making a parallelism with core GP/AA mutant, it would be important to know the behavior of such virus in Huh7 wt and Huh IPO7 KO cells. In other words, is GP/AA JEV released efficiently in Huh7 IPO7 KO cells?

      We have added new data examining the propagation of the GP/AA JEV mutant in IPO7KO Huh7 cells (Figure 6F). Our new data showed that there were no differences in the propagation of the GP/AA mutant in WT and IPO7-KO Huh7 cells.

      A similar approach can be applied to data shown in Figure 7 (effect on release on a capsid nuclear deficient mutant). This would help understand if IPO7 KO, viral release defects and core nuclear import are somehow linked.

      We produced SRIPs harboring GP/AA core using WT and IPO7KO Huh7 cells and demonstrated that the number of infectious viruses produced by WT and IPO7KO Huh7 cells was the same (Figure 7C).

      Minor comments:

      INTRODUCTION • “Flaviviruses...are mosquito-borne human pathogens" What about tick borne encephalitis virus?

      We have corrected it (Line; 43-44).

      • " replication.... occur in the endoplasmic reticulum (ER)" This sentence is a bit inaccurate. Flaviviridae RNA replication occurs in so-called viral replication factories, double membrane vesicles which are partly derived from the ER. see "PMID: 26958917".

      We have corrected this sentence according to the reviewer’s comment (Line; 60-62).

      • "it is known that some flavivirus core proteins are translocated from the cytoplasm into the nucleus" o I think the first evidence of core in the nucleus dates back to 1989, and here it might be appropriate to cite the original reference: "PMID: 2471810". o It might be worth mentioning that NS5 has also been reported in the nucleus (See "PMID: 28106839")

      We have corrected the sentence according to the reviewer’s comment (Line; 63-65).

      • "In the cytoplasm, NLS-containing proteins are recognized by importin-α " o This is true only for classical NLSs, not every NLS binds IMPa, as the authors confirm in this study! Indeed, we have also PY-NLS, IPO7 specific NLSs, IPOb1 NLSs, etc. I therefore suggest rephrasing.

      Thank you for pointing out the exact description of NLS. We agree with the reviewer’s comment that “NLS” includes all types of signal sequences, such as PY-NLS. To clearly distinguish between the CLASSICAL nuclear transport pathway by importin α/β1 and the various nuclear transport pathways by the importin β family, such as transportin, we refer to NLS as classical NLS (cNLS) in the document. We have modified the following sentence by adding “such as transportin” and “without importin-α.”

      RESULTS

      • Fig. 1. o it is not clear what is new here, with respect to what has been already published. The authors should clearly differentiate novel findings from confirmatory results

      Thank you for your suggestion. We would like to introduce our new assay using recombinant virus core proteins, as shown in Figures 1C and 1D. The data shown in Figure 1 are crucial for understanding our data in Figure 2, and we believe this figure is required for broad-ranging readers.

      Fig. 2 and 4 o Proteins whose nuclear transport is dependent on IMPa/IMPb1 (such as SV40 NLS) are lacking here

      Bimax binds to importin alpha but not to importin beta and specifically inhibits the importin alpha/beta pathway. The RanGTP mutant binds to the importin beta family, including importin beta 1, and widely inhibits importin beta-dependent nuclear import. These inhibitors are well-characterized and recognized in the field. Therefore, we have cited the following references: Tsujii et al., JBC, 2015.

      • Fig.5 o It would be important to know the effect on total virus infectivity (intracellular + extracellular) and total viral RNA. It would also be important the effect on RNA replication by using a subgenomic viral replicon (with deletion of the env gene for example). The question here is if IPO7 depletion affects to any extent viral genome replication, and this is impossible to assess in a fully assembling system.

      We determined the ratio of infectious titer per 10³ copies of viral RNA in Figure 5D. The proportion of infectious viruses targeting extracellular JEV RNA was decreased in IPO7KO cells, and there was no difference in the proportion of infectious viruses targeting intracellular JEV RNA between WT and IPO7KO cells. We examined the effects of IPO7 on viral RNA replication of subgenomic replicon. We showed that the deficiency of IPO7 enhanced viral RNA replication as shown in Figure 7E. As described in the Discussion section, IPO7 may transport other factors possessing antiviral activity against flaviviruses. These data will be investigated in the future.

      o Panels A-F legend is missing, consider adding it?

      We have added more details to Figure 5A-5F following the reviewer’s suggestion.

      • Fig.7 o I did not completely understand how NLuc is the readout here

      To quantify RNA replication, we quantified Nluc values using a plate reader. We have added more details on the reporter assay in Materials and Methods (Line; 521-523).

      o Also, I do not understand if the effect of GP/AA substitution of panel B has already been reported or if it is a novel finding

      Previous reports regarding the effect of GP/AA substitution of JEV showed the impairment of infectious virus release. However, the SRIP assay was performed to examine the viral release step. Our detailed data showed that the lack of IPO7-mediated nuclear transport of core proteins impaired infectious viral release, and our new results using SRIPs harboring GP/AA core showed that the lack of nuclear transport of core proteins also impaired the release of infectious viruses. Our data strongly suggest that the lack of nuclear transport of core proteins influences the viral release.

      • All CLSM figures lack quantification (Fn/c; Fno/n)

      We have added quantitative data for IFA experiments in our revised manuscript.

      DISCUSSION

      • "The nuclear entry of viral genomic DNA has been demonstrated to involve IPO7" o It would be nice to know which viruses the authors are referring to here

      We have added the virus name and corresponding references.

      • "While RNA viruses, including flaviviruses, are considered to replicate in the cytoplasm of mammalian cells, increasing evidence suggests nucleolar localization of the viruses " o I suspect Rawlinson did not propose the viruses localize to the nucleolus, as this sentence seems to imply. Rather, a trafficking of viral proteins to nucleoli, to manipulate cell function, is more realistic. I suggest considering rephrasing.

      We have corrected this sentence.

      Reviewer #4 (Significance (Required)):

      SECTION B - Significance ========================

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. As alluded to above, this work presents several advances of current knowledge in the field of viral proteins nuclear trafficking, and in Flavivirus biology. The finding of most core proteins depending on IPO-7 is novel and intriguing, and opens the question of what makes WNV core special. Indeed, this protein nuclear targeting is only partially inhibited in IPO7 deficient cells. The fact that the authors extend their findings to several Flaviviruses adds significance. The role of nuclear core for virus release is also intriguing, but appears poorly characterized. In this respect a mechanistic explanation of the phenomenon would be highly desirable to increase the significance of the work presented here.

      In this context I would have a few suggestions:

      A) The authors performed MS on JEV core, which most likely resulted in a long list of "hits". However, they only report IMPb superfamily members. This is perfectly fine, since they focus on identifying partners responsible for nuclear import. However, the full list might be helpful for understanding the role of nuclear core. By comparing MS of wt core and GP/AA core, and/or wt core in wt and IPO7KO cells, the authors could identify core binding partners in the nucleus (in the nucleolus?) which are important for virus release. This could be subsequently addressed by knocking down these factors and studying the effect on the virus life cycle.

      We appreciate the reviewer’s valuable comments. We did not perform MS analysis on GP/AA core protein or on core protein using WT or IPO7KO Huh7 cells. To report IPO7-mediated core translocation simply, we would like to keep our manuscript focused on IPO7. To clarify the importance of nuclear transport of core protein in the viral life cycle, we will perform wide-ranging proteomics.

      B) Further, the authors should try to address the role of core in the nucleus (and nucleolus). Does it interact with cellular/nucleolar proteins? Does it deliver viral RNA to sites of assembly? Does it interfere with rRNA synthesis? All these findings would be easily obtainable using the GP/AA virus and/or Huh7 KO cells, and tremendously increase the impact of the study, which at the moment is limited at points 1 and 2 in the first section of the current report.

      Thank you for your valuable comments. We agree that we should clarify the roles of the nuclear or nucleolar localization of the core protein. We tested the effects of JEV core expression on rRNA synthesis. Our data showed that core protein expression slightly impaired the maturation of rRNA. However, the core expression did not influence protein translation. We focused on the phase separation capacity of core protein localized in the nucleolus or nucleus. From our accumulating data, we hypothesized that the acquisition of phase separation capacity of core protein might be involved in an efficient virus release step. We hope that these data will be reported in the near future.

      Overall, this work should be interesting for both cell biologists interested in trafficking of viral proteins, and virologists interested in virus-host interactions. The antiviral approach at the moment is a bit less convincing, but the manuscript might be interesting for scientists trying to develop new antiviral strategies. (In this context it might be worth reading and possibly discussing the very recent paper from the Bartenschlager group "PMID: 37702492.") Also, I think that it would be worth discussing the recent discovery that a closely related virus belonging to the Hepacivirus genus within the Flaviviridae family mediated re-localization of Nups to viral replication factories, where they are believed to control access to DMVs interior, thereby regulating virus replication and assembly. Could the core IPO7-interaction have any role in core delivery to DMVs? See "PMID: 26150811".

      Thank you for your valuable comments. We have added several sentences in the Discussion section (Line; 297-305). We will investigate the role of nuclear transports in viral life cycles in the future.

      Since I am a molecular virologist studying viral nucleocytoplasmic trafficking, virus-host interactions, and antiviral drug-discovery I think I have sufficient expertise for an informative and helpful revision of this work.

    1. One study participant said, “I will work until everything is done and everything is beautiful and wonderful …  And if I have to not sleep for three days to do that, that's what happens.” For students with autism, this inclination could manifest as putting in extra effort to make eye contact when speaking with another person, even if it makes them uncomfortable. This pressure to take on more and hide parts of themselves can lead to burnout and have negative impacts on mental health.

      I chose this section because it’s extremely relatable to me as a STEM student who’s spent countless nights staying up completing assignments or studying. Most community college STEM students are like me and want the perfect 4.0 to transfer to a nice university, so we all get our assignments to look “beautiful and wonderful” to get a perfect grade on them. This article is important to me because I’m now more aware of neurodivergent students and the challenges they face. With my new knowledge I’ll be able to be more supportive towards my future STEM classmates. I also think this article is important for STEM students because some of them could be neurodivergent, and this article could help them manage challenges they may have, like balancing work. I’m not personally a neurodivergent student, but I’ve noticed that many of the same traits are relatable to me, which is interesting. Overall I’m glad I chose this article because it’s really informative and helps you learn about other people's struggles.

  2. May 2024
    1. Neurodivergent students see their neurotypical peers as the “ideal” students, which can lead to negative self-judgment—telling oneself, for example, “I don’t do things the way I’m expected to, so there’s something wrong with me,” Syharat says. They often have challenges in areas in which they feel they are expected to excel, so they may struggle to feel that they belong.

      i have chosen this quote because not only can i personally relate to it, but i know many of my friends could relate to it too. when talking to a friend that i share a class with, we often wonder why everyone else understands the concepts much easier even though we are doing everything we can to try and understand. its very frustrating and often times makes me think i'm missing something, like everyone automatically knows what to do and im behind somehow, which all in all can lower confidence in classwork. so i could definitely relate to this paragraph. i think this article connects the idea of designing for equity and inclusion by giving a voice to the fact that neurodivergent students have felt this way for a long time and gave examples of ways they have had to adapt to the "ideal students" world.

    1. According to all known laws of aviation,

      there is no way a bee should be able to fly.

      Its wings are too small to get its fat little body off the ground.

      The bee, of course, flies anyway

      because bees don't care what humans think is impossible.

      Yellow, black. Yellow, black. Yellow, black. Yellow, black.

      Ooh, black and yellow! Let's shake it up a little.

      Barry! Breakfast is ready!

      Coming!

      Hang on a second.

      Hello?

      Barry?

      Adam?

      Can you believe this is happening?

      I can't. I'll pick you up.

      Looking sharp.

      Use the stairs. Your father paid good money for those.

      Sorry. I'm excited.

      Here's the graduate. We're very proud of you, son.

      A perfect report card, all B's.

      Very proud.

      Ma! I got a thing going here.

      You got lint on your fuzz.

      Ow! That's me!

      Wave to us! We'll be in row 118,000.

      Bye!

      Barry, I told you, stop flying in the house!

      Hey, Adam.

      Hey, Barry.

      Is that fuzz gel?

      A little. Special day, graduation.

      Never thought I'd make it.

      Three days grade school, three days high school.

      Those were awkward.

      Three days college. I'm glad I took a day and hitchhiked around the hive.

      You did come back different.

      Hi, Barry.

      Artie, growing a mustache? Looks good.

      Hear about Frankie?

      Yeah.

      You going to the funeral?

      No, I'm not going.

      Everybody knows, sting someone, you die.

      Don't waste it on a squirrel. Such a hothead.

      I guess he could have just gotten out of the way.

      I love this incorporating an amusement park into our day.

      That's why we don't need vacations.

      Boy, quite a bit of pomp… under the circumstances.

      Well, Adam, today we are men.

      We are!

      Bee-men.

      Amen!

      Hallelujah!

      Students, faculty, distinguished bees,

      please welcome Dean Buzzwell.

      Welcome, New Hive City graduating class of…

      …9:15.

      That concludes our ceremonies.

      And begins your career at Honex Industries!

      Will we pick our job today?

      I heard it's just orientation.

      Heads up! Here we go.

      Keep your hands and antennas inside the tram at all times.

      Wonder what it'll be like? A little scary. Welcome to Honex, a division of Honesco

      and a part of the Hexagon Group.

      This is it!

      Wow.

      Wow.

      We know that you, as a bee, have worked your whole life

      to get to the point where you can work for your whole life.

      Honey begins when our valiant Pollen Jocks bring the nectar to the hive.

      Our top-secret formula

      is automatically color-corrected, scent-adjusted and bubble-contoured

      into this soothing sweet syrup

      with its distinctive golden glow you know as…

      Honey!

      That girl was hot.

      She's my cousin!

      She is?

      Yes, we're all cousins.

      Right. You're right.

      At Honex, we constantly strive

      to improve every aspect of bee existence.

      These bees are stress-testing a new helmet technology.

      What do you think he makes? Not enough. Here we have our latest advancement, the Krelman.

      What does that do? Catches that little strand of honey that hangs after you pour it. Saves us millions.

      Can anyone work on the Krelman?

      Of course. Most bee jobs are small ones. But bees know

      that every small job, if it's done well, means a lot.

      But choose carefully

      because you'll stay in the job you pick for the rest of your life.

      The same job the rest of your life? I didn't know that.

      What's the difference?

      You'll be happy to know that bees, as a species, haven't had one day off

      in 27 million years.

      So you'll just work us to death?

      We'll sure try.

      Wow! That blew my mind!

      "What's the difference?" How can you say that?

      One job forever? That's an insane choice to have to make.

      I'm relieved. Now we only have to make one decision in life.

      But, Adam, how could they never have told us that?

      Why would you question anything? We're bees.

      We're the most perfectly functioning society on Earth.

      You ever think maybe things work a little too well here?

      Like what? Give me one example.

      I don't know. But you know what I'm talking about.

      Please clear the gate. Royal Nectar Force on approach.

      Wait a second. Check it out.

      Hey, those are Pollen Jocks! Wow. I've never seen them this close.

      They know what it's like outside the hive.

      Yeah, but some don't come back.

      Hey, Jocks! Hi, Jocks! You guys did great!

      You're monsters! You're sky freaks! I love it! I love it!

      I wonder where they were. I don't know. Their day's not planned.

      Outside the hive, flying who knows where, doing who knows what.

      You can't just decide to be a Pollen Jock. You have to be bred for that.

      Right.

      Look. That's more pollen than you and I will see in a lifetime.

      It's just a status symbol. Bees make too much of it.

      Perhaps. Unless you're wearing it and the ladies see you wearing it.

      Those ladies? Aren't they our cousins too?

      Distant. Distant.

      Look at these two.

      Couple of Hive Harrys. Let's have fun with them. It must be dangerous being a Pollen Jock.

      Yeah. Once a bear pinned me against a mushroom!

      He had a paw on my throat, and with the other, he was slapping me!

      Oh, my! I never thought I'd knock him out. What were you doing during this?

      Trying to alert the authorities.

      I can autograph that.

      A little gusty out there today, wasn't it, comrades?

      Yeah. Gusty.

      We're hitting a sunflower patch six miles from here tomorrow.

      Six miles, huh? Barry! A puddle jump for us, but maybe you're not up for it.

      Maybe I am. You are not! We're going 0900 at J-Gate.

      What do you think, buzzy-boy? Are you bee enough?

      I might be. It all depends on what 0900 means.

      Hey, Honex!

      Dad, you surprised me.

      You decide what you're interested in?

      Well, there's a lot of choices. But you only get one. Do you ever get bored doing the same job every day?

      Son, let me tell you about stirring.

      You grab that stick, and you just move it around, and you stir it around.

      You get yourself into a rhythm. It's a beautiful thing.

      You know, Dad, the more I think about it,

      maybe the honey field just isn't right for me.

      You were thinking of what, making balloon animals?

      That's a bad job for a guy with a stinger.

      Janet, your son's not sure he wants to go into honey!

      Barry, you are so funny sometimes. I'm not trying to be funny. You're not funny! You're going into honey. Our son, the stirrer!

      You're gonna be a stirrer? No one's listening to me! Wait till you see the sticks I have.

      I could say anything right now. I'm gonna get an ant tattoo!

      Let's open some honey and celebrate!

      Maybe I'll pierce my thorax. Shave my antennae.

      Shack up with a grasshopper. Get a gold tooth and call everybody "dawg"!

      I'm so proud.

      We're starting work today! Today's the day. Come on! All the good jobs will be gone.

      Yeah, right.

      Pollen counting, stunt bee, pouring, stirrer, front desk, hair removal…

      Is it still available? Hang on. Two left! One of them's yours! Congratulations! Step to the side.

      What'd you get? Picking crud out. Stellar! Wow!

      Couple of newbies?

      Yes, sir! Our first day! We are ready!

      Make your choice.

      You want to go first? No, you go. Oh, my. What's available?

      Restroom attendant's open, not for the reason you think.

      Any chance of getting the Krelman? Sure, you're on. I'm sorry, the Krelman just closed out.

      Wax monkey's always open.

      The Krelman opened up again.

      What happened?

      A bee died. Makes an opening. See? He's dead. Another dead one.

      Deady. Deadified. Two more dead.

      Dead from the neck up. Dead from the neck down. That's life!

      Oh, this is so hard!

      Heating, cooling, stunt bee, pourer, stirrer,

      humming, inspector number seven, lint coordinator, stripe supervisor,

      mite wrangler. Barry, what do you think I should… Barry?

      Barry!

      All right, we've got the sunflower patch in quadrant nine…

      What happened to you? Where are you?

      I'm going out.

      Out? Out where?

      Out there.

      Oh, no!

      I have to, before I go to work for the rest of my life.

      You're gonna die! You're crazy! Hello?

      Another call coming in.

      If anyone's feeling brave, there's a Korean deli on 83rd

      that gets their roses today.

      Hey, guys.

      Look at that. Isn't that the kid we saw yesterday? Hold it, son, flight deck's restricted.

      It's OK, Lou. We're gonna take him up.

      Really? Feeling lucky, are you?

      Sign here, here. Just initial that.

      Thank you. OK. You got a rain advisory today,

      and as you all know, bees cannot fly in rain.

      So be careful. As always, watch your brooms,

      hockey sticks, dogs, birds, bears and bats.

      Also, I got a couple of reports of root beer being poured on us.

      Murphy's in a home because of it, babbling like a cicada!

      That's awful. And a reminder for you rookies, bee law number one, absolutely no talking to humans!

      All right, launch positions!

      Buzz, buzz, buzz, buzz! Buzz, buzz, buzz, buzz! Buzz, buzz, buzz, buzz!

      Black and yellow!

      Hello!

      You ready for this, hot shot?

      Yeah. Yeah, bring it on.

      Wind, check.

      Antennae, check.

      Nectar pack, check.

      Wings, check.

      Stinger, check.

      Scared out of my shorts, check.

      OK, ladies,

      let's move it out!

      Pound those petunias, you striped stem-suckers!

      All of you, drain those flowers!

      Wow! I'm out!

      I can't believe I'm out!

      So blue.

      I feel so fast and free!

      Box kite!

      Wow!

      Flowers!

      This is Blue Leader. We have roses visual.

      Bring it around 30 degrees and hold.

      Roses!

      30 degrees, roger. Bringing it around.

      Stand to the side, kid. It's got a bit of a kick.

      That is one nectar collector!

      Ever see pollination up close? No, sir. I pick up some pollen here, sprinkle it over here. Maybe a dash over there,

      a pinch on that one. See that? It's a little bit of magic.

      That's amazing. Why do we do that?

      That's pollen power. More pollen, more flowers, more nectar, more honey for us.

      Cool.

      I'm picking up a lot of bright yellow. Could be daisies. Don't we need those?

      Copy that visual.

      Wait. One of these flowers seems to be on the move.

      Say again? You're reporting a moving flower?

      Affirmative.

      That was on the line!

      This is the coolest. What is it?

      I don't know, but I'm loving this color.

      It smells good. Not like a flower, but I like it.

      Yeah, fuzzy.

      Chemical-y.

      Careful, guys. It's a little grabby.

      My sweet lord of bees!

      Candy-brain, get off there!

      Problem!

      Guys! This could be bad. Affirmative.

      Very close.

      Gonna hurt.

      Mama's little boy.

      You are way out of position, rookie!

      Coming in at you like a missile!

      Help me!

      I don't think these are flowers.

      Should we tell him? I think he knows. What is this?!

      Match point!

      You can start packing up, honey, because you're about to eat it!

      Yowser!

      Gross.

      There's a bee in the car!

      Do something!

      I'm driving!

      Hi, bee.

      He's back here!

      He's going to sting me!

      Nobody move. If you don't move, he won't sting you. Freeze!

      He blinked!

      Spray him, Granny!

      What are you doing?!

      Wow… the tension level out here is unbelievable.

      I gotta get home.

      Can't fly in rain.

      Can't fly in rain.

      Can't fly in rain.

      Mayday! Mayday! Bee going down!

      Ken, could you close the window please?

      Ken, could you close the window please?

      Check out my new resume. I made it into a fold-out brochure.

      You see? Folds out.

      Oh, no. More humans. I don't need this.

      What was that?

      Maybe this time. This time. This time. This time! This time! This…

      Drapes!

      That is diabolical.

      It's fantastic. It's got all my special skills, even my top-ten favorite movies.

      What's number one? Star Wars?

      Nah, I don't go for that…

      …kind of stuff.

      No wonder we shouldn't talk to them. They're out of their minds.

      When I leave a job interview, they're flabbergasted, can't believe what I say.

      There's the sun. Maybe that's a way out.

      I don't remember the sun having a big 75 on it.

      I predicted global warming.

      I could feel it getting hotter. At first I thought it was just me.

      Wait! Stop! Bee!

      Stand back. These are winter boots.

      Wait!

      Don't kill him!

      You know I'm allergic to them! This thing could kill me!

      Why does his life have less value than yours?

      Why does his life have any less value than mine? Is that your statement?

      I'm just saying all life has value. You don't know what he's capable of feeling.

      My brochure!

      There you go, little guy.

      I'm not scared of him. It's an allergic thing.

      Put that on your resume brochure.

      My whole face could puff up.

      Make it one of your special skills.

      Knocking someone out is also a special skill.

      Right. Bye, Vanessa. Thanks.

      Vanessa, next week? Yogurt night?

      Sure, Ken. You know, whatever.

      You could put carob chips on there.

      Bye.

      Supposed to be less calories.

      Bye.

      I gotta say something.

      She saved my life. I gotta say something.

      All right, here it goes.

      Nah.

      What would I say?

      I could really get in trouble.

      It's a bee law. You're not supposed to talk to a human.

      I can't believe I'm doing this.

      I've got to.

      Oh, I can't do it. Come on!

      No. Yes. No.

      Do it. I can't.

      How should I start it? "You like jazz?" No, that's no good.

      Here she comes! Speak, you fool!

      Hi!

      I'm sorry.

      You're talking. Yes, I know. You're talking!

      I'm so sorry.

      No, it's OK. It's fine. I know I'm dreaming.

      But I don't recall going to bed.

      Well, I'm sure this is very disconcerting.

      This is a bit of a surprise to me. I mean, you're a bee!

      I am. And I'm not supposed to be doing this,

      but they were all trying to kill me.

      And if it wasn't for you…

      I had to thank you. It's just how I was raised.

      That was a little weird.

      I'm talking with a bee. Yeah. I'm talking to a bee. And the bee is talking to me!

      I just want to say I'm grateful. I'll leave now.

      Wait! How did you learn to do that? What? The talking thing.

      Same way you did, I guess. "Mama, Dada, honey." You pick it up.

      That's very funny. Yeah. Bees are funny. If we didn't laugh, we'd cry with what we have to deal with.

      Anyway…

      Can I…

      …get you something?

      Like what? I don't know. I mean… I don't know. Coffee?

      I don't want to put you out.

      It's no trouble. It takes two minutes.

      It's just coffee.

      I hate to impose.

      Don't be ridiculous!

      Actually, I would love a cup.

      Hey, you want rum cake?

      I shouldn't.

      Have some.

      No, I can't.

      Come on!

      I'm trying to lose a couple micrograms.

      Where? These stripes don't help. You look great!

      I don't know if you know anything about fashion.

      Are you all right?

      No.

      He's making the tie in the cab as they're flying up Madison.

      He finally gets there.

      He runs up the steps into the church. The wedding is on.

      And he says, "Watermelon? I thought you said Guatemalan.

      Why would I marry a watermelon?"

      Is that a bee joke?

      That's the kind of stuff we do.

      Yeah, different.

      So, what are you gonna do, Barry?

      About work? I don't know.

      I want to do my part for the hive, but I can't do it the way they want.

      I know how you feel.

      You do? Sure. My parents wanted me to be a lawyer or a doctor, but I wanted to be a florist.

      Really? My only interest is flowers. Our new queen was just elected with that same campaign slogan.

      Anyway, if you look…

      There's my hive right there. See it?

      You're in Sheep Meadow!

      Yes! I'm right off the Turtle Pond!

      No way! I know that area. I lost a toe ring there once.

      Why do girls put rings on their toes?

      Why not?

      It's like putting a hat on your knee.

      Maybe I'll try that.

      You all right, ma'am?

      Oh, yeah. Fine.

      Just having two cups of coffee!

      Anyway, this has been great. Thanks for the coffee.

      Yeah, it's no trouble.

      Sorry I couldn't finish it. If I did, I'd be up the rest of my life.

      Are you…?

      Can I take a piece of this with me?

      Sure! Here, have a crumb.

      Thanks! Yeah. All right. Well, then… I guess I'll see you around.

      Or not.

      OK, Barry.

      And thank you so much again… for before.

      Oh, that? That was nothing.

      Well, not nothing, but… Anyway…

      This can't possibly work.

      He's all set to go. We may as well try it.

      OK, Dave, pull the chute.

      Sounds amazing. It was amazing! It was the scariest, happiest moment of my life.

      Humans! I can't believe you were with humans!

      Giant, scary humans! What were they like?

      Huge and crazy. They talk crazy.

      They eat crazy giant things. They drive crazy.

      Do they try and kill you, like on TV?

      Some of them. But some of them don't.

      How'd you get back?

      Poodle.

      You did it, and I'm glad. You saw whatever you wanted to see.

      You had your "experience." Now you can pick out your job and be normal.

      Well… Well? Well, I met someone.

      You did? Was she Bee-ish?

      A wasp?! Your parents will kill you!

      No, no, no, not a wasp.

      Spider?

      I'm not attracted to spiders.

      I know it's the hottest thing, with the eight legs and all.

      I can't get by that face.

      So who is she?

      She's… human.

      No, no. That's a bee law. You wouldn't break a bee law.

      Her name's Vanessa. Oh, boy. She's so nice. And she's a florist!

      Oh, no! You're dating a human florist!

      We're not dating.

      You're flying outside the hive, talking to humans that attack our homes

      with power washers and M-80s! One-eighth a stick of dynamite!

      She saved my life! And she understands me.

      This is over!

      Eat this.

      This is not over! What was that?

      They call it a crumb. It was so stingin' stripey! And that's not what they eat. That's what falls off what they eat!

      You know what a Cinnabon is? No. It's bread and cinnamon and frosting. They heat it up…

      Sit down!

      …really hot!

      Listen to me! We are not them! We're us. There's us and there's them!

      Yes, but who can deny the heart that is yearning?

      There's no yearning. Stop yearning. Listen to me!

      You have got to start thinking bee, my friend. Thinking bee!

      Thinking bee. Thinking bee. Thinking bee! Thinking bee! Thinking bee! Thinking bee!

      There he is. He's in the pool.

      You know what your problem is, Barry?

      I gotta start thinking bee?

      How much longer will this go on?

      It's been three days! Why aren't you working?

      I've got a lot of big life decisions to think about.

      What life? You have no life! You have no job. You're barely a bee!

      Would it kill you to make a little honey?

      Barry, come out. Your father's talking to you.

      Martin, would you talk to him?

      Barry, I'm talking to you!

      You coming?

      Got everything?

      All set!

      Go ahead. I'll catch up.

      Don't be too long.

      Watch this!

      Vanessa!

      We're still here. I told you not to yell at him. He doesn't respond to yelling!

      Then why yell at me? Because you don't listen! I'm not listening to this.

      Sorry, I've gotta go.

      Where are you going? I'm meeting a friend. A girl? Is this why you can't decide?

      Bye.

      I just hope she's Bee-ish.

      They have a huge parade of flowers every year in Pasadena?

      To be in the Tournament of Roses, that's every florist's dream!

      Up on a float, surrounded by flowers, crowds cheering.

      A tournament. Do the roses compete in athletic events?

      No. All right, I've got one. How come you don't fly everywhere?

      It's exhausting. Why don't you run everywhere? It's faster.

      Yeah, OK, I see, I see. All right, your turn.

      TiVo. You can just freeze live TV? That's insane!

      You don't have that?

      We have Hivo, but it's a disease. It's a horrible, horrible disease.

      Oh, my.

      Dumb bees!

      You must want to sting all those jerks.

      We try not to sting. It's usually fatal for us.

      So you have to watch your temper.

      Very carefully. You kick a wall, take a walk,

      write an angry letter and throw it out. Work through it like any emotion:

      Anger, jealousy, lust.

      Oh, my goodness! Are you OK?

      Yeah.

      What is wrong with you?! It's a bug. He's not bothering anybody. Get out of here, you creep!

      What was that? A Pic 'N' Save circular?

      Yeah, it was. How did you know?

      It felt like about 10 pages. Seventy-five is pretty much our limit.

      You've really got that down to a science.

      I lost a cousin to Italian Vogue. I'll bet. What in the name of Mighty Hercules is this?

      How did this get here? Cute Bee, Golden Blossom,

      Ray Liotta Private Select?

      Is he that actor?

      I never heard of him.

      Why is this here?

      For people. We eat it.

      You don't have enough food of your own?

      Well, yes.

      How do you get it?

      Bees make it.

      I know who makes it!

      And it's hard to make it!

      There's heating, cooling, stirring. You need a whole Krelman thing!

      It's organic. It's our-ganic! It's just honey, Barry.

      Just what?!

      Bees don't know about this! This is stealing! A lot of stealing!

      You've taken our homes, schools, hospitals! This is all we have!

      And it's on sale?! I'm getting to the bottom of this.

      I'm getting to the bottom of all of this!

      Hey, Hector.

      You almost done? Almost. He is here. I sense it.

      Well, I guess I'll go home now

      and just leave this nice honey out, with no one around.

      You're busted, box boy!

      I knew I heard something. So you can talk!

      I can talk. And now you'll start talking!

      Where you getting the sweet stuff? Who's your supplier?

      I don't understand. I thought we were friends.

      The last thing we want to do is upset bees!

      You're too late! It's ours now!

      You, sir, have crossed the wrong sword!

      You, sir, will be lunch for my iguana, Ignacio!

      Where is the honey coming from?

      Tell me where!

      Honey Farms! It comes from Honey Farms!

      Crazy person!

      What horrible thing has happened here?

      These faces, they never knew what hit them. And now

      they're on the road to nowhere!

      Just keep still.

      What? You're not dead?

      Do I look dead? They will wipe anything that moves. Where you headed?

      To Honey Farms. I am onto something huge here.

      I'm going to Alaska. Moose blood, crazy stuff. Blows your head off!

      I'm going to Tacoma.

      And you? He really is dead. All right.

      Uh-oh!

      What is that?!

      Oh, no!

      A wiper! Triple blade!

      Triple blade?

      Jump on! It's your only chance, bee!

      Why does everything have to be so doggone clean?!

      How much do you people need to see?!

      Open your eyes! Stick your head out the window!

      From NPR News in Washington, I'm Carl Kasell.

      But don't kill no more bugs!

      Bee!

      Moose blood guy!!

      You hear something?

      Like what?

      Like tiny screaming.

      Turn off the radio.

      Whassup, bee boy?

      Hey, Blood.

      Just a row of honey jars, as far as the eye could see.

      Wow!

      I assume wherever this truck goes is where they're getting it.

      I mean, that honey's ours.

      Bees hang tight. We're all jammed in. It's a close community.

      Not us, man. We on our own. Every mosquito on his own.

      What if you get in trouble? You a mosquito, you in trouble. Nobody likes us. They just smack. See a mosquito, smack, smack!

      At least you're out in the world. You must meet girls.

      Mosquito girls try to trade up, get with a moth, dragonfly.

      Mosquito girl don't want no mosquito.

      You got to be kidding me!

      Mooseblood's about to leave the building! So long, bee!

      Hey, guys! Mooseblood! I knew I'd catch y'all down here. Did you bring your crazy straw?

      We throw it in jars, slap a label on it, and it's pretty much pure profit.

      What is this place?

      A bee's got a brain the size of a pinhead.

      They are pinheads!

      Pinhead.

      Check out the new smoker. Oh, sweet. That's the one you want. The Thomas 3000!

      Smoker?

      Ninety puffs a minute, semi-automatic. Twice the nicotine, all the tar.

      A couple breaths of this knocks them right out.

      They make the honey, and we make the money.

      "They make the honey, and we make the money"?

      Oh, my!

      What's going on? Are you OK?

      Yeah. It doesn't last too long.

      Do you know you're in a fake hive with fake walls?

      Our queen was moved here. We had no choice.

      This is your queen? That's a man in women's clothes!

      That's a drag queen!

      What is this?

      Oh, no!

      There's hundreds of them!

      Bee honey.

      Our honey is being brazenly stolen on a massive scale!

      This is worse than anything bears have done! I intend to do something.

      Oh, Barry, stop.

      Who told you humans are taking our honey? That's a rumor.

      Do these look like rumors?

      That's a conspiracy theory. These are obviously doctored photos.

      How did you get mixed up in this?

      He's been talking to humans.

      What? Talking to humans?! He has a human girlfriend. And they make out!

      Make out? Barry!

      We do not.

      You wish you could. Whose side are you on? The bees!

      I dated a cricket once in San Antonio. Those crazy legs kept me up all night.

      Barry, this is what you want to do with your life?

      I want to do it for all our lives. Nobody works harder than bees!

      Dad, I remember you coming home so overworked

      your hands were still stirring. You couldn't stop.

      I remember that.

      What right do they have to our honey?

      We live on two cups a year. They put it in lip balm for no reason whatsoever!

      Even if it's true, what can one bee do?

      Sting them where it really hurts.

      In the face! The eye!

      That would hurt. No. Up the nose? That's a killer.

      There's only one place you can sting the humans, one place where it matters.

      Hive at Five, the hive's only full-hour action news source.

      No more bee beards!

      With Bob Bumble at the anchor desk.

      Weather with Storm Stinger.

      Sports with Buzz Larvi.

      And Jeanette Chung.

      Good evening. I'm Bob Bumble. And I'm Jeanette Chung. A tri-county bee, Barry Benson,

      intends to sue the human race for stealing our honey,

      packaging it and profiting from it illegally!

      Tomorrow night on Bee Larry King,

      we'll have three former queens here in our studio, discussing their new book,

      Classy Ladies, out this week on Hexagon.

      Tonight we're talking to Barry Benson.

      Did you ever think, "I'm a kid from the hive. I can't do this"?

      Bees have never been afraid to change the world.

      What about Bee Columbus? Bee Gandhi? Bejesus?

      Where I'm from, we'd never sue humans.

      We were thinking of stickball or candy stores.

      How old are you?

      The bee community is supporting you in this case,

      which will be the trial of the bee century.

      You know, they have a Larry King in the human world too.

      It's a common name. Next week…

      He looks like you and has a show and suspenders and colored dots…

      Next week…

      Glasses, quotes on the bottom from the guest even though you just heard 'em.

      Bear Week next week! They're scary, hairy and here live.

      Always leans forward, pointy shoulders, squinty eyes, very Jewish.

      In tennis, you attack at the point of weakness!

      It was my grandmother, Ken. She's 81.

      Honey, her backhand's a joke! I'm not gonna take advantage of that?

      Quiet, please. Actual work going on here.

      Is that that same bee? Yes, it is! I'm helping him sue the human race.

      Hello. Hello, bee. This is Ken.

      Yeah, I remember you. Timberland, size ten and a half. Vibram sole, I believe.

      Why does he talk again?

      Listen, you better go 'cause we're really busy working.

      But it's our yogurt night!

      Bye-bye.

      Why is yogurt night so difficult?!

      You poor thing. You two have been at this for hours!

      Yes, and Adam here has been a huge help.

      Frosting… How many sugars? Just one. I try not to use the competition.

      So why are you helping me?

      Bees have good qualities.

      And it takes my mind off the shop.

      Instead of flowers, people are giving balloon bouquets now.

      Those are great, if you're three.

      And artificial flowers.

      Oh, those just get me psychotic! Yeah, me too. Bent stingers, pointless pollination.

      Bees must hate those fake things!

      Nothing worse than a daffodil that's had work done.

      Maybe this could make up for it a little bit.

      This lawsuit's a pretty big deal. I guess. You sure you want to go through with it?

      Am I sure? When I'm done with the humans, they won't be able

      to say, "Honey, I'm home," without paying a royalty!

      It's an incredible scene here in downtown Manhattan,

      where the world anxiously waits, because for the first time in history,

      we will hear for ourselves if a honeybee can actually speak.

      What have we gotten into here, Barry?

      It's pretty big, isn't it?

      I can't believe how many humans don't work during the day.

      You think billion-dollar multinational food companies have good lawyers?

      Everybody needs to stay behind the barricade.

      What's the matter? I don't know, I just got a chill. Well, if it isn't the bee team.

      You boys work on this?

      All rise! The Honorable Judge Bumbleton presiding.

      All right. Case number 4475,

      Superior Court of New York, Barry Bee Benson v. the Honey Industry

      is now in session.

      Mr. Montgomery, you're representing the five food companies collectively?

      A privilege.

      Mr. Benson… you're representing all the bees of the world?

      I'm kidding. Yes, Your Honor, we're ready to proceed.

      Mr. Montgomery, your opening statement, please.

      Ladies and gentlemen of the jury,

      my grandmother was a simple woman.

      Born on a farm, she believed it was man's divine right

      to benefit from the bounty of nature God put before us.

      If we lived in the topsy-turvy world Mr. Benson imagines,

      just think of what would it mean.

      I would have to negotiate with the silkworm

      for the elastic in my britches!

      Talking bee!

      How do we know this isn't some sort of

      holographic motion-picture-capture Hollywood wizardry?

      They could be using laser beams!

      Robotics! Ventriloquism! Cloning! For all we know,

      he could be on steroids!

      Mr. Benson?

      Ladies and gentlemen, there's no trickery here.

      I'm just an ordinary bee. Honey's pretty important to me.

      It's important to all bees. We invented it!

      We make it. And we protect it with our lives.

      Unfortunately, there are some people in this room

      who think they can take it from us

      'cause we're the little guys! I'm hoping that, after this is all over,

      you'll see how, by taking our honey, you not only take everything we have

      but everything we are!

      I wish he'd dress like that all the time. So nice!

      Call your first witness.

      So, Mr. Klauss Vanderhayden of Honey Farms, big company you have.

      I suppose so.

      I see you also own Honeyburton and Honron!

      Yes, they provide beekeepers for our farms.

      Beekeeper. I find that to be a very disturbing term.

      I don't imagine you employ any bee-free-ers, do you?

      No.

      I couldn't hear you.

      No.

      No.

      Because you don't free bees. You keep bees. Not only that,

      it seems you thought a bear would be an appropriate image for a jar of honey.

      They're very lovable creatures.

      Yogi Bear, Fozzie Bear, Build-A-Bear.

      You mean like this?

      Bears kill bees!

      How'd you like his head crashing through your living room?!

      Biting into your couch! Spitting out your throw pillows!

      OK, that's enough. Take him away.

      So, Mr. Sting, thank you for being here. Your name intrigues me.

      Where have I heard it before? I was with a band called The Police. But you've never been a police officer, have you?

      No, I haven't.

      No, you haven't. And so here we have yet another example

      of bee culture casually stolen by a human

      for nothing more than a prance-about stage name.

      Oh, please.

      Have you ever been stung, Mr. Sting?

      Because I'm feeling a little stung, Sting.

      Or should I say… Mr. Gordon M. Sumner!

      That's not his real name?! You idiots!

      Mr. Liotta, first, belated congratulations on

      your Emmy win for a guest spot on ER in 2005.

      Thank you. Thank you.

      I see from your resume that you're devilishly handsome

      with a churning inner turmoil that's ready to blow.

      I enjoy what I do. Is that a crime?

      Not yet it isn't. But is this what it's come to for you?

      Exploiting tiny, helpless bees so you don't

      have to rehearse your part and learn your lines, sir?

      Watch it, Benson! I could blow right now!

      This isn't a goodfella. This is a badfella!

      Why doesn't someone just step on this creep, and we can all go home?!

      Order in this court! You're all thinking it! Order! Order, I say!

      Say it! Mr. Liotta, please sit down! I think it was awfully nice of that bear to pitch in like that.

      I think the jury's on our side.

      Are we doing everything right, legally?

      I'm a florist.

      Right. Well, here's to a great team.

      To a great team!

      Well, hello.

      Ken! Hello. I didn't think you were coming.

      No, I was just late. I tried to call, but… the battery.

      I didn't want all this to go to waste, so I called Barry. Luckily, he was free.

      Oh, that was lucky.

      There's a little left. I could heat it up.

      Yeah, heat it up, sure, whatever.

      So I hear you're quite a tennis player.

      I'm not much for the game myself. The ball's a little grabby.

      That's where I usually sit. Right… there.

      Ken, Barry was looking at your resume,

      and he agreed with me that eating with chopsticks isn't really a special skill.

      You think I don't see what you're doing?

      I know how hard it is to find the right job. We have that in common.

      Do we?

      Bees have 100 percent employment, but we do jobs like taking the crud out.

      That's just what I was thinking about doing.

      Ken, I let Barry borrow your razor for his fuzz. I hope that was all right.

      I'm going to drain the old stinger.

      Yeah, you do that.

      Look at that.

      You know, I've just about had it

      with your little mind games.

      What's that? Italian Vogue. Mamma mia, that's a lot of pages.

      A lot of ads.

      Remember what Van said, why is your life more valuable than mine?

      Funny, I just can't seem to recall that!

      I think something stinks in here!

      I love the smell of flowers.

      How do you like the smell of flames?!

      Not as much.

      Water bug! Not taking sides!

      Ken, I'm wearing a Chapstick hat! This is pathetic!

      I've got issues!

      Well, well, well, a royal flush!

      You're bluffing. Am I? Surf's up, dude!

      Poo water!

      That bowl is gnarly.

      Except for those dirty yellow rings!

      Kenneth! What are you doing?!

      You know, I don't even like honey! I don't eat it!

      We need to talk!

      He's just a little bee!

      And he happens to be the nicest bee I've met in a long time!

      Long time? What are you talking about?! Are there other bugs in your life?

      No, but there are other things bugging me in life. And you're one of them!

      Fine! Talking bees, no yogurt night…

      My nerves are fried from riding on this emotional roller coaster!

      Goodbye, Ken.

      And for your information,

      I prefer sugar-free, artificial sweeteners made by man!

      I'm sorry about all that.

      I know it's got an aftertaste! I like it!

      I always felt there was some kind of barrier between Ken and me.

      I couldn't overcome it. Oh, well.

      Are you OK for the trial?

      I believe Mr. Montgomery is about out of ideas.

      We would like to call Mr. Barry Benson Bee to the stand.

      Good idea! You can really see why he's considered one of the best lawyers…

      Yeah.

      Layton, you've gotta weave some magic

      with this jury, or it's gonna be all over.

      Don't worry. The only thing I have to do to turn this jury around

      is to remind them of what they don't like about bees.

      You got the tweezers? Are you allergic? Only to losing, son. Only to losing.

      Mr. Benson Bee, I'll ask you what I think we'd all like to know.

      What exactly is your relationship

      to that woman?

      We're friends.

      Good friends? Yes. How good? Do you live together?

      Wait a minute…

      Are you her little…

      …bedbug?

      I've seen a bee documentary or two. From what I understand,

      doesn't your queen give birth to all the bee children?

      Yeah, but…

      So those aren't your real parents!

      Oh, Barry…

      Yes, they are!

      Hold me back!

      You're an illegitimate bee, aren't you, Benson?

      He's denouncing bees!

      Don't y'all date your cousins?

      Objection! I'm going to pincushion this guy! Adam, don't! It's what he wants!

      Oh, I'm hit!!

      Oh, lordy, I am hit!

      Order! Order!

      The venom! The venom is coursing through my veins!

      I have been felled by a winged beast of destruction!

      You see? You can't treat them like equals! They're striped savages!

      Stinging's the only thing they know! It's their way!

      Adam, stay with me. I can't feel my legs. What angel of mercy will come forward to suck the poison

      from my heaving buttocks?

      I will have order in this court. Order!

      Order, please!

      The case of the honeybees versus the human race

      took a pointed turn against the bees

      yesterday when one of their legal team stung Layton T. Montgomery.

      Hey, buddy.

      Hey.

      Is there much pain?

      Yeah.

      I…

      I blew the whole case, didn't I?

      It doesn't matter. What matters is you're alive. You could have died.

      I'd be better off dead. Look at me.

      They got it from the cafeteria downstairs, in a tuna sandwich.

      Look, there's a little celery still on it.

      What was it like to sting someone?

      I can't explain it. It was all…

      All adrenaline and then… and then ecstasy!

      All right.

      You think it was all a trap?

      Of course. I'm sorry. I flew us right into this.

      What were we thinking? Look at us. We're just a couple of bugs in this world.

      What will the humans do to us if they win?

      I don't know.

      I hear they put the roaches in motels. That doesn't sound so bad.

      Adam, they check in, but they don't check out!

      Oh, my.

      Could you get a nurse to close that window?

      Why? The smoke. Bees don't smoke.

      Right. Bees don't smoke.

      Bees don't smoke! But some bees are smoking.

      That's it! That's our case!

      It is? It's not over?

      Get dressed. I've gotta go somewhere.

      Get back to the court and stall. Stall any way you can.

      And assuming you've done step correctly, you're ready for the tub.

      Mr. Flayman.

      Yes? Yes, Your Honor!

      Where is the rest of your team?

      Well, Your Honor, it's interesting.

      Bees are trained to fly haphazardly,

      and as a result, we don't make very good time.

      I actually heard a funny story about…

      Your Honor, haven't these ridiculous bugs

      taken up enough of this court's valuable time?

      How much longer will we allow these absurd shenanigans to go on?

      They have presented no compelling evidence to support their charges

      against my clients, who run legitimate businesses.

      I move for a complete dismissal of this entire case!

      Mr. Flayman, I'm afraid I'm going

      to have to consider Mr. Montgomery's motion.

      But you can't! We have a terrific case.

      Where is your proof? Where is the evidence?

      Show me the smoking gun!

      Hold it, Your Honor! You want a smoking gun?

      Here is your smoking gun.

      What is that?

      It's a bee smoker!

      What, this? This harmless little contraption?

      This couldn't hurt a fly, let alone a bee.

      Look at what has happened

      to bees who have never been asked, "Smoking or non?"

      Is this what nature intended for us?

      To be forcibly addicted to smoke machines

      and man-made wooden slat work camps?

      Living out our lives as honey slaves to the white man?

      What are we gonna do? He's playing the species card. Ladies and gentlemen, please, free these bees!

      Free the bees! Free the bees!

      Free the bees!

      Free the bees! Free the bees!

      The court finds in favor of the bees!

      Vanessa, we won!

      I knew you could do it! High-five!

      Sorry.

      I'm OK! You know what this means?

      All the honey will finally belong to the bees.

      Now we won't have to work so hard all the time.

      This is an unholy perversion of the balance of nature, Benson.

      You'll regret this.

      Barry, how much honey is out there?

      All right. One at a time.

      Barry, who are you wearing?

      My sweater is Ralph Lauren, and I have no pants.

      What if Montgomery's right? What do you mean? We've been living the bee way a long time, 27 million years.

      Congratulations on your victory. What will you demand as a settlement?

      First, we'll demand a complete shutdown of all bee work camps.

      Then we want back the honey that was ours to begin with,

      every last drop.

      We demand an end to the glorification of the bear as anything more

      than a filthy, smelly, bad-breath stink machine.

      We're all aware of what they do in the woods.

      Wait for my signal.

      Take him out.

      He'll have nausea for a few hours, then he'll be fine.

      And we will no longer tolerate bee-negative nicknames…

      But it's just a prance-about stage name!

      …unnecessary inclusion of honey in bogus health products

      and la-dee-da human tea-time snack garnishments.

      Can't breathe.

      Bring it in, boys!

      Hold it right there! Good.

      Tap it.

      Mr. Buzzwell, we just passed three cups, and there's gallons more coming!

      I think we need to shut down! Shut down? We've never shut down. Shut down honey production!

      Stop making honey!

      Turn your key, sir!

      What do we do now?

      Cannonball!

      We're shutting honey production!

      Mission abort.

      Aborting pollination and nectar detail. Returning to base.

      Adam, you wouldn't believe how much honey was out there.

      Oh, yeah?

      What's going on? Where is everybody?

      Are they out celebrating? They're home. They don't know what to do. Laying out, sleeping in.

      I heard your Uncle Carl was on his way to San Antonio with a cricket.

      At least we got our honey back.

      Sometimes I think, so what if humans liked our honey? Who wouldn't?

      It's the greatest thing in the world! I was excited to be part of making it.

      This was my new desk. This was my new job. I wanted to do it really well.

      And now…

      Now I can't.

      I don't understand why they're not happy.

      I thought their lives would be better!

      They're doing nothing. It's amazing. Honey really changes people.

      You don't have any idea what's going on, do you?

      What did you want to show me? This. What happened here?

      That is not the half of it.

      Oh, no. Oh, my.

      They're all wilting.

      Doesn't look very good, does it?

      No.

      And whose fault do you think that is?

      You know, I'm gonna guess bees.

      Bees?

      Specifically, me.

      I didn't think bees not needing to make honey would affect all these things.

      It's not just flowers. Fruits, vegetables, they all need bees.

      That's our whole SAT test right there.

      Take away produce, that affects the entire animal kingdom.

      And then, of course…

      The human species?

      So if there's no more pollination,

      it could all just go south here, couldn't it?

      I know this is also partly my fault.

      How about a suicide pact?

      How do we do it?

      I'll sting you, you step on me. That just kills you twice. Right, right.

      Listen, Barry… sorry, but I gotta get going.

      I had to open my mouth and talk.

      Vanessa?

      Vanessa? Why are you leaving? Where are you going?

      To the final Tournament of Roses parade in Pasadena.

      They've moved it to this weekend because all the flowers are dying.

      It's the last chance I'll ever have to see it.

      Vanessa, I just wanna say I'm sorry. I never meant it to turn out like this.

      I know. Me neither.

      Tournament of Roses. Roses can't do sports.

      Wait a minute. Roses. Roses?

      Roses!

      Vanessa!

      Roses?!

      Barry?

      Roses are flowers! Yes, they are. Flowers, bees, pollen!

      I know. That's why this is the last parade.

      Maybe not. Could you ask him to slow down?

      Could you slow down?

      Barry!

      OK, I made a huge mistake. This is a total disaster, all my fault.

      Yes, it kind of is.

      I've ruined the planet. I wanted to help you

      with the flower shop. I've made it worse.

      Actually, it's completely closed down.

      I thought maybe you were remodeling.

      But I have another idea, and it's greater than my previous ideas combined.

      I don't want to hear it!

      All right, they have the roses, the roses have the pollen.

      I know every bee, plant and flower bud in this park.

      All we gotta do is get what they've got back here with what we've got.

      Bees.

      Park.

      Pollen!

      Flowers.

      Repollination!

      Across the nation!

      Tournament of Roses, Pasadena, California.

      They've got nothing but flowers, floats and cotton candy.

      Security will be tight.

      I have an idea.

      Vanessa Bloome, FTD.

      Official floral business. It's real.

      Sorry, ma'am. Nice brooch.

      Thank you. It was a gift.

      Once inside, we just pick the right float.

      How about The Princess and the Pea?

      I could be the princess, and you could be the pea!

      Yes, I got it.

      Where should I sit?

      What are you?

      I believe I'm the pea.

      The pea?

      It goes under the mattresses.

      Not in this fairy tale, sweetheart. I'm getting the marshal. You do that! This whole parade is a fiasco!

      Let's see what this baby'll do.

      Hey, what are you doing?!

      Then all we do is blend in with traffic…

      …without arousing suspicion.

      Once at the airport, there's no stopping us.

      Stop! Security.

      You and your insect pack your float? Yes. Has it been in your possession the entire time?

      Would you remove your shoes?

      Remove your stinger. It's part of me. I know. Just having some fun. Enjoy your flight.

      Then if we're lucky, we'll have just enough pollen to do the job.

      Can you believe how lucky we are? We have just enough pollen to do the job!

      I think this is gonna work.

      It's got to work.

      Attention, passengers, this is Captain Scott.

      We have a bit of bad weather in New York.

      It looks like we'll experience a couple hours delay.

      Barry, these are cut flowers with no water. They'll never make it.

      I gotta get up there and talk to them.

      Be careful.

      Can I get help with the Sky Mall magazine?

      I'd like to order the talking inflatable nose and ear hair trimmer.

      Captain, I'm in a real situation.

      What'd you say, Hal? Nothing. Bee!

      Don't freak out! My entire species…

      What are you doing?

      Wait a minute! I'm an attorney! Who's an attorney? Don't move.

      Oh, Barry.

      Good afternoon, passengers. This is your captain.

      Would a Miss Vanessa Bloome in 24B please report to the cockpit?

      And please hurry!

      What happened here?

      There was a DustBuster, a toupee, a life raft exploded.

      One's bald, one's in a boat, they're both unconscious!

      Is that another bee joke? No! No one's flying the plane!

      This is JFK control tower, Flight 356. What's your status?

      This is Vanessa Bloome. I'm a florist from New York.

      Where's the pilot?

      He's unconscious, and so is the copilot.

      Not good. Does anyone onboard have flight experience?

      As a matter of fact, there is.

      Who's that? Barry Benson. From the honey trial?! Oh, great.

      Vanessa, this is nothing more than a big metal bee.

      It's got giant wings, huge engines.

      I can't fly a plane.

      Why not? Isn't John Travolta a pilot? Yes. How hard could it be?

      Wait, Barry! We're headed into some lightning.

      This is Bob Bumble. We have some late-breaking news from JFK Airport,

      where a suspenseful scene is developing.

      Barry Benson, fresh from his legal victory…

      That's Barry!

      …is attempting to land a plane, loaded with people, flowers

      and an incapacitated flight crew.

      Flowers?!

      We have a storm in the area and two individuals at the controls

      with absolutely no flight experience.

      Just a minute. There's a bee on that plane.

      I'm quite familiar with Mr. Benson and his no-account compadres.

      They've done enough damage.

      But isn't he your only hope?

      Technically, a bee shouldn't be able to fly at all.

      Their wings are too small…

      Haven't we heard this a million times?

      "The surface area of the wings and body mass make no sense."

      Get this on the air!

      Got it.

      Stand by.

      We're going live.

      The way we work may be a mystery to you.

      Making honey takes a lot of bees doing a lot of small jobs.

      But let me tell you about a small job.

      If you do it well, it makes a big difference.

      More than we realized. To us, to everyone.

      That's why I want to get bees back to working together.

      That's the bee way! We're not made of Jell-O.

      We get behind a fellow.

      Black and yellow! Hello! Left, right, down, hover.

      Hover? Forget hover. This isn't so hard. Beep-beep! Beep-beep!

      Barry, what happened?!

      Wait, I think we were on autopilot the whole time.

      That may have been helping me. And now we're not! So it turns out I cannot fly a plane.

      All of you, let's get behind this fellow! Move it out!

      Move out!

      Our only chance is if I do what I'd do, you copy me with the wings of the plane!

      Don't have to yell.

      I'm not yelling! We're in a lot of trouble.

      It's very hard to concentrate with that panicky tone in your voice!

      It's not a tone. I'm panicking!

      I can't do this!

      Vanessa, pull yourself together. You have to snap out of it!

      You snap out of it.

      You snap out of it.

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      Hold it!

      Why? Come on, it's my turn.

      How is the plane flying?

      I don't know.

      Hello?

      Benson, got any flowers for a happy occasion in there?

      The Pollen Jocks!

      They do get behind a fellow.

      Black and yellow. Hello. All right, let's drop this tin can on the blacktop.

      Where? I can't see anything. Can you?

      No, nothing. It's all cloudy.

      Come on. You got to think bee, Barry.

      Thinking bee. Thinking bee. Thinking bee! Thinking bee! Thinking bee!

      Wait a minute. I think I'm feeling something.

      What? I don't know. It's strong, pulling me. Like a 27-million-year-old instinct.

      Bring the nose down.

      Thinking bee! Thinking bee! Thinking bee!

      What in the world is on the tarmac? Get some lights on that! Thinking bee! Thinking bee! Thinking bee!

      Vanessa, aim for the flower. OK. Cut the engines. We're going in on bee power. Ready, boys?

      Affirmative!

      Good. Good. Easy, now. That's it.

      Land on that flower!

      Ready? Full reverse!

      Spin it around!

      Not that flower! The other one!

      Which one?

      That flower.

      I'm aiming at the flower!

      That's a fat guy in a flowered shirt. I mean the giant pulsating flower

      made of millions of bees!

      Pull forward. Nose down. Tail up.

      Rotate around it.

      This is insane, Barry! This is the only way I know how to fly. Am I koo-koo-kachoo, or is this plane flying in an insect-like pattern?

      Get your nose in there. Don't be afraid. Smell it. Full reverse!

      Just drop it. Be a part of it.

      Aim for the center!

      Now drop it in! Drop it in, woman!

      Come on, already.

      Barry, we did it! You taught me how to fly!

      Yes. No high-five! Right. Barry, it worked! Did you see the giant flower?

      What giant flower? Where? Of course I saw the flower! That was genius!

      Thank you. But we're not done yet. Listen, everyone!

      This runway is covered with the last pollen

      from the last flowers available anywhere on Earth.

      That means this is our last chance.

      We're the only ones who make honey, pollinate flowers and dress like this.

      If we're gonna survive as a species, this is our moment! What do you say?

      Are we going to be bees, or just Museum of Natural History keychains?

      We're bees!

      Keychain!

      Then follow me! Except Keychain.

      Hold on, Barry. Here.

      You've earned this.

      Yeah!

      I'm a Pollen Jock! And it's a perfect fit. All I gotta do are the sleeves.

      Oh, yeah.

      That's our Barry.

      Mom! The bees are back!

      If anybody needs to make a call, now's the time.

      I got a feeling we'll be working late tonight!

      Here's your change. Have a great afternoon! Can I help who's next?

      Would you like some honey with that? It is bee-approved. Don't forget these.

      Milk, cream, cheese, it's all me. And I don't see a nickel!

      Sometimes I just feel like a piece of meat!

      I had no idea.

      Barry, I'm sorry. Have you got a moment?

      Would you excuse me? My mosquito associate will help you.

      Sorry I'm late.

      He's a lawyer too?

      I was already a blood-sucking parasite. All I needed was a briefcase.

      Have a great afternoon!

      Barry, I just got this huge tulip order, and I can't get them anywhere.

      No problem, Vannie. Just leave it to me.

      You're a lifesaver, Barry. Can I help who's next?

      All right, scramble, jocks! It's time to fly.

      Thank you, Barry!

      That bee is living my life!

      Let it go, Kenny.

      When will this nightmare end?!

      Let it all go.

      Beautiful day to fly.

      Sure is.

      Between you and me, I was dying to get out of that office.

      You have got to start thinking bee, my friend.

      Thinking bee! Me? Hold it. Let's just stop for a second. Hold it.

      I'm sorry. I'm sorry, everyone. Can we stop here?

      I'm not making a major life decision during a production number!

      All right. Take ten, everybody. Wrap it up, guys.

      I had virtually no rehearsal for that.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02438

      Corresponding author(s): Ryusuke Niwa

      1. General Statements [optional]


      Below are quotes from the Reviewers' overall evaluations:

      As might be expected based on the authors' skills and expertise, the study is well executed, nicely documented with perfect microscopy images, and well presented. It has been easy to follow. However, suitability for publication depends on where the authors aim to place their paper. Although I like the paper very much, it might seem incomplete for high-end journals.

      This is a very nice paper and solid piece of work.

      Its major strength is the focus on the poorly studied male reproductive organ and the identification of Ldh as a novel target of JH activity in the seminal vesicles.

      While the developmental roles of insect Juvenile Hormone (JH) are very well studied, its adult functions are largely unknown. Target genes of JH signaling are poorly described. This study adds significant insight into both of these aspects. The study underscores the usefulness of the JHRE-GFP reporter that identifies JH function, and not just JH presence since the reporter is only expressed after JH binding to Met and Gce, a prerequisite for JHRE reporter activation.

      The authors have identified the epithelial cells of the Drosophila seminal vesicle as a JH target tissue. The authors nicely extended this finding by mining already existing expression data to identify a specific JH-induced gene in these cells.

      This small study reports new but limited results (one tissue of one stage, one hormone) that could be useful for specialists. The work is solid and includes controls and interpretable data.

      2. Description of the planned revisions


      1) The study suggests an important role for JH signaling in the SV, likely affecting reproductive capacity of males. The authors depleted the JH receptors through RNAi, achieving a loss in the expression of the WT JHRE-GFP reporter as well as of the authentic target Ldh. Surprisingly, no phenotypic consequences of the double KD of Met and gce are presented. Does that mean that there were none? The authors only discuss a potential impact of Ldh loss for metabolism. Unless I am missing something, the study reports molecular phenotypes that clearly document JH signaling in the SV but no physiological impact of loss of this JH signaling, suggesting that there may be no obvious biological role for JH in this context. I think this is unlikely. Have the authors checked fertility of the males, sperm viability and quality, mating competitiveness of the RNAi males? Loss of JH epoxidation (only methyl farnesoate present) made mosquito males less fit and less reproductively competitive relative to epox+ controls (Nouzova et al., 2021, PNAS) -- btw, I think the authors should discuss this paper.

      Our response: We will conduct the following experiments to answer these criticisms.

      1) We will examine male fertility by counting the number of offspring from wild-type mothers crossed with seminal vesicle-specific Met & gce double RNAi males and with control RNAi males.

      2) We will also examine the mating competitiveness of the RNAi males. In more detail, we will cross w1118 (white-eyed) wild-type background females with (i) a mixed population of w1118 wild-type background males and w+ (red-eyed) control RNAi males, and (ii) a mixed population of w1118 wild-type background males and w+ Met & gce double RNAi males. We can distinguish the progeny of RNAi males from the progeny of wild-type males by eye color.

      By conducting plans 1) and 2), we will also indirectly evaluate sperm viability and quality.
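
One way the eye-color scoring in plan 2) could be summarized is sketched below. This is only an illustrative sketch, not the authors' analysis: it assumes equal numbers of both male types in each mixed population, assumes every progeny sired by a w+ male is red-eyed, and uses a null hypothesis of equal paternity (share = 0.5). The function name and all counts are hypothetical.

```python
from math import comb


def paternity_share(red_eyed: int, white_eyed: int) -> tuple[float, float]:
    """Return (share of red-eyed progeny, exact two-sided binomial p-value).

    The p-value tests the assumed null that both male types sire progeny
    equally often (probability 0.5). Intended for modest progeny counts.
    """
    n = red_eyed + white_eyed
    if n == 0:
        raise ValueError("no progeny scored")
    share = red_eyed / n
    # Binomial probabilities of every possible red-eyed count under p = 0.5.
    pmf = [comb(n, k) * 0.5 ** n for k in range(n + 1)]
    observed = pmf[red_eyed]
    # Two-sided exact test: sum all outcomes no more likely than the observed one.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return share, min(p_value, 1.0)


# Hypothetical counts, for illustration only.
control_share, control_p = paternity_share(red_eyed=52, white_eyed=48)
rnai_share, rnai_p = paternity_share(red_eyed=30, white_eyed=70)
print(f"control RNAi males: share={control_share:.2f}, p={control_p:.3g}")
print(f"Met & gce RNAi males: share={rnai_share:.2f}, p={rnai_p:.3g}")
```

A lower red-eyed share in cross (ii) than in cross (i) would point to reduced competitiveness of the double RNAi males. A direct comparison of the two crosses would need a separate test.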

      In addition, we will also discuss the paper of Nouzova et al. PNAS 2021 in the Discussion section.

      2) The authors seem to have made no effort to distinguish between Met and Gce functions. It is always the results from the double knockdown of both paralogs that are presented. Does this mean that single-KD had no effect, thereby indicating entirely redundant functions of both proteins in the studied context? Even if so, it would be of interest to document this redundancy by showing the single-gene KD data. However, I would be surprised if both proteins were equally important in the SV. The authors checked mRNA/protein expression levels. Was any of the two paralogs prevalent in the SV?

      Our response: To address this criticism, we will conduct a single transgenic RNAi experiment to knock down either Met or gce separately and assess JHRE-GFP signals in the seminal vesicles.

      Regarding the expression of Met and gce in the seminal vesicles, a previous study (Baumann et al. Scientific Reports 7: 2132, DOI:10.1038/s41598-017-02264-4) has already reported that GFP signals are observed in the seminal vesicles of Met-T2A-GAL4>UAS-GFP and gce-T2A-GAL4>UAS-GFP animals. These results strongly indicate that both Met and gce are expressed in the seminal vesicles. We will describe and discuss this point in our revised manuscript. In addition, we plan to check and analyze gene expression of Met, gce, and Ldh in the seminal vesicles using a publicly available single-cell RNA-seq database, such as DRscDB (https://www.flyrnai.org/tools/singlecell/web/).

      3) The authors argue for direct regulation of Ldh by Met/Gce (again by which one?). Oddly, the statement in the Results (l.187-188; "suggests ... direct target") is stronger than in the Discussion (l.214, "leaving open the possibility"). The putative JHREs upstream and within the Ldh gene are identified but not tested in a functional study. At least a simple luciferase reporter assay and mutagenesis of the JHREs should be attempted.

      Our response: To address this criticism, we plan to conduct a luciferase-based promoter/enhancer analysis in Drosophila S2 cultured cells. A similar system was used to test the JH responsiveness of the JHRE promoter in a previous study (Jindra et al. PLoS Genetics 11: e1005394, DOI: 10.1371/journal.pgen.1005394). We will generate plasmid constructs carrying the luciferase coding region. In these plasmids, the luciferase coding region will be fused with the upstream region and the first intron region of Ldh carrying either intact or mutated E-boxes. Then, we will determine whether the luciferase activity is enhanced by the presence of a JH analog (methoprene) when the E-boxes are intact.

      For this revision, a new collaborator, Ryosuke Hayashi (a graduate student in the Niwa lab), will participate in this analysis. He will therefore become a co-author of the revised manuscript.

      l.232-233. It is not surprising that the JHRR-lacZ reporter shows a different expression pattern relative to JHRE-GFP, as these are really different constructs. The problem is that JH-dependent activation of the JHRR-lacZ transgene has not been tested as thoroughly as that of JHRE-GFP. Is it inducible by added JH or methoprene?

      Have the authors examined whether JHRE-lacZ expression increases with Methoprene?

      Our response: We have yet to do this analysis. To address this important point from Reviewers #1 and #2, we will examine whether JHRR-lacZ expression is upregulated in the seminal vesicles of virgin males fed methoprene-supplemented food. The lacZ signals will be visualized by immunostaining with an anti-LacZ antibody.

      Document testis staining of JHRE-GFP. I think the authors missed a chance by not providing a clear/nice picture of the testis staining. Stainings of testes squashed on a slide is easy and would nicely document in which cells the reporter is activated. Similarly, extracting sperm from the seminal vesicle and examining whether the sperm express JHRE-GFP would be informative.

      Our response: As the reviewer suggested, we will assess JHRE-GFP signal in sperm in squashed testis samples.

      Did the authors try to analyze whether the 66 genes identified in the seminal vesicle have JHRE elements? This could yield additional significant information about other JH-responsive genes in the seminal vesicle.

      Our response: We have yet to do this analysis. We will follow the reviewer's suggestion and examine whether the 66 genes identified in the seminal vesicle have JHRE elements.
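
A scan of this kind could be sketched as below. It is a minimal illustration under stated assumptions, not the authors' pipeline. The default motif is the canonical palindromic E-box `CACGTG`, used here only as a placeholder, since the exact JHRE consensus to search for is not given in this response. The function names and the toy sequences are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = "CACGTG") -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) hits of `motif` on both strands.

    The default motif is a placeholder E-box, not a confirmed JHRE consensus.
    """
    seq = seq.upper()
    motif = motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        if strand == "-" and pattern == motif:
            continue  # palindromic motif: the minus-strand scan would duplicate hits
        # A lookahead lets overlapping occurrences be reported as well.
        for match in re.finditer(f"(?={re.escape(pattern)})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


# Toy sequences standing in for upstream regions of candidate genes.
candidates = {
    "geneA": "AACACGTGTT",
    "geneB": "TTTTTTTTTT",
}
for name, upstream in candidates.items():
    print(name, find_motif(upstream))
```

Running the same function over the regulatory regions of each of the 66 genes would give a per-gene list of candidate sites, which would still need functional testing.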

      3a. Double staining would further confirm that pd8-Gal4 (crossed to UAS-dsRed) and JHRE-GFP overlap.

      3b. Similarly, double staining would further confirm that pd8-Gal4 (crossed to UAS-dsRed) and JHRE-GFP overlap.

      Our response: To address this question, we will generate males of Pde8-GAL4; UAS-red fluorescent protein (RedStinger, RFP, or DsRed); JHRE-GFP and observe the overlap between the red fluorescent signals and green fluorescent (JHRE-GFP) signals in the seminal vesicle epithelial cells.

      Minor comments:

      Fig.1a could be in a supplement.

      Our response: At this point, we are unsure whether to follow this reviewer's suggestion. There are no supplemental figures in the current manuscript, so we hesitate to create a supplemental figure just for this one figure. On the other hand, the three reviewers have asked us to perform various additional experiments, so some of the new data may be shown as supplemental figures. In that case, Fig. 1a can be moved to a supplemental figure, but we would like to wait on this decision.

      3. Description of the revisions that have already been incorporated in the transferred manuscript


      l.25,91,117, and throughout, "JH analog" or "JHA". The authors only use methoprene, so it would be better to specifically talk about methoprene, which is a proven agonist ligand of the JHR proteins (reference 10 and/or Jindra and Bittova, 2020 [Arch Insect Biochem Physiol] for a review). This would lend more credibility to using methoprene than just referring to a "JHA".

      Our response: According to the reviewer's suggestion, we have replaced "JHA" with "methoprene" wherever possible. In the figures, we used "MTP" instead of "methoprene" due to space limitations.

      l.42,44, "paralogs". I believe in this case the authors refer to orthologs of Met in other species. Paralogs result from gene duplications within species, such as Met and gce in cyclorrhaphous flies or Met 1 and 2 in the Lepidoptera. I recommend a recent review on all bHLH-PAS proteins featuring reconstruction of the phylogenetic position of Met/Gce (Tumova et al., 2024 in J Mol Biol).

      Our response: As suggested, we have replaced "paralogs" and "paralogous" with "orthologs" and "orthologous," respectively, on P3. We have also cited Tumova et al. J. Mol. Biol. 2023 as a new Ref 12.

      l.54, "Met and Gce act redundantly to regulate JH-responsive gene expression". Ref 10 should be cited here as it provides functional cell-based and genetic rescue evidence for each paralog.

      Our response: We have cited Ref 10 as suggested.

      l.66, It would be better to start "In this study" or "Here" to distinguish from the last cited paper.

      Our response: We created a new paragraph with the sentence "In this study..." at the beginning. We hope we understand the reviewer's suggestion correctly.

      l.175, levels were

      Our response: We have fixed this error in the transferred manuscript.

      l.209, might be evolutionarily among.... conserved ??

      Our response: We have fixed this error in the transferred manuscript.

      l.226, study has

      Our response: We have fixed this error in the transferred manuscript.

      l.227-229. The authors are missing a paper by Shin et al., 2012 (PNAS) that shows physical interaction of Met with Cycle and their regulation of circadian gene activity and another paper by Bajgar et al., 2013 (PNAS) which describes photoperiod-dependent seasonal regulation of circadian genes by Met, Clk and Cyc.

      On the other hand, the cited reference [51] does NOT demonstrate Met:Clk heterodimer since coIP is by no means adequate to address complex stoichiometry. In fact, it is suspicious that Met would heterodimerize with either Cyc or Clk, as they represent class II and class I bHLH-PAS proteins.

      Our response: In response to both comments from Reviewer #1, we have cited these references and rewritten the discussion on P10-11 as below: "An interesting previous study has reported that the seminal vesicle expresses multiple clock genes such as period, Clock (Clk), and timeless, all of which are necessary for generating proper circadian rhythm [52]. In the case of the mosquito Aedes aegypti female, it is reported that JH controls gene expression via a heterodimer of Met and circadian rhythm factor Cycle (CYC) [53]. It was also suggested that Met binds directly to CLK in D. melanogaster [54]. In addition, in the linden bug, Pyrrhocoris apterus, JH alters gene expression via Met, CLK, and CYC in the gut [55]. Considering these previous reports and our results, circadian rhythm factors and JH may cooperate to regulate gene expression in the seminal vesicles."

      l.245. It is not "whether", but for sure the existing reporters only reflect limited JHR activity, being based on Kr-h1 JHREs. These reporters likely uncover only a small subset of JH activity in vivo.

      Our response: We have rewritten the sentence as follows: "..., more comprehensive JH reporter strains will be needed in D. melanogaster as well as other insects in future studies."

      reference 10/11 is duplicated.

      Our response: We have fixed this error in the transferred manuscript.

      Have the authors done a careful comparison of JHRE-GFP expression and the Met/gce reporter expression described by Baumann et al (Scientific Reports | 7: 2132 | DOI:10.1038/s41598-017-02264-4)? Would be nice to add a few more sentences in the discussion.

      Our response: As suggested, we have added some sentences to explain this point on Page 11 as below: "Previous studies reported that Met-T2A-GAL4 and gce-T2A-GAL4 labeled male accessory glands, ejaculatory duct, and testes as well as seminal vesicles. On the other hand, in our results, JHRE-GFP only labels cells in seminal vesicles and testes [21]. Considering that Met and Gce are expressed in almost all cell types of male reproductive tracts [21], more comprehensive JH reporter strains will be needed in D. melanogaster as well as other insects in future studies."

      In the discussion:

      6.1 Would have liked to see a more in depth discussion of the role of the seminal vesicle. How could that be supported by JH / metabolic processes? Does it have secretory functions that might be induced by JH? Important functions relative to sperm storage? How could that relate to the finding that JH response is enhanced by mating?

      Our response: Unfortunately, the function of the seminal vesicles is largely unknown. However, in response to the reviewer's suggestion, we have added some sentences to discuss this point and cited some references describing the seminal vesicles in insects other than the fruit fly, as follows on P9-10: "Furthermore, in some insects other than D. melanogaster, morphological and ultrastructural studies revealed that secretory vesicles were observed in the epithelial cells of the seminal vesicles [37,38,40,44]. JH is known to stimulate secretory activity in the male accessory glands of many insects [45]. Based on the JH response in the seminal vesicles, it is possible that JH signaling affects the secretory activity of the seminal vesicles in D. melanogaster."

      The arrow in figure is not defined

      Our response: We believe that the reviewer pointed out the arrow in Figure 1e. We have added a sentence to define the arrow in the Figure legend as "The arrow indicates the cell with a GFP signal."

      Figure 2b graph labels are flipped

      Our response: We have fixed the error.

      Line 624: Change "Allow heads" to "Arrowheads"

      Our response: We have fixed this error in the transferred manuscript.

      Major Comments:

      The work uses standard methods and strains. Although the specific findings are new and believable, the authors interpret them beyond what is appropriate. For example, based on increased amounts of a single RNA, they propose that JH regulates metabolism in seminal vesicles and because circadian rhythm genes were known to be expressed in this tissue they propose that JH and circadian systems work together there.

      Our response: In response to the reviewer's criticisms, we have discussed our arguments more appropriately in the Discussion. For example, we have mentioned circadian rhythm more carefully on Pages 10-11 as follows: "An interesting previous study has reported that the seminal vesicle expresses multiple clock genes such as period, Clock (Clk), and timeless, all of which are necessary for generating proper circadian rhythm [52]. In the case of the mosquito Aedes aegypti female, it is reported that JH controls gene expression via a heterodimer of Met and circadian rhythm factor Cycle (CYC) [53]. It was also suggested that Met binds directly to CLK in D. melanogaster [54]. In addition, in the linden bug, Pyrrhocoris apterus, JH alters gene expression via Met, CLK and CYC in the gut [55]. Considering these previous reports and our results, it is possible that circadian rhythm factors and JH cooperatively regulate gene expression in the seminal vesicles."

      Regarding Ldh, we have added a sentence on Page 10 as "Also, the biological significance of the induction of Ldh expression by JH signaling is not clear."

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      l.244, tract

      Our response: We have carefully checked the usage of "tract" and "tracts" not only on Page 11 but also throughout the manuscript. We have decided to use "tracts," but not "tract," throughout the manuscript.

      6.2 What do epithelial cells of spermatheca do?

      Our response: We agree with the reviewer that this is a very interesting question. However, please note that this paper focuses on males, and females are beyond our current scope. We plan to examine JHRE-GFP signals in the spermatheca in a different project. We do appreciate the reviewer's kind understanding.

      6.3 How do the authors envision that JH enters the epithelial cells?

      Our response: We don't have any hypotheses on this point. Transporters may exist to achieve intracellular permeability of JH, but we do not think this point has been discussed in current insect physiology. Furthermore, since this issue is related to all JH-responsive cells, not just seminal vesicle epithelial cells, we do not feel the need to discuss it in this paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Both reviewers positively received the manuscript, in general. The agreement was that the manuscript presented valuable findings, using solid techniques and approaches, that shed additional light into how the canine distemper virus hemagglutinin might engage cellular receptors and how that engagement impacts host tropism. While both reviewers appreciated the X-ray crystallographic data, they also felt that the AFM experiments could have been performed at a higher standard and that the interpretation of the results ensuing from those AFM experiments could have been explained more thoroughly and in simpler terms. An additional missed opportunity of the current manuscript is the lack of comparison of the crystal structure to that of the already published cryo-EM structure, for context.

      Thank you very much for the constructive comments of the editor and reviewers. Following your comments, we have rewritten the text related to the AFM experiments in simpler terms, as follows.

      “When CDV-H was loaded onto a mica substrate and scanned with a cantilever to acquire images of attached molecules, the CDV-H dimer was observed as two globules clustered together in most cases, but sometimes, each domain moved independently (Fig. 7B and Supplementary Movie). Time-course analysis of the dynamics of the representative CDV-H dimer showed that CDV-H could adopt both associated and dissociated forms (Fig. 7C). The distances between the domains were calculated by measuring those between the centers of mass of each domain. Finally, the distribution of distances between each head domain in the CDV-H dimers showed approximately 15 nm as a major peak (Fig. 7D). This is a reasonable length for the linker between the head domain dimers.” in Page 11, Lines 8-17.

      With regard to the structural comparison between the cryo-EM structure published in Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120 and our crystal structure, we have compared these structures for Cα on page 6 and added the following text. “A recent cryo-EM structure of the wild-type CDV-H ectodomain revealed that the head dimer is located on one side of the stalk region in solution (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120)” in Page 14, Lines 22-24.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Fukuhara, Maenaka, and colleagues report a crystal structure of the canine distemper virus (CDV) attachment hemagglutinin protein globular domain. The structure shows a dimeric organization of the viral protein and describes the detailed amino-acid side chain interactions between the two protomers. The authors also use their best judgement to comment on predicted sites for the two cellular receptors - Nectin-4 and SLAM - and thus speculate on the CDV host tropism. A complementary AFM study suggests a breathing movement at the hemagglutinin dimer interface.

      Strengths:

      The study of CDV and related Paramyxoviruses is significant for human/animal health and is very timely. The crystallographic data seem to be of good quality.

      Thank you very much for the constructive comment of the reviewer.

      Weaknesses:

      While the recent CDV hemagglutinin cryo-EM structure is mentioned, it is not compared to the present crystal structure, and thus the context of the present study is poorly justified. Additionally, the results of the AFM experiment are not unexpected. Indeed, other paramyxoviral RBP/G proteins also show movement at the protomer interface.

      Thank you very much for the constructive comments of the reviewer. When we submitted our manuscript to eLife, the cryo-EM structure (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120) had been published only a week earlier and was not yet available to us. Following the comment of the reviewer, we have added text on the structural comparison between the cryo-EM structure and our crystal structure. We have also changed the text related to the AFM experiments to tone down the movement of the protomer interface, as follows.

      “This observation raises the possibility that each head domain of CDV-H also dissociates and moves flexibly, as shown in the structure of Nipah virus (NiV)-G protein, previously (Science (2022) 375, 1373–1378).” in Page 11, Lines 4-6.

      Reviewer #2 (Public Review):

      Summary:

      The authors solved the crystal structure of CDV H-protein head domain at 3.2 Å resolution to better understand the detailed mechanism of membrane fusion triggering. The structure clearly showed that the orientation of the H monomers in the homodimer was similar to that of measles virus H and different from other paramyxoviruses. The authors used the available co-crystal structures of the closely related measles virus H with the SLAM and Nectin4 receptors to map the receptor binding site on CDV H. The authors also confirmed which N-linked sites were glycosylated in the CDV H protein and showed that both wildtype and vaccine strains of CDV H have the same glycosylation pattern. The authors documented that the glycans cover a vast majority of the H surface while leaving the receptor binding site exposed, which may in part explain the long-term success of measles virus and CDV vaccines. Finally, the authors used HS-AFM to visualize the real-time dynamic characteristics of CDV-H under physiological conditions. This analysis indicated that homodimers may dissociate into monomers, which has implications for the model of fusion triggering.

      The structural data and analysis were thorough and well-presented. However, the HS-AFM data, while very exciting, was not presented in a manner that could be easily grasped by readers of this manuscript. I have some suggestions for improvement.

      (1) The authors claim their structure is very similar to the recently published cryo-EM structure of CDV H. Can the authors provide us with a quantitative assessment of this statement?

      Thank you very much for the constructive comments of the reviewer. When we submitted our manuscript to eLife, the cryo-EM structure (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120) had been published only a week earlier and was not yet available to us. Following the comment of the reviewer, we have added text on the structural comparison between the cryo-EM structure and our crystal structure. We have also changed the text related to the AFM experiments to tone down the movement of the protomer interface, as follows.

      “This observation raises the possibility that each head domain of CDV-H also dissociates and moves flexibly, as shown in the structure of Nipah virus (NiV)-G protein, previously (Science (2022) 375, 1373–1378).” in Page 11, Lines 4-6.

      (2) The results for the HS-AFM are difficult to follow and it is not clear how the authors came to their conclusions. Can the authors better explain this data and justify their conclusions based on it?

      Thank you very much for the constructive comments of the reviewer. Following your comments, we have rewritten the text related to the AFM experiments in simpler terms, as follows.

      “When CDV-H was loaded onto a mica substrate and scanned with a cantilever to acquire images of attached molecules, the CDV-H dimer was observed as two globules clustered together in most cases, but sometimes, each domain moved independently (Fig. 7B and Supplementary Movie). Time-course analysis of the dynamics of the representative CDV-H dimer showed that CDV-H could adopt both associated and dissociated forms (Fig. 7C). The distances between the domains were calculated by measuring those between the centers of mass of each domain. Finally, the distribution of distances between each head domain in the CDV-H dimers showed approximately 15 nm as a major peak (Fig. 7D). This is a reasonable length for the linker between the head domain dimers.” in Page 11, Lines 8-17.
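
      The center-of-mass distance measurement described in the quoted passage can be illustrated with a minimal sketch. It assumes each head domain is represented by a hypothetical set of 2D coordinates (in nm) segmented from a single HS-AFM frame, and it uses an unweighted centroid. The authors' actual image-analysis software and any height weighting are not stated here, so the function names and coordinates below are purely illustrative.

```python
import math


def center_of_mass(points):
    """Unweighted centroid of a list of (x, y) coordinates in nm."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return cx, cy


def domain_distance(domain_a, domain_b):
    """Euclidean distance (nm) between the centroids of two domains."""
    ax, ay = center_of_mass(domain_a)
    bx, by = center_of_mass(domain_b)
    return math.hypot(bx - ax, by - ay)


# Hypothetical pixel coordinates for two segmented globules (not real data).
domain_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]      # centroid (1, 1)
domain_b = [(15.0, 0.0), (17.0, 0.0), (15.0, 2.0), (17.0, 2.0)]  # centroid (16, 1)

distance_nm = domain_distance(domain_a, domain_b)
print(f"inter-domain distance: {distance_nm:.1f} nm")
```

      Repeating this calculation for every frame of a movie would give the kind of distance distribution summarized in Fig. 7D.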

      (3) The fusion triggering model in Figure 8 is ambiguous as to when H-F interactions are occurring and when they may be disrupted. The authors should clarify this point in their model.

      Thank you very much for the constructive comments of the reviewer. Following your comments, we have revised Figure 8 and its legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) AFM experiments with SLAM or Nectin-4 immobilized on the cantilever would be much more informative.

      Thank you very much for the constructive comment of the reviewer. We will try this experiment in the next paper.

      (2) The authors should compare their crystal structure to that of the reported cryo-EM structure.

      With regard to the structural comparison between the cryo-EM structure published in Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120 and our crystal structure, we have added text describing the comparison.

      (3) Figure 1D - why does the beta2 MG negative control have such a high SPR signal?

      Thank you very much for the constructive comment of the reviewer. The immobilization levels for β2-microglobulin (beta2 MG), CDV-OP-H and CDV-5VD-H were similar: 1204.7 RU, 1235.7 RU, and 1504.5 RU, respectively. We applied relatively high concentrations (5 mM) of dNectin4 and hNectin4 onto the chip to determine low-affinity dissociation constants. As a result, the signals for beta2 MG (negative control) were high. In other SPR experiments for cell surface receptors, such high signals for beta2 MG were often observed, as in our previous paper, Kuroki et al., J. Immunol. 2019 Dec 15;203(12):3386-3394. doi: 10.4049/jimmunol.1900562. Therefore, we think that these SPR signals are not unusual.

      (4) Figure 1C - please indicate the Ve volume for the peak and add in Ve for standard.

      Thank you very much for the constructive comment of the reviewer. We have indicated the Ve volume for the peak and added the Ve for the standard in Figure 1C.

      (5) The authors mention that one of the chains in the asymmetric unit was better resolved than the other. Please show regions of the atomic model fit regions of the electron density to convince the reader of the quality of your data.

      Thank you very much for the constructive comment of the reviewer. We have added a new Supplementary Figure 2 comparing the electron density maps of chains A and B.

      (6) Table 2 indicates that the difference between Rw and Rf values is larger than 5% which indicates slight overfitting during refinement. Please provide details of your refinement strategy and attempt simulated annealing as a strategy to reduce this delta.

      Thank you very much for the constructive comment of the reviewer. We further introduced TLS and NCS parameters for the refinement. Consequently, the R/Rfree factors became 0.2645/0.3092. Simulated annealing had already been carried out. All the refinement statistics in Table 2 have been updated.
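
      As a quick arithmetic check on the values quoted above, the sketch below computes the gap between Rfree and R from the reported 0.2645/0.3092 and compares it with the 5% threshold the reviewer mentions. The helper name is hypothetical and the snippet is only a worked calculation, not part of the authors' refinement workflow.

```python
def r_factor_gap(r_work: float, r_free: float) -> float:
    """Return Rfree minus R, expressed in percentage points."""
    return (r_free - r_work) * 100.0


# Values reported in the response after adding TLS and NCS parameters.
gap = r_factor_gap(r_work=0.2645, r_free=0.3092)
print(f"Rfree - R = {gap:.2f} percentage points")
print("below the 5% threshold" if gap < 5.0 else "above the 5% threshold")
```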

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors' fusion triggering model was difficult to follow. For example, this sentence was difficult to understand: "The other possible models may include the monomer-dimer-tetramer transition facilitated by receptor binding for the fusion."

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have removed the above sentence and have added a detailed mechanism of the proposed model in the Discussion. Furthermore, we have revised Figure 8 and its legend so that readers can understand the model more clearly.

      (2) Figure 5A is not called out in the main text.

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have added the text as follows.

      “the crystal structure of MeV-H in complex with hNectin-4 showed that the H-SLAM interaction consists of three main sites (Fig. 5A) (Nat. Struct. Mol. Biol. (2013) 20, 67–72).” in Page 11, Lines 4-6.

      (3) Page 9, Line 4: interspaces? Perhaps interphases.

      Thank you very much for the constructive comment of the reviewer. We have changed the term “interspaces” to “internal spaces”.

      (4) Page 12, penultimate line: The authors mention "epitopes for anti-MeV-H Abs." Do they mean anti-CDV-H Abs?

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have changed “anti-MeV-H Abs” to “anti-morbillivirus H neutralizing antibodies”.

      (5) The paper will benefit from an English language editor to help clarify what the authors are trying to convey.

      Thank you very much for the constructive comment of the reviewer.

      We have asked an English proofreading company to check the manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their time and effort to improve and clarify our manuscript. We have now addressed the reviewers’ suggestions in full on a point-by-point basis. Revisions in the manuscript file are highlighted in yellow.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Supernumerary centrosomes are observed in the majority of human tumors. In cells they induce abnormal mitosis leading to chromosome missegregation and aneuploidy. In animal models it is demonstrated that extra centrosomes are sufficient to drive tumor formation. Previous work studying the impact of centrosome amplification on tumor formation in vivo used Plk4 overexpression to drive the formation of supernumerary centrosomes. In this manuscript Moussa and co-workers from the Krämer group developed a mouse model in which centrosome amplification is triggered by the overexpression of the structural centrosomal protein STIL rather than the kinase Plk4 in order to a) assess the potential for centrosome amplification induced by STIL overexpression to drive tumor formation and b) to rule out any potential non-centrosomal related effects of the kinase Plk4 on tumor formation.

      The authors show that STIL overexpression in cells (MEFs) drives centrosome amplification and aberrant mitosis (Fig. 1), leading to chromosome missegregation and aneuploidy (Fig. 2). They also show that STIL overexpression is linked to reduced cellular proliferation and apoptosis (Fig 3). The authors then present in vivo experiments performed in mice. They observed that STIL expression causes embryonic lethality, microcephaly and a reduced lifespan (Fig 4). Despite increased STIL mRNA levels they do not detect elevated STIL protein levels in adult tissues except for the spleen. They do not detect a significant increase in centrosome amplification or aneuploidy in animal tissues (Fig 4) and they infer a STIL translational shutdown in most adult tissues. The authors then assess the impact of STIL overexpression on tumor formation. They observed a reduced spontaneous tumor formation despite elevated STIL mRNA levels in both healthy and tumor (lymphoma) tissues of mice overexpressing STIL. They don't detect increased centrosome amplification and aneuploidy in lymphomas from STIL overexpressing mice compared to lymphomas naturally occurring in control animals (Fig 5). Finally, they found that STIL overexpression suppresses chemical skin carcinogenesis using a combination of tamoxifen induction of STIL in the skin with DMBA/TPA carcinogenic treatment (Fig 7). They link this effect to an increased number of centrioles and a reduction in the number of cycling cells in the skin of STIL overexpressing mice (Fig 6).

      The manuscript is written in a clear manner. The experimental approaches are properly designed and the experimental methods are described in sufficient details. Most of the experimental data present a good number of replicates. The figures are generally well assembled despite some errors in a few panels/legends (see major and minor points). Most of the conclusions are supported by the experimental data. However, a few specific points or interpretations are not convincingly supported by the experimental data (see major points) and will need to be revised and/or reformulated.

      Major points:

      1. Figures 1D and F show that MEFs hemizygous (CMV-STIL+/-) and homozygous (CMV-STIL+/+) for STIL present similar levels of centrosome amplification and aberrant mitosis. However, despite these similarities, the homozygous MEFs display about twice as many micronuclei and chromosome aberrations (Fig. 2). The authors explain this discrepancy by the fact that MEFs homozygous for STIL have reduced proliferation and an increased propensity to stay in interphase compared to hemizygous MEFs (Fig. 3). I don't understand why an interphase arrest would lead to a higher chromosomal instability resulting in higher micronuclei formation and abnormal karyotypes since those phenotypes are the consequences of abnormal mitosis occurring in cycling cells. I would rather argue that homozygous MEFs are more prone to cell cycle arrest because of mitotic errors, but those mitotic errors cannot be explained by the centrosome status or the mitotic figures quantified in homozygous MEFs. Therefore, the authors' explanation written as: "Graded inhibition of proliferation and accumulation of cells in interphase explains why CMV-STIL+/- and CMV-STIL+/+ MEFs contain increasing frequencies of micronuclei and aberrant karyotypes (Fig. 2) despite similar levels of supernumerary centrosomes" is not right for me. The authors should reformulate this section of the manuscript so their conclusion fits their data. The differences between hemi- and homozygous MEFs regarding chromosome stability could come from mitotic errors they did not spot using fixed immunofluorescence images of mitotic MEFs. Thus, as an optional additional experiment, analyzing live mitosis of MEFs could potentially help reconcile results from mitotic figures and from karyotypes.

      We basically agree with the reviewer and have therefore reanalyzed our data on centriole numbers in a time-dependent manner. As already shown in Figure 3L of the initial manuscript version, the number of both CMV-STIL+/- and CMV-STIL+/+ MEFs with supernumerary centrioles increases with passaging from passage 3 (p3) to p6. Also, in this experiment amplified centrioles were more frequent in CMV-STIL+/+ compared to CMV-STIL+/- MEFs in both passages (p3 and p6) analyzed. We have therefore now pooled the data and substituted the former Figure panel 1D by these combined results. As the results of Figure 1F and especially those for the CMV-STIL+/+ MEFs had to rely on very low mitotic figure counts, because these cells only very rarely divide (as shown in Figure 3A; mitosis frequency of CMV-STIL+/+ MEFs 0.12%), we have now deleted Figure panel 1F from the manuscript. For the same reason - an extremely low proliferation and division rate of especially CMV-STIL+/+ MEFs - live cell imaging to detect different types of mitotic errors, is unfortunately not feasible.

      Figure 5 panel F does not support the claim of the main text and does not match the legend of the figure: In the text the authors wrote: "Ki67 immunostaining revealed that, ..., proliferation rates were elevated independent from lymphoma genotypes". If the authors claim an increased cell proliferation in lymphoma compared to lymph nodes, which is expected, they should show the data for the lymph node in the graph. In addition, in the legend the authors mentioned a "Percentage of Ki67-positive cells in healthy spleens and lymphomas from mice with the indicated genotypes." Since there are three genotypes and two tissue types but the figure presents a graph with only three bars, were the spleen and lymphoma data combined? Or were some data not inserted in the graph? Thus, since the data do not support the claim for an increased cell proliferation in lymphoma, the authors' explanation for the increased protein level observed in these lymphomas (Fig. 5 panel E) is not supported. Therefore, the authors need to present the correct data in the figure or to change their conclusion. They will also need to correct the figure legend and to add a panel with images illustrating the Ki67 labelling in the different tissues in the figure.

      We apologize for this mistake and have corrected the legend to Figure panel 5F, which now reads: “Percentage of Ki67-positive cells in two B6-STIL, two CMV-STIL+/- and one CMV-STIL+/+ lymphoma. For comparison, frequencies of Ki67-positive cells in healthy lymph nodes from B6-STIL mice are displayed. Data are means ± SEM from at least two independent immunostainings per lymphoma or healthy lymph node. P-values were calculated using the one-way ANOVA with post-hoc Tukey test for multiple comparison. For space reasons, only statistically significant differences are displayed”.

      We agree with the reviewer that, for comparison, Ki67 immunostaining of healthy lymph node tissue was missing in the graph and have therefore added this information to the figure panel, which shows increased proliferation of lymphoma compared to normal lymph node cells. Also, a panel with images illustrating Ki67 labelling in healthy lymph node and lymphomas from different genotypes has been added to the figure (panel 5G).

      Minor points:

      1. In the introduction, page 4 paragraph 3, the authors wrote: "To assess the impact of centrosome amplification on CIN, senescence, lifespan and tumor formation in vivo without interfering with extracentrosomal traits,..." They need to clarify what they meant by extracentrosomal traits.

      As requested by the reviewer we have modified the respective sentence, which now reads: “To assess the impact of centrosome amplification on CIN, senescence, lifespan and tumor formation in vivo with an orthologous approach without interfering with PLK4, we generated transgenic mouse models overexpressing the structural centrosome protein STIL, …”.

      In the 1st paragraph of the results, page 4, the authors wrote: "leads to ubiquitous transgene expression at levels similar to the CAG promoter used in most..." but there is no link to a figure presenting the mRNA levels in those mice (potentially Fig. 4F and Fig. S6). Also, in the references cited for comparison, to my knowledge, there was no measurement of Plk4 mRNA levels in tissues in the work from Marthiens and colleagues, in this work the authors assess the expression of the Plk4 transgene by investigating the presence of the protein.

      To show STIL transgene expression levels in our system, we have now linked Figure panels 1A (STIL mRNA expression in MEFs), 1B (STIL protein expression in MEFs) and Supplemental Fig. S2 (Supplemental Fig. S6 of the previous manuscript version showing STIL mRNA levels in healthy mouse tissues) to this statement as suggested. In the references now cited for comparison (Kulukian et al. 2015; Vitre et al. 2015; Sercin et al. 2016) PLK4 transgene mRNA (Kulukian et al. 2015; Sercin et al. 2016) and protein levels (Vitre et al. 2015) are shown.

      Page 5 second line the authors wrote: "Despite the graded increase in Plk4 expression, CMV-STIL+/- and, CMV-STIL+/+ MEFs exhibited a similar increase in supernumerary centrioles". The authors must have meant an increase in STIL expression, or do they have data not shown about an increase of Plk4 expression? Then they explain this absence of difference in supernumerary centrioles by the ability of "excess Plk4" to access the centrosome; again they probably meant STIL. Regarding this point and related to Major Point 1, it might be worthwhile for the authors to quantify actual extra centrosomes in mitosis rather than cells with more than 4 centrioles in interphase (as in Fig. 1C, D). They might find differences in the number of centrosomes in hemizygous versus homozygous MEFs.

      We indeed meant STIL instead of PLK4 and have corrected the mistake. As described in our response to the reviewer’s major point 1 we have now reanalyzed our data on centriole numbers in a time-dependent manner. As already shown in Figure 3L of the initial manuscript version, the frequency of both CMV-STIL+/- and CMV-STIL+/+ MEFs with supernumerary centrioles increases with passaging from passage 3 (p3) to p6. Also, in this experiment amplified centrioles were more frequent in CMV-STIL+/+ compared to CMV-STIL+/- MEFs in both passages (p3 and p6) analyzed. We have therefore now pooled and substituted the former Figure panel 1D by these combined results.

      Page 5, in the first paragraph the authors mention "the rate of respective mitotic aberrations..." without defining the mitotic aberrations. For instance, in panel 1E a metaphase with 4 centrosomes is shown for CMV-STIL+/- while an anaphase with an unknown number of clustered centrosomes is presented for CMV-STIL+/+. Classifying the different types of aberrant mitotic figures (i.e., multipolar anaphases versus bipolar with clustered centrosomes) might help the authors identify differences between hemi- and homozygous MEFs that may explain the differences in the proportions of chromosome aberrations they present in Fig. 2.

      As described in our response to the reviewer’s major point 1, the number of mitotic figures that could be analyzed was extremely low, especially for CMV-STIL+/+ MEFs, which divide only rarely (mitosis frequency of CMV-STIL+/+ MEFs 0.12%). Therefore, although certainly of value, classification of different types of mitotic aberrations is unfortunately not feasible.

      In Fig 4A the number of mice analyzed should be mentioned.

      After mating of B6-STIL transgenic animals with CMV-CRE mice and further breeding of successive generations, we obtained a total of 198 pups over four generations, 162 of which were born alive: 116 B6-STIL wildtype animals, 27 CMV-STIL+/- and 19 CMV-STIL+/+ mice. We have now added these numbers to the figure legend.

      In Fig. 5E, the band corresponding to STIL protein is difficult to visualize in the B6-STIL control, it is therefore difficult to compare its level to the level of STIL protein in the CMV-STIL hemizygotes and homozygotes. If possible, it would improve the manuscript to present a blot with clearer results.

      We have tried to improve the quality by repeating the Western blot. Due to the small size of healthy mouse lymph nodes, resulting in low protein yields, only lysates from lymphomas were left, and these were of poor quality with a high lipid content. We therefore tried to delipidate the lymphoma lysates and hope that the result of the new blot is now somewhat clearer. Due to the low lymphoma frequency in CMV-STIL hemizygotes and homozygotes (only 2 in each case) we were unfortunately not able to prepare fresh lysates.

      Related to Figure 6B, the authors wrote a "5 to 10 fold-increased expression..." in the text while panel 6B shows a maximum of 8-fold increase.

      The respective statement has been rephrased according to the reviewer's suggestion.

      Reviewer #1 (Significance (Required)):

      Centrosome amplification is a demonstrated cause of genomic instability and tumor development, as shown in multiple previous studies performed in mice. In this work, Moussa and co-workers developed a mouse model that does not depend on Plk4 to trigger centrosome amplification but instead depends on the overexpression of the centrosome structural protein STIL. This effort is welcome, as previous works could not formally rule out a potential role of Plk4, unrelated to its centrosome duplication function, in tumor formation. The authors show that their system is functional in MEFs, where STIL overexpression drives centrosome amplification and aneuploidy. Unfortunately, in vivo, despite elevated levels of STIL mRNA, they do not detect centrosome amplification in tissues and, consequently, they do not observe an increased rate of aneuploidy and tumor formation. This result is not surprising, as previous studies using strong promoters (comparable to the one used to drive STIL expression in this study) to induce Plk4 overexpression led to similar results, i.e. an absence of centrosome amplification in adult tissues and no effects on tumor formation. Therefore, the results and the concepts proposed in this work are not novel, but they reinforce previous studies showing the deleterious effect of high levels of centrosome amplification on cells. This work also confirms that strong mechanisms (here the authors propose a translational shut-down) prevent the appearance or the persistence of high levels of centrosome amplification in animal tissues. By complementing existing results with the use of an alternate experimental approach, this study will be of interest for the scientific community working on the basic biological mechanisms driving aneuploidy and tumor development.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Moussa et al. describe the effects of over-expressing the centriole duplication factor STIL in whole mice and with expression restricted to the skin. They find that over expression of STIL, similar to that of PLK4, induces centriole overduplication, abnormal mitoses, and genetic instability leading to cell arrest. Additionally, over-expressing STIL results in microcephaly, perinatal lethality and a shortened lifespan. In addition, they do not find that expression of the p53 R127H mutant alleviates the cell growth defect. Moreover, overexpression of STIL does not lead to increased general tumour formation and suppresses tumour formation in an induced skin tumour model.

      Although this is an interesting manuscript, the authors need to address a number of issues before this manuscript can be recommended for publication. Importantly, the manuscript lacks statistical analyses to support some of its conclusions, some figures should be quantified, and controls are missing in some cases.

      Major Issues

      1. Many of the figure panels lack appropriate statistical analyses to support the conclusions (see details below). This needs to be rectified.

      In view of the limited number of mice (due to an increased frequency of pups that died around birth) and the resulting impossibility of performing several (>3) independent experiments in many cases, we have decided to limit the statistics in the main text to a descriptive analysis without mentioning inferences (p-values). Nevertheless, we have now included the missing statistical analyses in the figure panels and/or legends. However, the reported p-values (*p≤0.05, **p≤0.01, ***p≤0.001; ns, not significant) should be interpreted as descriptive rather than confirmatory values.


      The authors suggest that the interpretation of PLK4 over-expression studies is hampered by the possibility of centriole/centrosome independent PLK4 roles and that STIL overexpression circumvents some of these issues. Although orthogonal approaches to problems are always desired, STIL itself has also been implicated in other cellular processes, such as the Sonic hedgehog pathway (Carr AL, 2014) and in cell motility (Liu Y, 2020). In addition, the data presented in the manuscript are suggestive of a STIL function in the mouse that is independent of centriole number. The authors demonstrate that the amount of centriole over-duplication in MEFs containing a single copy of the STIL over-expression locus is equivalent to that of MEFs carrying two copies. However, in most other assays, the homozygous lines display more severe phenotypes, suggesting that STIL might have a function outside centriole duplication. The authors need to discuss this further in a revised manuscript.

      As described in our response to major point 1 and minor point 3 of reviewer 1 we have now reanalyzed our data on centriole numbers in a time-dependent manner. As already shown in Figure 3L of the initial manuscript version, the number of both CMV-STIL+/- and CMV-STIL+/+ MEFs with supernumerary centrioles increases with passaging from passage 3 (p3) to p6. Also, in this experiment amplified centrioles were more frequent in CMV-STIL+/+ compared to CMV-STIL+/- MEFs in both passages (p3 and p6) analyzed. We have therefore now pooled the data and substituted the former Figure panel 1D by these combined results, which show that, similar to other models, also regarding STIL overexpression the homozygous line displays a more severe phenotype, which does therefore per se not argue for a STIL function outside the centrosome. However, as a few recent studies indeed suggest additional roles of STIL, we have amended the respective passages in the revised version of the manuscript accordingly.


      Why did the authors use the p53 R127H mutant instead of a p53 knockout or null allele system? The R127H mutant has a gain-of-function phenotype and cells expressing this mutant display different phenotypes than a p53 null. The primary conclusion in one of the references cited by the authors (Caulin C, 2007) is that p53R127H is a gain-of-function mutant and behaves distinct from loss-of-function p53 mutations, such as deletions using floxed alleles. Throughout the manuscript, the authors use terms that suggest the R127H allele is equivalent to a loss of function mutant. Given that supernumerary centriole growth arrest is universally suppressed by inactivation of p53 it is somewhat surprising that this pathway is not active in response to STIL over-expression. The authors should confirm this key conclusion by depleting p53 in MEFs using RNAi, or by using mice where complete inactivation of p53 can be achieved.

      We agree with the reviewer that the p53-R172H mutant version of p53 is not equivalent to a p53 knockout. We have therefore and as suggested by reviewer 3 as well (see also our response to point 3 of reviewer 3) corrected the wording and have substituted “absence of p53” by “interference with p53 function” where appropriate. In addition, we now have added data to the manuscript, which show that neither p53 expression nor p53-S18 phosphorylation becomes induced during prolonged cultivation and passaging of CMV-STIL transgenic MEFs (see Figure 3B of the revised manuscript). Importantly, this finding is in line with a recent report showing that PLK4-induced extra centrosomes may not rely on p53 for tumor suppression and cell death induction (Braun et al.: Extra centrosomes delay DNA damage-driven tumorigenesis. Sci. Adv. 10: eadk0564, 2024). Similarly, it has been recently shown that centrosome amplification increases apoptosis independently of p53 in PLK4-overexpressing cells treated with DNA-damaging agents (Edwards et al.: Centrosome amplification primes for apoptosis and favors the response to chemotherapy in ovarian cancer beyond multipolar divisions. bioRxiv 2023.07.28.550973, 2023). Therefore, these findings and references have now been added to results and discussion sections of the revised manuscript.

         A plethora of p53-related findings in mouse models, including the majority of results on PLK4-induced tumor formation in mice, is based on p53 knockouts, a situation that is only rarely found in human cancers. In contrast, the p53-R172H missense mutation in mice corresponds to the p53-R175H mutation in human tumors, which has the highest occurrence in diverse human cancer types among all p53 hotspot mutations, and results in a transcriptionally inactive protein that accumulates in cells, similar to the majority of naturally occurring versions of mutant p53 (Yao et al.: Protein-level mutant p53 reporters identify druggable rare precancerous clones in noncancerous tissues. Nat Cancer 4: 1176-1192, 2023; Chiang et al.: The function of mutant p53-R175H in cancer. Cancers 13: 4088, 2021). We therefore believe that it more faithfully recapitulates the situation in p53-mutant tumors than a p53 knockout.
      
         Although basically an important and valid experiment, depleting p53 in STIL-transgenic MEFs using RNAi is not easily done as (i) transfection of MEFs per se is difficult and (ii) STIL-overexpressing MEFs do only slowly proliferate and are prone to senescence and apoptosis (see Figure 3), all phenotypes which are even further exacerbated after transfection. Generation of STIL-transgenic mice with complete inactivation of p53 on the other hand is an extremely time-consuming endeavor that would lead to a significant delay of publication of our results. Given that currently similar data are published by other groups (Braun et al.: Extra centrosomes delay DNA damage-driven tumorigenesis. Sci. Adv. 10: eadk0564, 2024; Edwards et al.: Centrosome amplification primes for apoptosis and favors the response to chemotherapy in ovarian cancer beyond multipolar divisions. *bioRxiv* 2023.07.28.550973, 2023), we do not think that this would be appropriate.
      

      Minor Issues and details

      Figure 1

      1. Panel E. It is unclear what the authors are calling an 'aberrant mitosis'. Typically an aberrant mitosis refers to chromosomal abnormalities such as multipolar spindles, anaphase bridges or micronuclei (which they quantify in Figure 2).

      The aberrant mitotic figures presented in Figure 1E show a clustered metaphase with 4 centrosomes (2 per pole; 2 centrioles per centrosome) for CMV-STIL+/- MEFs and a clustered telophase with 2 centrosomes (1 per pole; 5 centrioles per centrosome) for CMV-STIL+/+ MEFs. This is now specified in detail in the legend to Figure 1E.


      Panel E. Please include images representing a normal mitosis from control cells derived from B6-STIL mice.

      As suggested, we have now included a representative image of a normal mitosis from B6-STIL control mice.

      Figure 2

      1. Panels B, E and F. Statistical significance is not indicated between B6-STIL and CMV-STIL+/- or CMV-STIL+/- and CMV-STIL+/+. The authors indicated a 'graded' phenotype which is qualitatively apparent, but should be backed by statistical analysis.

      We have now included a statistical analysis. However, and as already described in our answer to major issue 1 of this reviewer, the reported p-values should be interpreted as descriptive rather than confirmatory values due to the limited number of independent experiments.


      Can the authors indicate how they scored a tetraploid cell? Some of the cells are 100% tetraploid while others contain other aberrations.

      According to the 2020 version of the International System for Human Cytogenomic Nomenclature (ISCN), polyploidy is defined by the modal number of chromosomes in the karyotype. A count of 81-103 chromosomes is called near-tetraploid, within which hypotetraploidy (81-91 chromosomes) is distinguished from hypertetraploidy (93-103 chromosomes) (An International System for Human Cytogenomic Nomenclature, Karger (2020), Eds.: McGowan-Jordan, Hastings, Moore). For mouse karyotypes, the respective numbers were recalculated on the basis of a diploid chromosome content of 40 instead of 46 chromosomes. To be strictly in accordance with this nomenclature, we have replaced the term "tetraploid" with "near-tetraploid".
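      The recalculation for mouse karyotypes can be illustrated with a short sketch. It assumes that the human ISCN bounds are scaled linearly by 40/46 and rounded to the nearest integer; the response does not state the exact rule or the resulting mouse ranges, so the helper name `scale_bounds` and the rounding are hypothetical and the output should not be read as the thresholds actually used.

```python
# Hypothetical sketch: rescale the human ISCN near-tetraploid bounds to mouse.
# Assumption: bounds scale linearly with the diploid chromosome number
# (40 in mouse vs 46 in human); the rounding rule is also an assumption.

HUMAN_DIPLOID = 46
MOUSE_DIPLOID = 40

# Human ranges as quoted in the response (ISCN 2020).
HUMAN_RANGES = {
    "near-tetraploid": (81, 103),
    "hypotetraploid": (81, 91),
    "hypertetraploid": (93, 103),
}


def scale_bounds(low, high, src=HUMAN_DIPLOID, dst=MOUSE_DIPLOID):
    """Scale an inclusive chromosome-count range from one species to another."""
    factor = dst / src
    return round(low * factor), round(high * factor)


mouse_ranges = {name: scale_bounds(*bounds) for name, bounds in HUMAN_RANGES.items()}

for name, (low, high) in mouse_ranges.items():
    print(f"{name}: {low}-{high} chromosomes")
```

      Under these assumptions the near-tetraploid window lands at 70-90 chromosomes, centred on the exact mouse tetraploid count of 80.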

      Is the height of the rows in Panel D significant? What are the solid black rows?

      We thank the reviewer for this comment/observation. We have now increased the resolution of this part of the figure. Unfortunately, the resolution had deteriorated so much when the pdf file was created that individual lines were no longer recognizable. The height of the lines should be identical, as single lines correspond to the karyotypes of each metaphase cell analyzed, while chromosomes are plotted as columns. The solid black lines separate independently established MEF lines with the indicated STIL genotypes from each other. At least 20 metaphase cells per MEF line were analyzed. We have now explained these points in the figure legend.

      Figure 3

      1. Panels C, F, G, and K require statistical analyses.

      We have now included the appropriate statistical analyses in the figure panels and/or legends. However, the reported p-values should be interpreted as descriptive rather than confirmatory values due to the limited number of independent experiments.


      Panel D should be quantified.

      We have now included a quantification of the protein bands in panels B, E (former panel D), and K of the revised manuscript and explained the quantification procedure in detail in the methods section.

      Panel E. mRNA expression is quantified in RPKM here, while GeTMM is used in Figures 3I and Supplementary Figures S2 and S6. Is there a reason this panel uses a different method? RPKM can be used for intra-sample comparisons, but is not ideal for comparison among different samples.

      We now uniformly quantify mRNA expression in GeTMM in all figures of the revised manuscript version as requested.


      Panel G. Can the authors show the original FACS profiles in Supplementary material?

      As requested, we have now included representative examples of original FACS profiles from the cell cycle analyses into Supplemental Figure S5.


      Panel H. Requires molecular weight markers

      Molecular weight markers for the DNA ladder (L) with the corresponding bp size have now been included into the Figure panel (formerly 3H, 3I in the revised version of the manuscript).


      Panel J. Missing B6-STIL control. Quantify Western blots.

      We have now included an immunoblot showing STIL protein expression levels in passage p1-p5 of B6-STIL control MEFs as well as a quantification of the protein bands into the Figure panel (formerly 3J, 3K in the revised version of the manuscript). The quantification procedure has been explained in detail in the methods section of the revised manuscript version.

      Figure 4

      1. The authors mention 'Simultaneously, we found an increased frequency of pups that died around birth.' Can the data for this be included?

      After mating B6-STIL transgenic animals with CMV-CRE mice and further breeding of successive generations, we obtained a total of 198 pups over four generations, of which 162 were born alive: 116 B6-STIL wildtype animals, 27 CMV-STIL+/- and 19 CMV-STIL+/+ mice. We have now added these numbers to the figure legend. Stillbirths increased over the generations: while in the first generation after mating B6-STIL animals with CMV-CRE mice all pups (B6-STIL wildtype animals and STIL heterozygotes) were born alive, in the fourth generation (from mating CMV-STIL transgenic mice with each other) 54% of the pups were stillborn. We have now included this observation into the main text to further emphasize the impact of STIL overexpression on perinatal lethality.

      Panels B and D. Please include the data for CMV-STIL+/-.

      We now have included a representative H&E-stained histological section of a CMV-STIL+/- mouse brain into Figure panel 4D as suggested by the reviewer. For space reasons we have not added an extra image of a CMV-STIL+/- total brain into Figure panel 4B, as this does not add novel information.

      Panels C, F and K require statistics.

      As requested, we have now included the appropriate statistical analysis in the figure panels and/or legends. However, the reported p-values should be interpreted as descriptive rather than confirmatory values due to the limited number of independent experiments.


      Panel F. Include statistical analysis.

      We have now included the appropriate statistical analysis in the figure panels and/or legends. However, the reported p-values should be interpreted as descriptive rather than confirmatory values due to the limited number of independent experiments.


      Panel G/H. The levels of STIL in the CMV-STIL+/+ spleen are higher than the other samples, yet there is no concomitant increase in centriole overduplication. Can the authors comment on this?

      Interestingly, we indeed found a higher STIL protein expression level in spleen tissue from CMV-STIL+/+ as compared to B6-STIL control and CMV-STIL+/- mice. Nevertheless, the amount of splenocytes with supernumerary centrioles was only marginally increased in these animals. A similar finding has recently been described for B lymphocytes with upregulated PLK4 expression after PLK4 transgene induction by exposure to doxycycline in vivo (Braun et al.: Extra centrosomes delay DNA damage-driven tumorigenesis. Sci. Adv. 10: eadk0564, 2024). Here, the lack of B cells with supernumerary centrioles despite increased PLK4 levels was explained by increased apoptosis and thereby selection against and rapid loss of PLK4-overexpressing cells. In line, we show that CMV-STIL+/+ MEFs have increased rates of senescence and apoptosis (Fig. 4).


      Panel J. The font within the plots is difficult to read.

      We thank the reviewer for this comment/observation. We have now increased the resolution of this figure panel, and the font is now outside of the plots.

      Figure 5

      [...] the reported p-values should be interpreted as descriptive rather than confirmatory values due to the limited number of independent experiments. No further statistical analysis can be done for panel D, as in some cases (lymph node from B6-STIL mouse, lymphoma from CMV-STIL+/+ mouse) only one measurement exists.

      Panel F. The legend indicates that these data are from spleens and lymphomas. Is this correct? Would the results from non-lymphoma cells in the spleen mask the results from lymphoma cells?

      We apologize for this mistake and have corrected the legend to Figure panel 5F, which now reads: “Percentage of Ki67-positive cells in two B6-STIL, two CMV-STIL+/- and one CMV-STIL+/+ lymphoma. For comparison, frequencies of Ki67-positive cells in healthy lymph nodes from B6-STIL mice are displayed. Data are means ± SEM from at least two independent immunostainings per lymphoma or healthy lymph node. P-values were calculated using the one-way ANOVA with post-hoc Tukey test for multiple comparison. For space reasons, only statistically significant differences are displayed”.


      Panel F. The authors indicate that 'In line, assessment of lymphomas from B6-STIL control, CMV-STIL+/- and CMV-STIL+/+ mice by Ki67 immunostaining revealed that, corresponding to STIL protein levels, proliferation rates were elevated independent from lymphoma genotypes'. However, Ki67 levels, the marker for proliferation, actually decreased in these samples, indicating less proliferative cells. This needs to be clarified since the data shown appear to show the opposite of what is stated in the manuscript.

      As noticed by the reviewer further below, differences in the percentages of Ki67-positive, proliferating cells between lymphomas from B6-STIL, CMV-STIL+/- and CMV-STIL+/+ mice were statistically not significant. However, we have now for comparison added the results of Ki67 immunostaining of healthy lymph node tissue to Figure panel 5F, which show increased proliferation of lymphoma compared to normal lymph node cells. Also, a panel with images illustrating Ki67 labelling in healthy lymph node and lymphomas from different genotypes has been added to the figure (panel 5G). These data reveal that, independent from the genotype, proliferation rates of lymphoma cells are increased as compared to healthy lymph nodes, thereby further corroborating our assumption that STIL protein levels in lymphomas are increased as a consequence of their increased proliferation and independent from STIL transgene expression.


      Corresponding to point 3 above, the authors suggest that 'STIL protein expression is a consequence of increased lymphoma cell proliferation.' This hypothesis cannot explain STIL protein levels if proliferation has actually decreased.

      Please see our response to point 3 above.


      Corresponding to point 3 and 4 above, the actual data is marked as non-significant indicating there is actually no proliferative difference among the samples.

      This is correct. See also our comments to point 3 and 4 above.

      Panel 5I. The authors state that 'On the other hand, overall levels of chromosomal copy number aberrations were higher in lymphomas (mean gains + losses: 225.2 ± 173.7 Mb) as compared to healthy tissues (mean gains + losses: 87.3 ± 127.5 Mb; p=0.06), irrespective of their STIL transgene status (Fig. 4J; Fig. 5I), although the difference did not quite reach statistical significance.' The authors need to soften this statement since statistically, the samples are not different. For example, 'On the other hand, overall levels of chromosomal copy number aberrations appeared to trend higher in lymphomas as compared to healthy tissues irrespective of their STIL transgene status, although the difference did not quite reach statistical significance.'

      The statement was rephrased according to the reviewer's suggestion.

      Figure 6

      1. Panels A, B, and C require statistical analysis.

      We have now included the appropriate statistical analyses for panels A, B, and C in the figure panels and/or legends. However, the reported p-values should be interpreted as descriptive rather than confirmatory values due to the limited number of independent experiments.


      The figure legend references to panels C and D appear to be swapped.

      We thank the reviewer for this comment/observation. We have corrected this mistake.

      Panel F. Indicate that the samples are not significantly different.

      We have now included the appropriate statistical analysis including the indication that the samples are not statistically significantly different.


      Corresponding to point 3, the authors indicate that 'the proportion of Ki67-positive cycling cells was lower in tamoxifen-treated... ... although the difference did not quite reach statistical significance.' The authors need to soften this statement to reflect that the samples are not statistically different (i.e. 'appeared lower' or similar).

      The statement was rephrased according to the reviewer's suggestion.

      Figure 6 and 7

      Do you have data for B6-STIL animals treated with and without tamoxifen? The experiments as shown demonstrate the differences between control and tamoxifen-treated animals of the same genotype, but it is unclear if any of these effects are due to the underlying genotypes or from tamoxifen itself.

      The experiments presented in Figures 6 and 7 have not been performed in B6-STIL control mice with and without tamoxifen treatment.

      Supplemental Figure 1

      1. Please include molecular weight marker for this and all panels showing PCR products.

      Molecular weight markers for the DNA ladder (L) with the corresponding bp size have now been included into all Figure panels showing PCR products as requested.

      The B6-STIL and CMV-STIL+/- lines should contain a larger MW band corresponding to the STIL-F and STIL-R PCR product. Please show if possible.

      We thank the reviewer for the important remark. We agree that there should be a large PCR product band at around 3000 bp containing the bacterial neomycin phosphotransferase gene (TK-neo-pA) and the STOP cassette in the B6-STIL control mice/MEFs, and two PCR product bands (large: 3000 bp, small: 410 bp) in the heterozygous CMV-STIL+/- mice/MEFs. When we began with genotyping, we did indeed observe both bands depending on the STIL background (see figure below). However, the band intensity of the larger PCR product was relatively weak (arrowheads) compared to the smaller PCR product, and its visibility was dependent on genomic DNA input and PCR efficiency. During the PCR optimization process, the PCR conditions were changed in such a way that the yield of the small band was increased despite small input amounts of genomic DNA, but at the expense of the large PCR product band (arrows). At the end of the optimization process the larger PCR product had almost disappeared, making the discrimination between heterozygous CMV-STIL+/- and homozygous CMV-STIL+/+ DNA difficult. Therefore, we decided to additionally check for STOP cassette excision in a second PCR approach in parallel. In the genotyping results shown in Supplemental Figure S1B, which were produced after PCR optimization, the larger STIL PCR product band was no longer visible.

      Supplemental Figure 6

      1. The 'Spleen' sample is missing the B6-STIL control data. 'Liver' is missing CMV-STIL+/+. Please include or indicate why they are missing. The plot order of the samples differs for 'Liver' (red, black) compared to the others (black, red, blue). Indicate statistical significances.

      We apologize for this mistake, have corrected the Figure (formerly Supplemental Figure S6, S2 in the revised version of the manuscript), and have included the missing spleen and liver samples.


      General issues

      1. The materials and methods indicate that HPRT and PIPB were used as reference genes, but only HPRT is referred to in the qPCR figure legend.

      We thank the reviewer for this comment/observation. As generally recommended (Vandesompele et al., Genome Biol 3(7): research0034.1-research0034.11, 2002; Kozera and Rapacz, J Appl Genet 54(4): 391-406, 2013), we used both reference genes for accurate normalization of qPCR in all experiments. We have now corrected this mistake in the figure legend.


      Figure panels 1F and 3C display 95% confidence intervals while others use SEM. Is there a reason for this?

      In the two referenced figures (former Figure 1F has been deleted from the manuscript, see also our comment to point 1 of reviewer #1 for reasons; Figure 3C of the former manuscript is now Figure 3D in the revised manuscript version) the endpoint variable was defined by whether individual cells in a single experiment showed a certain property or not (binary variables). By definition, these kinds of variables show a nonsymmetric error structure, which cannot be expressed properly by a single value such as the standard error (SEM), but can be covered correctly by a confidence interval. For the same reason, Fisher’s exact tests were employed to obtain p-values in these situations. In the other figures, the relevant endpoint variables were roughly normally distributed, either directly, or due to them being an average of many values. In this case, a symmetric SEM was thus considered sufficient, and t-tests were used for p-values. To make this clear in the figures, we used different display options to distinguish between error bars showing SEM or 95% CI.
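      The distinction drawn above between binary and roughly normal endpoints can be made concrete with a small sketch. It uses made-up cell counts, a Wilson score interval as one common choice of 95% CI for a proportion, and a hand-written two-sided Fisher's exact test; the response does not say which CI method or software was used, so the function names and numbers here are illustrative assumptions only.

```python
from math import comb, sqrt


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one.
    return sum(p for p in map(prob, range(x_min, x_max + 1)) if p <= p_obs * (1 + 1e-9))


# Made-up example: cells with supernumerary centrioles out of cells scored.
control_pos, control_n = 3, 50
transgenic_pos, transgenic_n = 15, 50

p_hat = control_pos / control_n
low, high = wilson_ci(control_pos, control_n)
sem = sqrt(p_hat * (1 - p_hat) / control_n)  # symmetric by construction

print(f"proportion {p_hat:.3f}, 95% CI {low:.3f} to {high:.3f}, SEM {sem:.3f}")
print(f"distance to lower bound {p_hat - low:.3f}, to upper bound {high - p_hat:.3f}")

p_value = fisher_exact_two_sided(
    control_pos, control_n - control_pos,
    transgenic_pos, transgenic_n - transgenic_pos,
)
print(f"Fisher's exact test p = {p_value:.4f}")
```

      For a proportion close to 0, the interval extends further above the estimate than below it, which a single symmetric SEM value cannot express.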

      Reviewer #2 (Significance (Required)):

      In this manuscript, Moussa et al. describe the effects of over-expressing the centriole duplication factor STIL in whole mice and with expression restricted to the skin. They find that over expression of STIL, similar to that of PLK4, induces centriole overduplication, abnormal mitoses, and genetic instability leading to cell arrest. Additionally, over-expressing STIL results in microcephaly, perinatal lethality and a shortened lifespan. In addition, they do not find that expression of the p53 R127H mutant alleviates the cell growth defect. Moreover, overexpression of STIL does not lead to increased general tumour formation and suppresses tumour formation in an induced skin tumour model. Although this is an interesting manuscript, the authors need to address a number of issues before this manuscript can be recommended for publication. Importantly, the manuscript lacks statistical analyses to support some of its conclusions, some figures should be quantified, and controls are missing in some cases.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Previously it has been proposed that supernumerary centrioles have important deleterious effects in vivo, including increased tumorigenesis. However, the work was inconclusive because the way of inducing centriole amplification via the PLK4 kinase could have induced other effects besides supernumerary centrioles. To resolve this question, the authors generated a mouse model of centrosome amplification, in which the structural centriole protein STIL is overexpressed. Using this mouse model in vivo along with mutant mouse embryonic fibroblast (MEF) lines in vitro, the authors test the role of centrosome amplification in animal development, lifespan, and tumorigenesis. They report embryonic lethality, defects in brain development, and shortened life span in these mice. They also find that skin tumorigenesis is reduced in the mutant mice, and demonstrate that the STIL overexpression effects are not perturbed in a dominant negative p53 model. The authors demonstrate that STIL overexpression causes centrosome amplification accompanied by aneuploidy, which however is highly deleterious for cell fitness even in the absence of p53. Clearly, tissue corrective mechanisms lead to the elimination of cells with extra centrosomes and/or aneuploidy by impaired proliferation, senescence, and apoptosis. This finding is interesting and significant and seems worthy of dissemination to the broader readership.

      This study is thorough and well executed and there is a significant body of work that leads to solid conclusions. The data are convincing, and the figures are well presented. It was refreshing to read this paper, as it was not so cluttered with data that the message gets murky, yet the data were clearly very substantial. The text is clear and easy to follow.


      There really are only minor aspects of this paper that need correction, in my opinion. The text should be thoroughly checked for typos, few extra redundant words here and there, and a couple of confusing sentences.

      As suggested by the reviewer we have rechecked the manuscript for typos, redundancies, and confusing sentences and corrected where necessary and appropriate.

      For example, the last sentence in abstract is confusing 'These results suggest that supernumerary centrosomes... [result in]... tumor formation' because it should read 'reduced tumor formation' or 'impairs tumorigenesis' or otherwise be written more clearly because it seems to convey the opposite message the way it is right now.

      We thank the reviewer for this comment and have corrected the sentence, which now reads: “These results suggest that supernumerary centrosomes impair proliferation in vitro as well as in vivo, resulting in reduced lifespan and delayed spontaneous as well as carcinogen-induced tumor formation”.

      The p53 dominant negative mutant is not exactly a KO so it is not fair to say "in the absence of p53"; the verbiage should be corrected and checked throughout the paper - perhaps 'interfering with p53 normal function' is more appropriate.

      As suggested by the reviewer we have corrected the wording and have substituted “absence of p53” by “interference with p53 function” where appropriate.

      The sentence "Senescence- and apoptosis-driven depletion of the stem cell pool may explain reduced life span and tumor formation in STIL transgenic mice." from discussion is highly speculative and should be edited to clearly convey its speculative nature or removed entirely.

      We agree with the reviewer and have deleted the sentence from the discussion section of the manuscript.

      Reviewer #3 (Significance (Required)):

      Clearly, tissue corrective mechanisms lead to the elimination of cells with extra centrosomes and/or aneuploidy by impaired proliferation, senescence, and apoptosis. This finding is interesting and significant and seems worthy of dissemination to the scientific community. It adds to previous work on another centriole-related protein, PLK4 kinase, that led to very different conclusions.

    1. Many designers also rely on their own experiences to inform the work they do.

      I chose this section because it gives another perspective on certain design choices. As we are all aware, design is relative, and many people have their own personal preferences for what makes a 'good' or a 'bad' design. In relation to designing for equity and inclusion, I chose this text because I find it interesting how different cultures may have different perspectives, opinions, and solutions in their designs. For example, as a freelance graphic designer, I often rely on my own experience and my exposure to digital media. If my client were to ask me to design their packaging however I liked, I would base the packaging on my knowledge, my opinion, and my belief about what makes a good design. However, if my client had a more specific request for the packaging, I would cater the design to what they want. Sometimes their preferences can come across as questionable or unflattering in my opinion, but at the end of the day I remind myself that these people are requesting these designs according to their own experience and their culture's exposure, which influence their preferences in design. Hence, this text is very impactful to me, as it encapsulates a designer's entire relationship with a client in one sentence. Soft skills such as open-mindedness, flexibility, and willingness to compromise on the client's needs are also important factors in becoming a good designer; technical skills are not the only thing that matters. As we do this, we indirectly include people from different backgrounds, experiences, and abilities in the design process in order to create a more effective and relevant end product.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We sincerely thank the Referees for providing important and constructive comments. We have addressed their concerns point-by-point as described below.

      Associated to Reviewer#1's comments

      *- Diploid embryos are used as controls. Gynogenetic diploids seem to be better controls to ensure that the observed phenotypes are not related to loss of heterozygosity. To limit the amount of work, the use of gynogenetic diploids could be restricted to spindle polarity and centrosome number experiments.*

      Response 1-1

      __[Experimental plan]__ Following the reviewer's suggestion, we will conduct immunostaining of α-tubulin and centrin (to visualize the spindles and centrioles, respectively) in gynogenetic diploids, which will be generated by applying heat shock to gynogenetic haploid embryos during the 1st-2nd cleavage stage. We will observe the head area of gynogenetic diploid larvae at 3 dpf, when the haploid counterparts suffer particularly drastic centrosome loss and spindle monopolarization.

      *- As the authors discuss, it would be necessary to rescue centrosome loss to establish a causal relationship between centrosome loss and haploid viability. I certainly acknowledge that this is difficult (if not impossible), but it currently limits the significance of the results.*

      Response 1-2

      We agree that rescuing centrosome loss would provide an important advancement in understanding the cause of haploid syndrome in the context of our study. However, as the reviewer also pointed out in the above comment, this poses a significant technical challenge. As described in Discussion in the original manuscript, we have attempted to restore normal centrosome number through cell cycle modulations. However, we have not found a condition that rescues centrosome loss without damaging larval viability. As an alternative approach, we have also tried to induce centriole amplification by injecting mRNA encoding plk4, an essential centriole duplication inducer. However, this caused earlier embryonic death, precluding us from observing its effects on larval morphology after 1 dpf. The main challenge is that any treatment to increase centrosome number can cause centrosome overduplication, which is as deleterious to development as centrosome loss. Efforts to identify a key factor enabling the rescue of centrosome loss in haploid larvae are underway in our laboratory, which requires new explorations over several years and is beyond the scope of the present study. Reflecting on the reviewer's comment, we added a new sentence explaining the situation on this issue (line 395, page 19). To further discuss possible contributions of centrosome loss and mitotic defects to haploidy-linked embryonic defects, we also added a citation of a previous study reporting that depletion of centrosomal proteins caused mitotic defects leading to embryonic defects similar to those observed in haploid embryos in zebrafish (Novorol et al., 2013 Open Biology; line 380, page 19).

      __[Experimental plan]__ Meanwhile, as a new trial to induce centriole amplification in a scalable and temporally controllable manner, we plan the following experiment, which can be conducted within the time range of the revision schedule: We will investigate the effects of low-dose treatment with the plk4 inhibitor centrinone B on tissue growth and viability of haploid larvae. A recent study reported that centrinone B has complicated, highly dose-sensitive effects on the centriole duplication process (Tkach et al., 2022 eLife, PMID: 35758262). While it blocks centriole duplication at concentrations high enough to block plk4 activity, it paradoxically causes centriole amplification at suboptimal concentrations, presumably through over-stabilizing plk4 by blocking its autophosphorylation-dependent degradation (while its centriole-duplicating function remains active). Since a previous study showed that centrinone B is also effective in zebrafish embryos (Rathbun et al., 2020 Current Biology, PMID: 32916112), we will try to find an optimal centrinone B treatment condition that potentially restores tissue growth or viability of haploid embryos. If we find such a rescuing condition, we will address the principle of the rescuing effects by examining whether mitotic cells in these haploid larvae possess centrioles.

      *- Some experiments are not, or arguably, quantified/statistically analyzed.*

      *o Figure 2, Active caspase level. Larvae are sorted into three categories, and no statistical test is performed on the obtained contingency table. A Fisher's exact test should be used here or, much better, the active caspase-3 levels should be quantified instead of sorting larvae into categories.*

      Response 1-3

      We apologize that we showed only "zoomed-out" images of the immunostained embryos in the original figures (Fig. 2A), which precluded a clear presentation of the haploidy-associated aggravation of apoptosis and mitotic arrest. We could clearly distinguish cleaved caspase-3- and pH3-positive cells from non-specific background staining with an enlarged view of the same immunostaining data. Therefore, to quantitatively evaluate the extent of the haploidy-linked apoptosis and mitotic arrest, we compared the density of these cells within the right midbrain. This new quantification demonstrated a statistically significant increase in cleaved caspase-3- or pH3-positive cells in haploids compared to diploids.

      In the revised manuscript, we added the enlarged views of cleaved caspase-3 and pH3 immunostaining (Fig. 2B) and new quantifications with statistical analyses (Fig. 2C). Accompanying these revisions, we omitted the categorization of the severity of apoptosis, which was pointed out to be subjective in Reviewer #2's comment (see Response 2-3). We rewrote the corresponding section of the manuscript to explain the new quantitative analyses (line 143, page 7).

      o Same comment for 3E-F. Larvae are scored as Scarce, Mild or Severe. Looking at Fig S3A, I see one mild p53MO embryo, but the two others are not that different from 'severe' cases, which would completely change the contingency table. Again, a proper quantification would be better.

      Response 1-4

      We also quantified the frequency of cleaved caspase-3-positive cells in control and p53MO larvae (original Fig. 3E and F) as described in Response 1-3. While conducting the cell counting with enlarged images, we realized that staining quality within the inner larval layers of morphants was relatively poor in these experiments. This problem precluded us from counting cleaved caspase-3-positive cells within the inner larval layers. Therefore, we tentatively quantified only the surface larval layers of these morphants and found that cleaved caspase-3-positive cells were significantly reduced in haploids upon depletion of p53. We currently show this quantification in Fig. 3G of the revised manuscript. While this quantification confirmed the trend of p53MO-dependent decrease in apoptosis, we think it more appropriate to newly conduct the same experiment with better quality of the staining to apply the same standard of quantification for Fig. 3 as Fig. 2.


      __[Experimental plan]__ For the reason described above, we propose to re-conduct immunostaining of cleaved caspase-3 in control and p53MO-injected haploid larvae to improve the visibility of the inner layer of the larvae for better quality of the quantitation.

      Meanwhile, we revised Fig. 3 by adding an enlarged view of immunostaining in Fig. 3F and omitting the subjective categorization shown in the original Fig. 3F and S3A. We plan to replace these data with new images and quantification to be obtained during the next revision. We also rewrote the main text to update these changes (line 166, page 8).

      *o Figure 4D-E, no stats.*

      Response 1-5

      We conducted the ANOVA followed by the post-hoc Tukey test for new Fig. 4D and the Fisher exact test with Benjamini-Hochberg multiple testing correction for new Fig. 4E. Please note that statistical analyses were conducted after adding the data from original Fig. 6B-C following the reviewer's suggestion (see also Response 1-6).

      *o Figure 6, Reversine treated haploid should be compared to haploid embryos (on the graphs and statistically). If no specific controls have been quantified for this experiment, data could be reused from previous figures, provided this is stated.*

      Response 1-6

      The live imaging data shown in original Fig. 4C-E and Fig. 6A-C were obtained within the same experimental series conducted in parallel at the same period under the same experimental condition. In the original manuscript, we separated them into two different figures according to the logical flow. However, following the reviewers' comments (see also Response 2-1), we realized it more appropriate to show them as a single figure panel as in the original experimental design. Therefore, we moved the reversine-treated haploid data from the original Fig. 6A-C to Fig. 4C-E to facilitate direct comparison among conditions with statistical analyses (see also Response 1-5).

      *o Rescue by p53MO and Reversine, it would be nice to also include diploid measurements on the graphs, so that the reader can appreciate the extent of the rescue.*

      Response 1-7

      Following the reviewer's comment, we added control MO-injected or DMSO-treated diploid larval data in the corresponding graphs in Fig. 3I and 6G, respectively. Please refer to Response 2-6 for further discussion on the extent of the rescue.

      Minor comments:

      *- Lines 221-223, authors claim that centriole loss and spindle monopolarization commence earlier in the eyes and brain than in skin. I am not sure I see this in Fig. S5. It could as well be that the defect is less pronounced in skin.*

      Response 1-8

      We rewrote the manuscript to include the possible interpretation suggested by the reviewer on the result (line 225, page 11).

      *- Lines 227-229, authors claim that 'The developmental stage when haploid larvae suffered the gradual aggravation of centrosome loss corresponded to the stage when larval cell size gradually decreased through successive cell divisions'. I did not get that. Doesn't cell size decrease since the first division? Fig 5D shows that cell size decreases all along development.*

      Response 1-9

      We agree that the original sentence implies, against our intention, that cell size does not decrease before the developmental stage mentioned here. To correct this problem, we rewrote the corresponding part of Discussion as below (line 230, page 11):

      "Since the first division, embryonic cell size continuously reduces through successive cell divisions during early development (Menon et al., 2020). Cell size reduction continued at the developmental stage when we observed the gradual aggravation of the centrosome loss in haploid larvae."

      *- Some correlations are used to draw conclusions:*

      *o Line 301-303. "The correlation between centrosome loss and spindle monopolarization indicates that haploid larval cells fail to form bipolar spindle because of the haploidy-linked centrosome loss." As stated by the authors, this is a correlation only. I agree it points in this direction.*

      Response 1-10

      We added a note to the corresponding sentence to draw readers' attention to the discussion on the limitation of the study with respect to the lack of centrosome rescue experiment (line 332, page 16).

      *o Line 305-308. "Interestingly, centrosome loss occurred almost exclusively in haploid cells whose size became smaller than a certain border (Fig. 5), indicating that cell size is a key determinant of centrosome number homeostasis in the haploid state." This one is more problematic. There is no causal link established between cell size and centrosome number homeostasis. It could very well be that some unidentified problem induces both a reduction in cell size and the loss of centrioles.*

      Response 1-11

      To avoid an over-speculative description, we deleted the subsentence "indicating that cell size is a key determinant of centrosome number homeostasis in the haploid state." (line 336, page 17). We also added a new sentence, "Alternatively, it is also possible that other primary causes, such as the lack of second active allele producing sufficient protein pools induced cell size reduction and centrosome loss in parallel without causality between them." to discuss the possibility raised by the reviewer (line 348, page 17), in association with another comment from the reviewer #3 (see also Response 3-3).

      *I have concerns regarding the significance of the reported findings. Haploid zebrafish embryos show numerous developmental defects (some as early as gastrulation, as previously shown by the authors, Menon 2020), and they die by 4 dpf. That they experience massive apoptosis at day 3 does not seem very surprising, and that inhibiting p53 transiently improves the phenotype is not a big surprise.*

      Response 1-12

      Many reports have revealed tissue-level developmental abnormalities in haploid embryos since the discovery of haploid lethality in vertebrates more than 100 years ago. This has stimulated speculation of underlying causes of haploid intolerance for decades. However, there have been surprisingly few descriptions of cellular abnormalities underlying these tissue defects, precluding an evidence-based understanding of the principle that limits developmental ability in haploid embryos. Our findings of the haploidy-linked p53 upregulation and mitotic defects illustrate what happens in the dying haploid embryos at a cellular level. These findings would provide an evidence-based frame of reference for understanding why vertebrates cannot develop in the haploid state and also provide clues to controlling haploidy-linked embryonic defects in future studies. We added a new section in Discussion to discuss the importance of addressing the haploidy-linked defects at a cellular level (line 276, page 14).

      *This reminds me of the non-specific effects of morpholino injection, which can be partially rescued by knocking down p53.*

      Response 1-13

      We believe the reviewer refers to the previous findings that different morpholinos generally have off-target effects activating p53-mediated apoptosis (e.g., Robu et al., 2007 PLoS Genet, PMID: 17530925). However, p53 upregulation and apoptosis aggravation were also observed in uninjected haploid embryos free from morpholinos' artificial effects (Fig. 2, Fig. 3A, and B). To further address this issue, we plan to compare the frequency of cleaved caspase-3-positive cells between uninjected and control MO-injected haploids after revising the immunostaining of morphants in the original Fig. 3E-F (see Response 1-4 for details).

      *The observation of mitotic arrest and mitotic defects and the observation that haploid cells often lack a centrosome is interesting. However, I felt that the manuscript suggested that these observations were novel and could explain the haploid syndrome specifically in non-mammalian embryos, when the authors reported the same observations in human haploid cells as well as in mouse haploid embryos (Yaguchi 2018). To me, this manuscript mainly confirms that their previous observation is not mammalian specific, but at least conserved in vertebrates.*

      Response 1-14

      As we originally wrote (line 341, page 17 in the original manuscript), we think these haploidy-linked cellular defects are conserved among mammalian and non-mammalian vertebrates. To improve the clarity of our interpretation, we rewrote a corresponding part of the manuscript (line 50, page 2).

      *While I am no expert at centrosome duplication, I find the observation that haploidy leads to centrosome loss very intriguing, but have the impression that this manuscript falls short of improving our understanding of this phenomenon.*

      Response 1-15

      We express our gratitude to the reviewer for being interested in our findings. We hope the revisions made in the manuscript and the new results provided by the planned experiments will strengthen the contribution of this study to our understanding of haploidy-linked cellular defects.

      Associated to Reviewer#2's comments

      - Lack of proper controls in many experiments. For example, in the experiments where the authors treated haploids with reversine to suppress the SAC, there was no no-treatment control (Fig. 6A-C).

      Response 2-1

      We addressed the same point in __Response 1-6__. In the original manuscript, we separately presented control and experimental conditions in the same experiment series in Fig. 4 and Fig. 6. We rejoined them in Fig. 4 as in the original experimental design. Please refer to __Response 1-6__ for further details.

      *- In Fig. 6D, when a DMSO control was included, the control fish were from 3 dpf while the reversine-treated fish were from 0.5-3 dpf. This is a big flaw in experimental design, especially considering the authors were looking at mitotic index, which is hugely impacted by developmental time.*

      Response 2-2

      In this experiment, we treated haploid larvae with either DMSO or reversine from 0.5 to 3 dpf, isolated cells from the larvae at 3 dpf, and subjected them to flow cytometry. Both DMSO- and reversine-treated larval cells were from 3-dpf larvae. Therefore, this experiment does not have the problem noted by the reviewer. To improve the clarity of the description of the experimental design, we rewrote the corresponding part of the figure legend (line 646, page 34).

      *- Subjective and inadequate data quantification. In the immunostaining experiments to detect caspase-3 and pH3, the authors either did not quantify at all and only showed single micrographs that might or might not be representative (for pH3), or only did very subjective and unconvincing quantification (for caspase-3). Objective measurements of fluorescence intensity could have been done, but the authors instead chose to categorize the staining into arbitrary categories with unclear standards. In example images they showed in the supplementary data, it is not obvious at all why some of the samples were classified as "mild" and others as "severe" when their staining did not appear to be very different.*

      Response 2-3

      We apologize that we showed only "zoomed-out" images of the immunostained embryos in the original figures (Fig. 2A, 3E, and 6F), in which the distribution of individual cleaved caspase-3- or pH3-positive cells could not be clearly recognized. We added the enlarged view of identical immunostaining where these cells were clearly visualized in a countable manner (Fig. 2B, 3F, and 6D). Following the reviewer's suggestion, we newly conducted quantification by comparing the density of these cells within the right midbrain in haploids and diploids.

      This new quantification demonstrated the haploidy-linked increase in cleaved caspase-3- or pH3-positive cells and a reversine-dependent decrease in pH3-positive cells. We added these new quantifications with statistical analyses to the revised manuscript (Fig. 2C and 6E). Accompanying these revisions, we omitted the categorization of the severity of apoptosis, which was pointed out to be subjective. We rewrote the corresponding section of the manuscript to explain the new quantitative analyses (line 143, page 7; line 260, page 12).

      While we also quantified cleaved caspase-3-positive cells in control and p53MO larvae in the original Fig. 3E, we realized that the staining quality of the inner larval layers of these morphants was relatively poor, so we could not apply the same standard of quantification as in Fig. 2. Although we confirmed a statistically significant reduction in cleaved caspase-3-positive cells upon p53 depletion by quantifying a limited number of confocal sections (shown in Fig. 3G; please see also Response 1-4 for details), we decided to re-conduct this experiment to improve the staining quality and apply the same criteria of quantification to Fig. 3 as to Fig. 2 (the experimental plan is provided in Response 1-4).

      Please note that we also tried to evaluate the extent of apoptosis and mitotic arrest based on the fluorescence intensity of organ areas. However, background staining outside the dead cell area precluded the precise quantification.

      *Additionally, the authors claimed that "clusters of apoptotic cells" were only present in haploids but not diploids or p53 MO haploids, but they did not show any quantification. From the few example images (Fig. S3A), apoptotic clusters can be seen in p53 MO treated fish. Also, in some cases, the clusters were visible only because those fish were mounted in an incorrect orientation. For example, in Fig. S3A, control #2, that fish was visualized from its side, thus exposing areas around its eye that contained such clusters. These areas are not visible in other images where the fish were visualized from the top.*

      Response 2-4

      We agree that the definition of "apoptotic clusters" was ambiguous in the original manuscript. We also agree that the visuals of the clusters could be affected by sample conditions, making them less reliable criteria for judging the severity of apoptotic upregulation in larvae. Following the reviewer's suggestion, we newly conducted apoptotic cell counting (Response 2-3), which recapitulated more reliably ploidy- or condition-dependent changes in the extent of apoptosis. Therefore, we decided to omit the description of the clusters in the new version of the manuscript.

      *- Subpar data quality. Aside from issues with quantification, the IF data was not convincing as staining appeared to be inconsistent and uneven, with potential artefacts.*

      Response 2-5

      We apologize that the zoomed-out images in the original figures did not appropriately demonstrate the specific visualization of individual apoptotic or mitotic cells. As described in Response 2-3, we added enlarged views of the immunostaining to the revised manuscript, in which these individual cells are clearly distinguished from non-specific background staining (Fig. 2B, 3F, and 6D). Because of the poorer staining of inner layers of control and p53 morphants, we plan to re-conduct immunostaining for Fig. 3 and Fig. S3 (please refer to Response 1-4 for further detail). The current version of immunostaining and quantification in these figures will be replaced in the next revision.

      - Unsupported and overstated claims. There were many overstatements. For one, in line 268, the authors claimed that "the haploidy-linked mitotic stress with SAC activation is a primary constraint for organ growth in haploid larvae", while what they actually showed was that reversine treatment, which suppresses the SAC, partially rescued 2 out of the 3 growth defects they assessed, to such a small extent that the difference between haploid and haploid rescue was only

      Response 2-6

      Following the reviewer's comment, we added control MO-injected or DMSO-treated diploid larval data in the corresponding graphs in Fig. 3I and 6G, respectively. We newly estimated the relative extent of the recovery in Results (line 174, page 8; line 268, page 13).

      Reflecting the estimation, we rewrote the manuscript to discuss that haploidy-linked cell death or mitotic defects are a partial cause of organ growth retardation but that there could be other unaddressed cellular defects that also contribute to the growth retardation (line 305, page 15). We also discussed the possibility that incomplete resolution of cell death by p53MO or mitotic defects by reversine treatment may have limited their rescue effects on organ growth retardation (line 303, page 15). We also toned down several descriptions in our manuscript (lines 48 and 50, page 2; line 111, page 5; line 271, page 13; line 298, page 15; line 403, page 20) to achieve a more balanced interpretation on the potential contributions of cell death and mitotic defects to the formation of haploid syndrome.

      In association with this issue, we also discussed the difficulty of assuming a priori "fully-rescued" haploid larval size in this context. This is because even normally developing haploid larvae in haplodiplontic species tend to be much smaller than their diploid counterparts. We newly cited a few cases of haplodiplontic species where haploids are smaller than or the same in size as diploids (line 307, page 15).

      *With so many fundamental flaws, the data seem unreliable and the paper does not meet publishable standards.*

      Response 2-7

      We express our gratitude to the reviewer for providing important suggestions to improve the quality of analyses, data presentations, and interpretations in this study. We sincerely hope that one-by-one verifications of the points raised by the reviewer have improved the credibility of the paper and made it suitable for publication.

      *The low quality of the analysis makes the significance low.*

      *Reviewers have expertise in vertebrate embryogenesis and ploidy manipulation.*

      Response 2-8

      We hope that, by addressing and resolving the concerns pointed out by the reviewer, we have clarified the significance of the study.

      Associated to Reviewer#3's comments

      *There seems to be a discrepancy between the microscopic images from Figure 2A and the quantification of pH3 positive cells using flow cytometry in Figure 4. According to the flow cytometric results the proportion of pH3 positive cells is about 3 times higher in haploid larvae compared to the control. The increase in mitotic cells in the imaging results however seems much more drastic. It would be helpful if the authors explain here.*

      Response 3-1

      Following comments provided by other reviewers (see also Response 1-2, 1-4, and __2-3__), we newly compared the frequency of pH3 positive cells between the immunostained haploid and diploid larvae. In this new analysis, pH3-positive cells were 6.4 times more frequent in haploids than in diploids, which is a more substantial difference than the one estimated based on the flow cytometric analysis.

      The apparent discrepancy between the immunostaining and flow cytometric quantification would arise because pH3-positive mitotic cells tended to be more localized on the surface than in the inner region of larvae. This inevitably results in higher pH3-positive cell density in immunostaining, in which only larval surface is analyzed. To discuss this point, we newly conducted pH3 immunostaining in haploid larvae made transparent using RapiClear reagent and showed a vertical section of 3-d reconstituted larval image of pH3 immunostaining in Fig. S4E. We rewrote the manuscript to add our interpretation of this issue (line 652, page 34).

      *Mitotic slippage that the authors observe to be increased in the haploid larvae to up to 5% of cells should result in an increase in the number of aneuploid cells. I am wondering why this is not recapitulated in the analyses of the DNA content in Figure S1.*

      Response 3-2

      A possible interpretation would be that the limited viability of newly formed aneuploid progenies precluded the detection of these populations in flow cytometric analyses. We discussed the possible generation of aneuploid progenies with our interpretation of their absence in the flow cytometric analyses in Discussion (line 293, page 14).

      *Discussion:*

      *I find the explanation of centrosomal loss due to depletion of centrosomal protein pools in the cytoplasm during drastic cell reduction interesting. I wonder if the reduction in size is not necessarily caused by the reduction in cells, but rather the result of the absence of a second active allele that produces centrosomal proteins?*

      Response 3-3

      We added the possible interpretation provided by the reviewer to the corresponding part of Discussion, in association with another comment from reviewer #1 (line 348, page 17; see also Response 1-11).

      Reviewer #3 (Significance (Required)):

      *Overall, I find the study interesting even to a broader audience since diploid development is a fundamental feature of most animals. The authors also manage to discuss their findings on the consequences of haploidy in this bigger context of the restricted diploid development in animals. The study is very well-written even to non-experts.*

      Response 3-4

      We express our gratitude to the reviewer for providing positive comments on the significance of our findings. We sincerely hope that one-by-one verifications of the points raised by the reviewer further improve the quality of the paper.

      I am not an expert of the literature describing previous characterizations of the consequences associated with haploid cell development in animals, which is why I cannot comment on the novelty of their study. Based on my expertise on centromeres and genome organisation I can however assess the results regarding the mitotic defects observed in haploid larvae (see comments).

      Response 3-5

      We sincerely thank the reviewer for providing constructive suggestions and critiques based on their expertise.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      RESPONSE TO REVIEWS_RC-2024-02383

      We thank all the reviewers for their comments and suggestions. Our point-by-point response is shown below, in bold.

      —----------------------------------------------------------------------------------------------------------------------------

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: The work presented by the authors details how pharmacological inhibition of the rate-limiting one-carbon metabolic enzyme DHFR by the drug methotrexate increases the lifespan of yeast and worms. Furthermore, placing aged mice on dietary folate and choline restriction potentially enhanced metabolic plasticity but did not significantly increase lifespan, with sex-specific differences observed.

      The findings in this manuscript are very interesting and important to our understanding of the conserved mechanisms that regulate longevity through one carbon metabolism. This is especially significant in light of the current folate intake and supplementation in the adult human population. The manuscript, however, requires major revisions. Please see comments below for details.

      Major comments:

      1. The overall tone in this manuscript is colloquial and conversational in nature. A third-person academic style and tone, avoiding subjective descriptive terms, would improve the quality of this text. Terms such as "appeared less diverse", "results are remarkable ...strikingly more pronounced", "possibly positive outcomes", "appear younger...for unknown reasons", "little Uracil", "tended to be higher", "roughly proportional", "slightly higher", "as a rough readout", and many other examples from the text should not be used in a scientific manuscript. The language should be academic, scientific, precise, and non-ambiguous. A thorough revision of the manuscript with substantial changes to the language and tone is necessary prior to publication.

      RESPONSE: Thank you for your feedback on the manuscript's tone. We revised most of the expressions mentioned by the reviewer. We note, however, that these phrases were used along with numbers and statistics. Hence, there was no lack of specifics, and readers could quickly evaluate the conclusions. We strive for a balance between scientific rigor and readability to maintain accessibility for a diverse audience.

      In the results section, we find multiple instances where the results are interpreted and extensively discussed. This should be reserved for the discussion section. The results section should be used to simply report the findings in a detailed manner.

      RESPONSE: We appreciate the suggestion on the integration of interpretation within the Results section. Upon review, we have clarified the presentation of our findings, ensuring a more distinct separation from interpretive commentary. Brief explanations remain to aid the reader's comprehension in light of the complex data, aiming to keep the flow and coherence of the manuscript and prevent overextension of the Discussion section (already ~1,300 words long). We welcome specific suggestions for further refinement.

      The materials and methods section is severely lacking in details in some areas. For example, no details were provided regarding how the worm lifespans were conducted and previous work of collaborators were referenced instead. Important details such as worm numbers, biological and technical replicates, solid agar vs liquid culture, temperature, use of FUdR, antibiotics, transfer frequency, methods of scoring, etc... are lacking. Other details such as the preparation of the plates (Was MTX incorporated into the agar, seeded with the bacterial lawn, or liquid culture was used), storage conditions, age of the plates when lifespan started, how was the UV killing of the lawn verified etc...

      Many other methods subsections lack crucial details. Please carefully review the methodology and include sufficient pertinent details.

      RESPONSE: The number of worms assayed in each case was shown in each figure, as described in the legend. We have now also added all the information requested by the reviewer in the Methods section. The text now reads:

      “Briefly, the assays were done on solid agar nematode growth media (NGM) plates prepared fresh before each experiment. The bacterial lawn was exposed twice to a UV dose of 120mJ/cm2 using a UVC-515 Ultraviolet Multilinker (Ultra-Lum, Inc.). Streaking these UV-exposed bacteria to fresh LB agar plates (1% w/v tryptone, 0.5% w/v yeast extract, 1% w/v sodium chloride) produced no visible colonies. Methotrexate, or the ATIC inhibitor, was first dissolved in dimethyl sulfoxide (DMSO) and then added to the media used to prepare the plates after autoclaving (the media were kept in a 50°C water bath until the plates were poured). Mock-treated control plates contained only DMSO. At the start of each experiment, a sufficient number of eggs were collected from plates without any drugs and then placed on plates containing the indicated doses of each compound tested. After hatching and progression to the adult stage, animals were transferred to new plates (marked as the start of the lifespan assay) containing the drug tested and fluorodeoxyuridine (FUDR; dissolved in water), added at 50μM to block hatching of new animals. The plates were scored at least every other day until all the worms died. If an animal responded to gentle touch, it was scored as alive, otherwise a death was recorded, and the animal was removed from the plate. Worms were transferred to fresh plates as needed (e.g., if there was evidence of microbial contamination, dryness/cracks on the agar surface, consumption of the bacterial lawn, or hatching of new animals that escaped the FUDR block). The reported lifespans were compiled from several independent experiments done over several months (9-10 months for the methotrexate experiments and 4-5 months for the ATIC inhibitor), each scored by multiple individuals (4-5 persons per experiment). No experiments were excluded from the analysis.”

      In the worms, interventions that impact germline proliferation can extend lifespan. Methotrexate is known to impact germline proliferation and can lead to toxic developmental effects and germline arrest. Was fecundity impacted by methotrexate using the dosages found to extend lifespan?

      RESPONSE: We did not score fecundity in our experiments.

      The authors stated that UV-killed bacteria were used in the worm experiments but did not provide the reasoning for it. Virk had concluded that reduced bacterial pathogenicity is responsible for the lifespan extension and not the worm's OCM. How does your work agree with or refute these previous findings?

      RESPONSE: The dose of methotrexate used by Virk et al was very high, so it is difficult to directly compare it to our experiment. Nonetheless, we do not think there is any contradiction. We added the following in the text to clarify this point:

      “At higher doses (10-100μΜ), methotrexate did not extend lifespan (not shown), in agreement with (Virk et al., 2016), who treated adult animals with a very high dose of methotrexate (220μM). We also note that the bacteria used to feed the worms in our experiments were killed by ultraviolet radiation to exclude any impacts from bacterial folate metabolism, which is known to affect worm lifespan (Virk et al., 2016, 2012).”

      The authors state that AICAR (100 µM) administration to the worms (no experimental details were given) increases their lifespan and concluded that this is proof that manipulation of 1C metabolism promotes longevity. There are two concerns here. First, AMPK activation leads to inhibition of TOR, and that has been shown to promote longevity in multiple models. While we agree that a significant crosstalk between TOR and OCM exists, this experiment does not necessarily contribute to the argument that the authors are making. Second, it has been established by multiple groups that inhibition (RNAi and pharmacological) of DHFR1, TYMS1, SAMS1 and possibly other OCM enzymes leads to lifespan extension in worms. These findings provide stronger evidence that OCM regulates organismal longevity.

      RESPONSE: We acknowledged prior research on lifespan extension and do not claim our use of the ATIC inhibitor as the first evidence of 1C metabolism's impact on longevity. Rather, our findings complement existing studies from us and several other groups (including the examples mentioned by the reviewer, which we had cited) by introducing novel evidence of lifespan increase through this specific inhibitor in C. elegans. Please also note that we added a detailed description of the experiment in the Methods, as suggested in a previous comment.

      In the mouse study, the authors do not provide a rationale on why a folate and choline deficient diet was adopted as opposed to only a folate deficient diet. Additionally, we assume that the diets did not contain antibiotics (succinyl sulfathiazole) to reduce microbiome folate production since it was not mentioned. Were wire bottom cages used to eliminate coprophagy? Were there any significant differences between male and female serum folate levels that could have contributed to the endpoints. Was only a subset of samples assayed for total folate? (fig 2b shows a possible n of 6 per group?). If no antibiotics and no wire bottom cages were used, mice can maintain adequate folate levels from coprophagy without developing signs of anemia. Please discuss these details as it helps clarify the conditions used.

      RESPONSE: Excellent points, and we have now added this information (see Materials and Methods):

      “We note that when designing experiments to assess the consequences of folate limitation, it is common to control both folate and choline intake to ensure that the observed effects are due to the restriction of folate (Beaudin et al., 2011) because the presence of choline can mask the effects of folate deficiency. Choline can be oxidized to betaine, which provides methyl groups for converting homocysteine to methionine, independent of the folate cycle. Choline can also be incorporated into phosphatidylcholine, a major methyl ‘sink’ in the cell, through the Kennedy pathway. Lastly, we did not use any antibiotics to interfere with the microbiome nor wire bottom cages to eliminate coprophagy. Wire bottom cages were used only in the metabolic chamber experiments.”

      Were there any significant differences between male and female serum folate levels that could have contributed to the endpoints. Was only a subset of samples assayed for total folate? (fig 2b shows a possible n of 6 per group?).

      RESPONSE: Regarding folate levels, no significant sex differences were observed. We assayed all the animals we had at 120 weeks of age, the euthanasia endpoint, as shown in Figure 2B. There were fewer females than males in both diets.

      There are instances in the results section where statements were made implying that there are differences observed ("slightly higher", "negative association") when they are not statistically significant. There can be either statistically significant differences/correlation or not. Please be precise in your wording.

      RESPONSE: We have revised the Results section to ensure that qualitative descriptions such as "slightly higher" are only used when supported by appropriate statistical evidence. We have listed all the relevant numbers in each case after performing thorough and robust statistical analyses. We note, however, that mentioning qualitative descriptors is not always unwarranted, as long as they are factual.

      Graying was observed less significantly in the F/C- group according to the authors. However, no quantitative assessment was made, and it is merely observational.

      RESPONSE: It is not clear how to quantify graying non-invasively. Hence, we simply took photographs.

      Inhibition of mTOR was inferred, but mTOR protein and phosphorylation levels were not measured. The authors did perform western blotting on ribosomal S6 protein; however, no assessment of the downstream mTOR targets P70S6k1 and 4EBP is shown.

      RESPONSE: This is a good suggestion. We added a new experiment, looking at 4EBP1 phosphorylation (see new Figure S2). The results mirror those looking at S6 phosphorylation.

      Can the change in RER in F/C- mice compared to controls be explained by the increased adiposity in these animals?

      RESPONSE: We do not know. The relationship between adiposity and respiratory exchange rate can be quite complex. The increased adiposity of male mice limited for folate may lead to higher RER, reflecting perhaps a greater reliance on carbohydrate metabolism. But this is very speculative, especially since these mice are not obese. It is unclear how the improved metabolic plasticity could be associated with adiposity for the females.

      How was the microbiome normalized between groups prior to the beginning of the experiment (fecal slurry gavage, bedding exchange, cohabitation, none of the above)? There is no mention of this crucial step in the materials and methods section. Furthermore, additional details regarding the microbiome analysis are required (analysis pipeline, read depth, denoising, software, data processing, PCA analysis, etc.). It is not sufficient to state that Zymo performed the analysis.

      RESPONSE: We have now revised the text and added a detailed description of the methods, as follows:

      “There was no microbiome normalization between groups prior to the beginning of the experiment. Mouse fecal pellets were gathered by positioning the mice on a paper towel beneath an overturned glass beaker. A minimum of three fecal pellets from each animal were transferred into cryovials using sterile forceps. The samples were preserved at -80°C and shipped to Zymo Research, where they were processed and analyzed with the ZymoBIOMICS® Shotgun Metagenomic Sequencing Service (Zymo Research, Irvine, CA). For DNA extraction, the ZymoBIOMICS®-96 MagBead DNA Kit (Zymo Research, Irvine, CA) was used according to the manufacturer’s instructions. Genomic DNA samples were profiled with shotgun metagenomic sequencing. Sequencing libraries were prepared with Illumina® DNA Library Prep Kit (Illumina, San Diego, CA) with up to 500 ng DNA input following the manufacturer’s protocol using unique dual-index 10 bp barcodes with Nextera® adapters (Illumina, San Diego, CA). All libraries were pooled in equal abundance. The final pool was quantified using qPCR and TapeStation® (Agilent Technologies, Santa Clara, CA). The final library was sequenced on the NovaSeq® (Illumina, San Diego, CA) platform. The ZymoBIOMICS® Microbial Community DNA Standard (Zymo Research, Irvine, CA) was used as a positive control for each library preparation. Negative controls (i.e. blank extraction control, blank library preparation control) were included to assess the level of bioburden carried by the wet-lab process.

      Raw sequence reads were trimmed to remove low quality fractions and adapters with Trimmomatic-0.33 (Bolger et al., 2014): quality trimming by sliding window with 6 bp window size and a quality cutoff of 20, and reads with size lower than 70 bp were removed. Antimicrobial resistance and virulence factor gene identification was performed with the DIAMOND sequence aligner (Buchfink et al., 2015). Microbial composition was profiled with Centrifuge (Kim et al., 2016) using bacterial, viral, fungal, mouse, and human genome datasets. Strain-level abundance information was extracted from the Centrifuge outputs and further analyzed to perform alpha- and beta-diversity analyses and biomarker discovery with LEfSe (Segata et al., 2011) with default settings (p < 0.05 and LDA effect size > 2).”
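
      The read-trimming step quoted above (6 bp sliding window, quality cutoff of 20, reads shorter than 70 bp discarded) can be sketched in a few lines. This is only an illustrative simplification, not Trimmomatic's actual implementation: the function names `sliding_window_trim` and `trim_read` are hypothetical, and qualities are assumed to be already decoded into integer Phred scores.

```python
def sliding_window_trim(quals, window=6, min_avg=20):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean Phred quality is below min_avg (simplified sliding-window rule).
    """
    n = len(quals)
    if n < window:
        # Read shorter than one window: treat the whole read as one window.
        return n if n and sum(quals) / n >= min_avg else 0
    for start in range(n - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return n


def trim_read(seq, quals, window=6, min_avg=20, min_len=70):
    """Trim one read; return None if the trimmed read is shorter than min_len."""
    keep = sliding_window_trim(quals, window, min_avg)
    if keep < min_len:
        return None
    return seq[:keep], quals[:keep]


if __name__ == "__main__":
    # Hypothetical read: 80 high-quality bases followed by 20 low-quality ones.
    quals = [30] * 80 + [10] * 20
    trimmed = trim_read("A" * 100, quals)
    print(len(trimmed[0]) if trimmed else "discarded")
```

      In this toy read the cut falls slightly before the first low-quality base, because the window average drops below the cutoff once most of the window is low quality.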

      What is an "easily distinguishable gut microbiome" and "appeared less diverse"?

      RESPONSE: To clarify these points, we have now edited the text as follows:

      “The different sex and diet groups had an easily distinguishable gut microbiome, occupying different areas of principal component analysis graphs (Figure 5A), based on Bray-Curtis β-diversity dissimilarity indices (Knight et al., 2018). The intestinal microbiome of male mice on the F/C- diet was not statistically less diverse (p=0.222, based on the Wilcoxon rank sum test; Figure 5 - Supplement 1).”
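
      For readers unfamiliar with the Bray-Curtis index mentioned in the quoted text, the following is a minimal sketch of how the dissimilarity between two samples is computed from taxon abundance vectors. The abundance values and the function name `bray_curtis` are illustrative assumptions, not data or code from the study.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    0.0 means identical composition; 1.0 means no shared taxa.
    """
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


if __name__ == "__main__":
    # Hypothetical counts for three taxa in two samples.
    sample_a = [6, 4, 0]
    sample_b = [4, 6, 0]
    print(bray_curtis(sample_a, sample_b))
```

      A matrix of such pairwise values is what ordination methods take as input to produce plots of the kind described for Figure 5A.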


      A two-dimensional plot using two principal components would be more suitable for Figure 5A and would allow for better visualization of the clustering of the groups.

      RESPONSE: We tried displaying the data on a multipanel (3 panels per group, 12 total) two-dimensional figure, but the result is more confusing. Since the sample number is small (n=6 animals per group), the 3D graphs are visually adequate and more pleasing. They are also the standard way of representing this kind of data.

      Since the authors suggest that the microbiome could be a source of 1C metabolites (including natural folate), it is important to clarify if coprophagy is involved.

      RESPONSE: We agree and have added the information as requested.

      How are inflammatory cytokines and marker levels linked to reduced anabolism and immune function in non-challenged animals?

      RESPONSE: We do not make any claims for such links if that is what the reviewer implied. If the intent was more towards speculation, we suspect one could imagine various situations. For instance, nutrients may be more heavily used during inflammation to support immune cell responses instead of central anabolic processes in other tissues, limiting the building blocks available for tissue growth and repair. Since we do not see major changes in inflammatory cytokines, we prefer not to speculate about possible links.

      When discussing the epigenetic analysis, the authors state "no changes in the DNA methylation from liver samples.." and "groups appear younger than expected". Please clarify these statements. Additional details are needed regarding the analysis performed and the choice of methylated loci and methods. Please reference the epigenetic clock or model that was used and state whether it was developed for the same strain and sub-strain of mice. Is it using a modified "Horvath" mouse DNA age epigenetic clock? If so, provide the necessary details and a possible explanation for the discrepancy other than "unknown reasons".

      RESPONSE: The assay is based on the "Horvath" mouse DNA age epigenetic clock, for the strain we used (C57BL/6). We have now added a detailed description, which we received from the company, as follows (see Materials and Methods):

      "Liver samples (~15mg) collected at euthanasia were placed in 0.75mL of 1X DNA/RNA Shield™ solution (Zymo Research, Irvine, CA), shipped to Zymo Research, and processed with DNAge® Service according to their established protocols. Briefly, after DNA extraction, the EZ DNA Methylation-Lightning Kit (Zymo Research, Irvine, CA) following the standard protocol was used for bisulfite conversion. Samples were enriched specifically for the sequencing of >1000 age-associated gene loci using Simplified Whole-panel Amplification Reaction Method (SWARM®), where specific CpGs are sequenced at minimum 1000X coverage. Sequencing was run on an Illumina NovaSeq instrument. Sequences were identified by Illumina base calling software then aligned to the reference genome using Bismark. Methylation levels for each cytosine were calculated by dividing the number of reads reporting a "C" by the number of reads reporting a "C" or "T". The percentage of methylation for these specific sequences was used to assess DNA age according to Zymo Research's proprietary DNAge® predictor which had been established using elastic net regression to determine the DNAge®."

      As for a possible explanation for the discrepancy, since all our "groups appear younger than expected," unfortunately, other than "unknown reasons," we have none to offer. Nonetheless, the critical point for this study is that we saw no diet effects, regardless of where the company's assay draws the baseline.
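
      The per-cytosine methylation calculation quoted in the methods text above can be written out explicitly. After bisulfite conversion, unmethylated cytosines are read as T while methylated cytosines are still read as C, so the methylation level at a site is the fraction of C calls among C and T calls. The sketch below uses hypothetical read counts and a hypothetical function name, and does not represent the company's proprietary pipeline.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads reporting C among reads reporting C or T at one site.

    Returns None when the site has no informative coverage.
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    depth = c_reads + t_reads
    if depth == 0:
        return None
    return c_reads / depth


if __name__ == "__main__":
    # Hypothetical CpG site covered by 1000 reads: 750 report C, 250 report T.
    print(methylation_level(750, 250))
```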

      Regarding Uracil misincorporation, the liver contains significant stores of folate as it is the main hub for several critical OCM reactions (Phospholipid methylation is a major one). Earlier studies used antibiotics with or without coprophagy prevention measures to induce a state of folate depletion to induce uracil incorporation in various tissues of rodent models. There is some controversy whether dietary folic acid restriction/methyl donor restriction alone will lead to uracil misincorporation when there is no apparent depletion or anemia. Please discuss your specific experimental procedures and how it agrees or disagrees with the published literature.

      RESPONSE: We have now added the experimental details, as suggested in a previous comment. Since we do not see uracil misincorporation, we prefer not to comment on the published literature for possible links between misincorporation and anemia.

      The section discussing RPS6 needs to be rewritten, as it is difficult to understand.

      RESPONSE: We revised the text, which now reads:

      “Immunoblot analysis of liver tissue samples gathered at the time of euthanasia revealed variability in the detected values across individual mice. When examining the male mice, we observed that, on average, those fed the F/C- diet had approximately half the amount of phosphorylated RPS6 (P-RPS6) compared to those on the F/C+ diet. However, due to high variability in the measured values, the overall differences in P-RPS6 levels between the two dietary groups did not reach statistical significance (Figure 7 - Supplement 1; p>0.05, based on the Wilcoxon rank sum test).”
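
      Because several comparisons in these responses rely on the Wilcoxon rank sum test with small groups, a short sketch of an exact two-sided version may help readers see why high variability with few animals rarely yields small p-values. The code enumerates every possible assignment of ranks to the first group, which is only practical for small samples. The function names and example values are hypothetical, and the two-sided p-value convention used here (deviation of the rank sum from its expectation) is one of several in use.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_exact(x, y):
    """Return (rank sum of x, exact two-sided p-value) by full enumeration."""
    ranks = midranks(list(x) + list(y))
    n, total = len(x), len(x) + len(y)
    observed = sum(ranks[:n])
    expected = n * (total + 1) / 2
    deviation = abs(observed - expected)
    hits = count = 0
    for idx in combinations(range(total), n):
        count += 1
        if abs(sum(ranks[i] for i in idx) - expected) >= deviation - 1e-9:
            hits += 1
    return observed, hits / count


if __name__ == "__main__":
    # Hypothetical P-RPS6/RPS6 ratios for two diet groups (not study data).
    diet_minus = [0.2, 0.9, 0.4, 1.1]
    diet_plus = [0.8, 1.5, 0.6, 2.0]
    print(rank_sum_exact(diet_minus, diet_plus))
```

      With three animals per group, even complete separation of the two groups gives a two-sided p-value of 0.1, which illustrates the limited power of small cohorts.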

      Furthermore, as stated previously, considering phosphorylation of mTOR and its downstream targets 4EBP and S6K1 will give a clear indication of proliferative signaling.

      RESPONSE: As we mentioned above, we have now added the suggested 4EBP experiment (see new Figure S2).

      Additionally, these pathways are impacted by feeding status, diurnal cycles, and sex. Were these factors controlled prior to sacrifice? Were the animals sacrificed at the same time? In a fed or unfed state?

      RESPONSE: The animals were sacrificed at the same time, with no feeding limitations.

      The western blots provided in supplementary files show uneven protein loading across lanes (ponceau stain). No loading control, such as β-actin, is shown. A separate blot is used for total and phosphorylated proteins as opposed to gently stripping the membrane of the phosphorylated blot and re-incubating with the antibody for total. While normalizing phosphorylated to total protein levels will eliminate some of the variability in the authors' method, the uneven loading may introduce errors in the calculated ratios.

      RESPONSE: The uneven loading across mouse samples is inconsequential. We report the ratio of phospho-RPS6 to the total amount of RPS6 within each mouse sample. These ratios were then compared among the different animals and diet groups. We also note that stripping could introduce other artifacts if it is not uniform across all the blot areas.
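
      The normalization argument in this exchange can be made concrete with a small numeric sketch. It assumes, hypothetically, that band intensity scales linearly with the amount loaded and that the same loading factor applies to the phospho and total measurements of a given sample; under those assumptions the within-sample ratio does not depend on loading. If the two blots are loaded differently, which is the reviewer's concern, the ratio shifts by the ratio of the two loading factors. All names and numbers are illustrative.

```python
def band_signal(amount_per_unit, loading_factor):
    """Idealized band intensity: linear in the amount of sample loaded."""
    return amount_per_unit * loading_factor


def phospho_ratio(phospho_signal, total_signal):
    """Within-sample ratio of phosphorylated to total protein signal."""
    if total_signal <= 0:
        raise ValueError("total signal must be positive")
    return phospho_signal / total_signal


if __name__ == "__main__":
    phospho, total = 2.0, 4.0  # hypothetical per-unit amounts in one sample

    # Same loading factor on both blots: the ratio is unchanged.
    for loading in (0.5, 1.0, 1.5):
        print(phospho_ratio(band_signal(phospho, loading),
                            band_signal(total, loading)))

    # Different loading on the two blots: the ratio is biased.
    print(phospho_ratio(band_signal(phospho, 1.5), band_signal(total, 1.0)))
```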

      While the authors referenced older studies utilizing low dose methotrexate on rodents and provided a composite lifespan based on these findings, why was dietary folate and choline restriction used instead of a low dose methotrexate in mice in the current study? Please provide a rationale for this approach.

      RESPONSE: First, in the context of current folate fortification policies, we reasoned that testing dietary folate limitation late in life would be more informative. Second, three of us (M.P., B.K.K., and M.K.) proposed to the Interventions Testing Program at the National Institutes of Health to test whether low-dose methotrexate extends lifespan in mice. The proposal was accepted, and the study is ongoing (the ITP decided to test methotrexate at 0.2ppm, starting at 14 months of age; https://www.nia.nih.gov/research/dab/interventions-testing-program-itp/supported-interventions).

      Minor comments:

      1. While the authors make compelling arguments that lower folate intake later in life may promote healthy aging, an important consideration in the human population is that a considerable percentage of older individuals may be consuming an excessive amount of folate due to the combination of fortification and voluntary supplementation. An alternate hypothesis that could apply to humans and lab models is that the existing levels of exposure to folate/folic acid may be accelerating the aging process and promoting disease in later life.

      RESPONSE: Perhaps, but as we describe in the text (2nd paragraph in the introduction):

      “...analyses ‘did not identify specific risks from existing mandatory folic acid fortification’ in the general population (Field and Stover, 2018). This conclusion neither refutes nor contradicts the idea that a moderate decrease in folic acid intake among older adults may improve healthspan. Merely because high folic acid intake does not harm the health of older adults does not negate the possibility that a lower folic acid intake might enhance health.”

      The common C57BL/6j is being referred to as the "long lived strain". Is this relative to mice in wild conditions? There are many transgenic C57bl/6 strains that live considerably longer. Please clarify if this is meant to describe the aged mice used in the experimental process.

      RESPONSE: This was from a comprehensive comparison of many different inbred strains. We apologize for omitting the citation, which we have now added (Yuan et al, 2009).

      While the authors state early in the manuscript that longevity was not a measured outcome in the mouse study, the manuscript contains statements discussing animal survival in the results and survival curves (figure 2). This gives the impression that the study was planned as a survival analysis initially and since no difference was observed between the experimental groups during the earlier stages, the secondary endpoints of health span analysis were adopted. Either approach does not detract from the significance of the study's findings. Further clarity on the approach would be beneficial to the readers.

      RESPONSE: The study was designed, and the Animal Use Protocol was institutionally approved for healthspan, not lifespan. The number of animals we used did not have sufficient power to detect lifespan differences. Note that, at least for males, very few animals had died by 120 weeks, our approved euthanasia endpoint. However, it was important to report that folate limitation did not adversely affect overall survival during the analysis time frame.

      For yeast culture conditions, what are the folate sources and content? Is there added folic acid similar to cell culture conditions where supraphysiological concentrations are used in standard mediums (RPMI and DMEM).

      RESPONSE: The yeast media we used were undefined (YPD, see Materials and Methods). The source of folate in these media is “yeast extract,” which is generally considered to contain very high amounts of folate (it was used decades ago to treat anemia and folate deficiency in pregnant women). Note also that, unlike animals, yeast can synthesize folate.

      In the metabolism section, the authors make statements such as "the differences were minimal" , "probably were due..", "minimal effects", "apparent increase", "tended to be", "little uracil" etc.. please refrain from using subjective language and use precise scientific terms.

      RESPONSE: Please see our earlier response to this comment.

      Figure 2-c, there is a typo, Weeks not months

      RESPONSE: Corrected. Thank you!

      **Referees cross-commenting**

      While we generally agree with the other reviewers' concerns, we find reviewer 3's rejection of the authors' conclusion, without considering the evidence presented in the context of what is currently known in the field, potentially limiting. Multiple groups have shown that manipulation of OCM enzymes (DHFR, TYMS, SAMS) can extend lifespan in worms. The recent report from Antebi's group (Annibal et al. Nature Com, 2021) provides strong evidence that OCM is central to longevity regulation in worms and mice and that folate intake can interact with and modulate organismal longevity. While this manuscript's findings are not conclusive, I think it is premature to dismiss them completely. Perhaps the alternative is to discuss the limitations of this approach and interpret the results (or the lack of significant differences) in order to help guide future research into this important subject. Generalizing rodent results to humans is always going to be a limiting factor in this type of work. Mice have significantly higher circulating folate. Additionally, DHFR activity (the rate-limiting enzyme in folate OCM) in rodents can be up to 100 times higher than its human equivalent. Another consideration is that mice, similar to other rodents, engage in coprophagy, thereby recycling and supplementing bacterially produced folate in the absence of antibiotics in the diet. Therefore, mice placed on dietary folate restriction in the absence of antibiotics do not develop signs of anemia or deficiency. It could thus be argued that there is no loss of nutrients in mice in this scenario and that supplementation at the arbitrarily recommended level of synthetic folic acid (2mg/kg day) or higher could impact health and aging. Similarly, in humans, excess folate intake has been controversially associated with a number of deleterious health effects. It is important not to dismiss these reports and to encourage further research into this subject, which impacts a significant percentage of the human population due to the widespread use of supplements.

      RESPONSE: We thank the reviewers for their evaluation of the work we presented. We have also added the following in the discussion, expanding the limitations of the study:

      “Since mice engage in coprophagy, microbiome contributions to folate metabolism are bound to be substantial in this species. There are also significant differences in folate status between mice and people. For example, people have lower levels (~10-15 ng/mL) of serum folate than mice (Bailey et al., 2015), and the activity of DHFR, an enzyme essential for maintaining tetrahydrofolate pools (the folate form used in 1C reactions), may be only 2% of that in rodents (Bailey and Ayling, 2009). Hence, mice are likely more refractory to a low folate dietary intake.”

      Reviewer #1 (Significance (Required)):

      Significance:

      A major strength of this study is that the authors show that manipulation of OCM either through pharmacological inhibition or dietary restriction can impact organismal longevity in a conserved manner across species from yeast to worms and mammals. These findings provide compelling evidence that folate intake and metabolism in humans should be rigorously researched as a potential regulator of aging. These findings complement and agree with a recent report by Antebi's group (Annibal et al. Nature Com, 2021) highlighting that long-lived worm and mouse strains exhibit similar metabolic regulation of one carbon metabolism. In the same report, low levels of folate supplementation partially or completely abrogated the lifespan extension in some models. This study provides additional evidence that restricting OCM through drugs or dietary restriction can significantly impact healthspan and lifespan. Additionally, it raises the question of whether excessive folate intake in aged adults may have potentially deleterious effects on health and longevity. The limitations of this study can be seen in the overall lack of significant impact of the dietary intervention on the health metrics that were measured in mice. The study does not provide strong evidence that restricting folate and choline intake will produce favorable effects on health. Similarly, no significant impact on mouse lifespan was observed based on the partial lifespan analysis. Further clarity is needed regarding the experimental procedures and methods used. The study, nonetheless, is an important step towards investigating the role of folate and OCM in regulating mammalian healthspan and lifespan. Future studies can expand on these findings and investigate whether OCM interventions that are started in early life can produce significant and measurable effects on longevity and health in mammals. The findings here provide a conceptual and incremental advance in our understanding of these complex interactions.

      These findings are important to the research communities especially in the areas of longevity, metabolism, and nutrition.

      RESPONSE: We appreciate the recognition of our work's significance in furthering understanding of longevity, metabolism, and nutrition. We would also like to stress that this study is not an incremental advance. We believe our study's focus on dietary folate limitation in aged mice represents a novel and more radical contribution, considering the lack of prior research in this specific context, underscoring the distinctiveness and importance of our findings.

      ----------------------------------------------------------------------------------------------------------------

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript the authors investigate whether disruption of the folate cycle can slow ageing/improve health in yeast, worms and mice. There are a few experiments in yeast and C. elegans but the rest is a meta-analysis of some old data on folate-deprived mice and their own study of mice on a diet with and without folic acid and choline. They find that various interventions of the folate cycle extend lifespan in yeast and worms, that the old study suggests mice live longer without folic acid supplementation, and that there is no change to healthspan in mice without folic acid and choline in the diet late in life and that these mice show some positive benefits. Analysis of the microbiome and the transcriptomics suggest small changes to the microbiota and changes in gene expression. Overall the authors conclude that biosynthetic processes have been inhibited without negative effects on healthspan.

      Major comments

      1. The two worm lifespan experiments in Fig 1 show very different controls despite the methods stating that the conditions were the same. Controls can vary from one experiment to another but the difference is striking. It would be good to have supplementary data about the number of repeats and other data about these experiments.

      RESPONSE: We also noted the difference. However, we believe our conclusions are valid and robust because we used only experiment-matched controls for each comparison. We now describe in detail how the experiments were done (see revised Materials and Methods). Lastly, the two compounds were tested years apart by different individuals, and the different lifespans of the controls could arise from differences in the media batches, temperature control, etc.

      The diet lacks folic acid and choline, yet the conclusions are only about folate. The choline aspect of the diet needs to be acknowledged as a potential factor.

      RESPONSE: As we mentioned above, we have now added this information (see Materials and Methods):

      “We note that when designing experiments to assess the consequences of folate limitation, it is common to control both folate and choline intake to ensure that the observed effects are due to the restriction of folate (Beaudin et al., 2011) because the presence of choline can mask the effects of folate deficiency. Choline can be oxidized to betaine, which provides methyl groups for converting homocysteine to methionine, independent of the folate cycle. Choline can also be incorporated into phosphatidylcholine, a major methyl ‘sink’ in the cell, through the Kennedy pathway. Lastly, we did not use any antibiotics to interfere with the microbiome nor wire bottom cages to eliminate coprophagy. Wire bottom cages were used only in the metabolic chamber experiments.”

      The authors argue that the effects of the diet on the mice are not mediated by the microbiome because there is not a statistical effect on diversity. However, they do show a clear difference at the metagenomic level that fits with a metabolic difference. The argument also ignores work in C. elegans showing that inhibition of bacterial folate synthesis increases lifespan, not by decreasing folate supply but because lowered bacterial folate prevents an age-accelerating activity in the bacteria (Virk et al 2016). It has also been shown that a breakdown product of folic acid can be taken up by bacteria and influence ageing (Maynard et al 2018). I do not think the evidence is strong enough to conclude that the changes seen in the mice are not mediated by microbes.

      RESPONSE: We do not state that “changes seen in the mice are not mediated by microbes”. On the contrary, we agree with the reviewer that the microbiome likely contributes significantly, and we hope this is conveyed in the text. We also agree with the references the reviewer pointed out, which we cite (see also our response to point#5 of reviewer 1).

      Minor comments

      1. It had been shown a long time ago that sams-1 mutants in C. elegans extend lifespan. MTX is likely to influence SAMS levels. This point needs to be mentioned.

      RESPONSE: Thank you. We added the reference.

      Page - 6 "folate accelerates worm aging". This statement is not correct and is not what Virk et al 2016 suggests.

      RESPONSE: We revised it to the following: “It has been reported that treating worms with high levels of methotrexate (220μM) at the adult stage did not extend their lifespan (Virk et al., 2016)”.

      Page 7. "at 100μM, a dose similar to the one used in mice with metabolic syndrome (Asby et al., 2015)." It's not valid to compare the concentration of a drug in the media in a C. elegans experiment to a dose given to mice.

      RESPONSE: We appreciate the reviewer's point on comparing drug dosages across species. The intention was to provide a reference point for the concentration used rather than suggesting a direct equivalence with outcomes. We recognize the complexities of cross-species dosage comparisons and have amended the text to clarify that the mention of dosage is for contextual purposes only.

      Referees cross-commenting

      I would like to add that it is important to consider whether there are in fact negative effects of folic acid given in later life and this is one of the only studies that addresses this question in a mammalian model, and thus needs to be reported, once the issues raised have been addressed.

      RESPONSE: As we mentioned in our response to a comment from reviewer 1 and describe in the text (2nd paragraph in the introduction):

      “...analyses ‘did not identify specific risks from existing mandatory folic acid fortification’ in the general population (Field and Stover, 2018). This conclusion neither refutes nor contradicts the idea that a moderate decrease in folic acid intake among older adults may improve healthspan. Merely because high folic acid intake does not harm the health of older adults does not negate the possibility that a lower folic acid intake might enhance health.”

      Reviewer #2 (Significance (Required)):

      The main strength of this manuscript is that it examines the effect of mice given a folate and choline deficient diet late in life and finds mostly positive effects. This finding challenges the dogma that folate

      --------------------------------------------------------------------------------------------------

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Blank/Polymenis and colleagues explore how reduced folate metabolism impacts aging. While folate supplementation is known to benefit the development and health of young people, little is known about the impact of this substrate at advanced ages. The paper consists of two parts: 1) blocking folate metabolism in yeast and C. elegans while measuring lifespan (reproductive or age of death); 2) measuring a vast array of traits in mice where folate (and choline) is removed from the diet starting at age 1 year. The second approach is most central to the paper's theme, and the authors conclude their "data raise the exciting possibility that ... reduced folate intake later in life might be beneficial." However, I do not accept this conclusion. Instead, the overwhelming fact is that there were no changes in any phenotype due to the absence of F/C in the older animals. Loss of this nutrient is neutral, although perhaps bad for the kidney. In my view, the authors misinterpret their very basic results: loss of dietary folate has no impact on aged mice (one strain, at that). And there is no way to generalize this simple conclusion to humans.

      RESPONSE: We respectfully disagree with the reviewer's assessment of our study's conclusions and its significance. With the primary focus on evaluating the effects of reduced folate intake in aged mice, we explored a comprehensive range of healthspan markers and molecular analyses. Contrary to the reviewer's assertion, our data demonstrate significant outcomes such as altered body weight and metabolic parameters in mice subjected to folate restriction, along with insights into molecular changes indicative of lower anabolism.

      The reviewer's interpretation that folate limitation has no observable impact on aged mice overlooks the nuanced findings presented in our study. While acknowledging the neutral effects observed in some phenotypes, we contend that our results collectively contribute to a deeper understanding of the implications of late-life folate restriction. It is unwarranted to dismiss these findings.

      Generalizing findings from model systems to humans is indeed complex, as noted by the reviewer. However, our study, alongside existing literature, provides valuable insights that warrant consideration and further exploration. We stand by the rigor of our methodology, the diversity of data presented, and the significance of our results in enhancing knowledge on the impact of folate metabolism in aging models.

      There are other issues throughout the work that need to be addressed, but given the weakness of its key argument, I will not elaborate on these points.

      RESPONSE: Since the reviewer offered no specifics on “other issues,” we cannot respond. We hope, however, that we have addressed them in our response to the other reviewers’ comments.

      Reviewer #3 (Significance (Required)):

      Blank/Polymenis and colleagues explore how reduced folate metabolism impacts aging. While folate supplementation is known to benefit the development and health of young people, little is known about the impact of this substrate at advanced ages.

      RESPONSE: We concur with the reviewer's observation regarding the knowledge gap surrounding the impact of reduced folate metabolism on aging, particularly in advanced stages of life, which is why our study significantly contributes to the field. As we mentioned above, not only do we report that some healthspan metrics were improved in folate-limited animals (e.g., body weight, improved metabolic plasticity), but our study also offers for the first time a comprehensive biomarker analysis of folate limitation late in life (e.g., metabolite and mRNA changes associated with lower anabolism, lower IGF1 levels in females). This original contribution enhances our understanding of the complex interplay between folate metabolism and aging.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary: The work presented by the authors details how pharmacological inhibition of the rate-limiting one-carbon metabolic enzyme DHFR by the drug methotrexate increases the lifespan of yeast and worms. Furthermore, placing aged mice on dietary folate and choline restriction potentially enhanced metabolic plasticity but did not significantly increase lifespan, with sex-specific differences observed. The findings in this manuscript are very interesting and important to our understanding of the conserved mechanisms that regulate longevity through one-carbon metabolism. This is especially significant in light of the current folate intake and supplementation in the adult human population. The manuscript, however, requires major revisions. Please see comments below for details.

      Major comments:

      1. The overall tone in this manuscript is colloquial and conversational in nature. A third person academic style and tone, while avoiding the use of subjective descriptive terms would improve the quality of this text. Using terms such as "appeared less diverse", "results are remarkable ...strikingly more pronounced", "possibly positive outcomes" , "appear younger...for unknown reasons", "little Uracil", "tended to be higher", "roughly proportional", "slightly higher", "as a rough readout", and many other examples from the text should not be used in a scientific manuscript. The language should be academic, scientific, precise, and non-ambiguous. A thorough revision of the manuscript with substantial changes to the language and tone is necessary prior to publication.
      2. In the results section, we find multiple instances where the results are interpreted and extensively discussed. This should be reserved for the discussion section. The results section should be used to simply report the findings in a detailed manner.
      3. The materials and methods section is severely lacking in details in some areas. For example, no details were provided regarding how the worm lifespans were conducted, and previous work of collaborators was referenced instead. Important details such as worm numbers, biological and technical replicates, solid agar vs liquid culture, temperature, use of FUdR, antibiotics, transfer frequency, methods of scoring, etc. are lacking. Other details are also missing, such as the preparation of the plates (was MTX incorporated into the agar, seeded with the bacterial lawn, or was liquid culture used?), storage conditions, age of the plates when the lifespan assay started, how the UV killing of the lawn was verified, etc. Many other methods subsections lack crucial details. Please carefully review the methodology and include sufficient pertinent details.
      4. In the worms, interventions that impact germline proliferation can extend lifespan. Methotrexate is known to impact germline proliferation and can lead to toxic developmental effects and germline arrest. Was fecundity impacted by methotrexate using the dosages found to extend lifespan?
      5. The authors stated that UV-killed bacteria were used in the worm experiments but did not provide the reasoning for it. Virk had concluded that reduced bacterial pathogenicity is responsible for the lifespan extension and not the worm's OCM. How does your work agree with or refute these previous findings?
      6. The authors state that AICAR (100 uM) administration to the worms (no experimental details were given) increases their lifespan and concluded that this is proof that manipulation of 1C metabolism promotes longevity. There are 2 concerns here. First, AMPK activation leads to inhibition of TOR, and that has been shown to promote longevity in multiple models. While we agree that a significant crosstalk between TOR and OCM exists, this experiment does not necessarily contribute to the argument that the authors are making. Second, it has been established by multiple groups that inhibition (RNAi and pharmacological) of DHFR1, TYMS1, SAMS1 and possibly other OCM enzymes leads to lifespan extension in worms. These findings provide stronger evidence that OCM regulates organismal longevity.
      7. In the mouse study, the authors do not provide a rationale for why a folate- and choline-deficient diet was adopted as opposed to only a folate-deficient diet. Additionally, we assume that the diets did not contain antibiotics (succinyl sulfathiazole) to reduce microbiome folate production since it was not mentioned. Were wire-bottom cages used to eliminate coprophagy? Were there any significant differences between male and female serum folate levels that could have contributed to the endpoints? Was only a subset of samples assayed for total folate? (fig 2b shows a possible n of 6 per group?). If no antibiotics and no wire-bottom cages were used, mice can maintain adequate folate levels from coprophagy without developing signs of anemia. Please discuss these details as it helps clarify the conditions used.
      8. There are instances in the results section where statements were made implying that there are differences observed ("slightly higher", "negative association") when they are not statistically significant. There can be either statistically significant differences/correlation or not. Please be precise in your wording.
      9. Graying was observed less significantly in the F/C- group according to the authors. However, no quantitative assessment was made, and it is merely observational. Inference to inhibition of mTOR was made, but mTOR protein and phosphorylation levels were not measured. The authors did perform western blotting on ribosomal S6 protein; however, no assessment of the downstream mTOR targets P70S6k1 and 4EBP is shown.
      10. Can the change in RER in F/C- mice compared to controls be explained by the increased adiposity in these animals?
      11. How was the microbiome normalized between groups prior to the beginning of the experiment (fecal slurry gavage, bedding exchange, cohabitation, none of the above)? There is no mention of this crucial step in the materials and methods section. Furthermore, additional details regarding the microbiome analysis are required (analysis pipeline, read depth, denoising, software, data processing, PCA analysis, etc.). It is not sufficient to state that Zymo performed the analysis. What is an "easily distinguishable gut microbiome" and "appeared less diverse"? A two-dimensional plot using two principal components would be more suitable for image 5A and allow for better visualization of the clustering of the groups. Since the authors suggest that the microbiome could be a source of 1C metabolites (including natural folate), it is important to clarify if coprophagy is involved.
      12. How are inflammatory cytokines and marker levels linked to reduced anabolism and immune function in non-challenged animals?
      13. When discussing the epigenetic analysis, the authors state "no changes in the DNA methylation from liver samples.." and "groups appear younger than expected". Please clarify these statements. Additional details are needed regarding the analysis performed and the choice of methylated loci and methods. Please reference the epigenetic clock or model that was used and state whether it was developed for the same strain and sub-strain of mice. Is it using a modified "Horvath" mouse DNA age epigenetic clock? If so, provide the necessary details and a possible explanation for the discrepancy other than "unknown reasons".
      14. Regarding uracil misincorporation, the liver contains significant stores of folate as it is the main hub for several critical OCM reactions (phospholipid methylation is a major one). Earlier studies used antibiotics with or without coprophagy prevention measures to induce a state of folate depletion to induce uracil incorporation in various tissues of rodent models. There is some controversy over whether dietary folic acid restriction/methyl donor restriction alone will lead to uracil misincorporation when there is no apparent depletion or anemia. Please discuss your specific experimental procedures and how they agree or disagree with the published literature.
      15. The section discussing RPS6 needs to be rewritten, as it is difficult to understand. Furthermore, as stated previously, considering phosphorylation of mTOR and its downstream targets 4EBP and S6K1 will give a clear indication of proliferative signaling. Additionally, these pathways are impacted by feeding status, diurnal cycles, and sex. Were these factors controlled prior to sacrifice? Were the animals sacrificed at the same time? In a fed or unfed state?
      16. The western blots provided in supplementary files show uneven protein loading across lanes (ponceau stain). No loading control, such as β-actin, is shown. A separate blot is used for total and phosphorylated proteins as opposed to gently stripping the membrane of the phosphorylated blot and re-incubating with the antibody for total protein. While normalizing phosphorylated to total protein levels will eliminate some of the variability in the authors' method, the uneven loading may introduce errors in the calculated ratios.
      17. While the authors referenced older studies utilizing low dose methotrexate on rodents and provided a composite lifespan based on these findings, why was dietary folate and choline restriction used instead of a low dose methotrexate in mice in the current study? Please provide a rationale for this approach.

      Minor comments:

      1. While the authors make compelling arguments that lower folate intake later in life may promote healthy aging, an important consideration in the human population is that a considerable percentage of older individuals may be consuming an excessive amount of folate due to the combination of fortification and voluntary supplementation. An alternate hypothesis that could apply to humans and lab models is that the existing levels of exposure to folate/folic acid may be accelerating the aging process and promoting disease in later life.
      2. The common C57BL/6j is being referred to as the "long lived strain". Is this relative to mice in wild conditions? There are many transgenic C57bl/6 strains that live considerably longer. Please clarify if this is meant to describe the aged mice used in the experimental process.
      3. While the authors state early in the manuscript that longevity was not a measured outcome in the mouse study, the manuscript contains statements discussing animal survival in the results and survival curves (figure 2). This gives the impression that the study was planned as a survival analysis initially and since no difference was observed between the experimental groups during the earlier stages, the secondary endpoints of health span analysis were adopted. Either approach does not detract from the significance of the study's findings. Further clarity on the approach would be beneficial to the readers.
      4. For yeast culture conditions, what are the folate sources and content? Is there added folic acid, similar to cell culture conditions where supraphysiological concentrations are used in standard media (RPMI and DMEM)?
      5. In the metabolism section, the authors make statements such as "the differences were minimal", "probably were due..", "minimal effects", "apparent increase", "tended to be", "little uracil", etc. Please refrain from using subjective language and use precise scientific terms.
      6. Figure 2-c, there is a typo, Weeks not months

      Referees cross-commenting

      While we generally agree with the other reviewers' concerns, we find reviewer 3's rejection of the authors' conclusion, without considering the evidence presented in the context of what is currently known in the field, potentially limiting. Multiple groups have shown that manipulation of OCM enzymes (DHFR, TYMS, SAMS) can extend lifespan in worms. The recent report from Antebi's group (Annibal et al. Nature Com, 2021) provides strong evidence that OCM is central to longevity regulation in worms and mice and that folate intake can interact with and modulate organismal longevity. While this manuscript's findings are not conclusive, I think it is premature to dismiss them completely. Perhaps the alternative is to discuss the limitations of this approach and interpret the results (or the lack of significant differences) in order to help guide future research into this important subject. Generalizing rodent results to humans is always going to be a limiting factor in this type of work. Mice have significantly higher circulating folate. Additionally, DHFR activity (the rate-limiting enzyme in folate OCM) in rodents can be up to 100 times higher than its human equivalent. Another consideration is that mice, similar to other rodents, engage in coprophagy, thereby recycling and supplementing bacterially produced folate in the absence of antibiotics in the diet. Therefore, mice placed on dietary folate restriction in the absence of antibiotics do not develop signs of anemia or deficiency. It could thus be argued that there is no loss of nutrients in mice in this scenario and that supplementation at the arbitrarily recommended level of synthetic folic acid (2mg/kg day) or higher could impact health and aging. Similarly, in humans excess folate intake has been controversially associated with a number of deleterious health effects. It is important not to dismiss these reports and to encourage further research into this subject, which impacts a significant percentage of the human population due to the widespread use of supplements.

      Significance

      A major strength of this study is that the authors show that manipulation of OCM either through pharmacological inhibition or dietary restriction can impact organismal longevity in a conserved manner across species from yeast to worms and mammals. These findings provide compelling evidence that folate intake and metabolism in humans should be rigorously researched as potential regulators of aging. These findings complement and agree with a recent report by Antebi's group (Annibal et al. Nature Com, 2021) highlighting that long-lived worm and mouse strains exhibit similar metabolic regulation of one-carbon metabolism. In the same report, low levels of folate supplementation partially or completely abrogated the lifespan extension in some models. This study provides additional evidence that restricting OCM through drugs or dietary restriction can significantly impact healthspan and lifespan. Additionally, it raises the question of whether excessive folate intake in aged adults may have potentially deleterious effects on health and longevity. The limitations of this study can be seen in the overall lack of significant impact of the dietary intervention on the health metrics that were measured in mice. The study does not provide strong evidence that restricting folate and choline intake will produce favorable effects on health. Similarly, no significant impact on mouse lifespan was observed based on the partial lifespan analysis. Further clarity is needed regarding the experimental procedures and methods used. The study, nonetheless, is an important step towards investigating the role of folate and OCM in regulating mammalian healthspan and lifespan. Future studies can expand on these findings and investigate whether OCM interventions that are started in early life can produce significant and measurable effects on longevity and health in mammals. The findings here provide a conceptual and incremental advance in our understanding of these complex interactions.

      These findings are important to the research communities especially in the areas of longevity, metabolism, and nutrition.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Will the nanobody be available to the TB research community?

      Yes, we will make E11rv available upon request. Please see our materials availability statement.

      Reviewer #2 (Recommendations For The Authors):

      (1) It would be interesting to test the potential impact of residual ASB-14 contaminant on the biochemical behavior of ESAT-6-CFP10 heterodimer and ESAT-6 homodimer or tetramer and their hemolytic activity in comparison with the ones without ASB-14.

      We agree that this is an interesting line of questioning. Based on the study by Refai et al. that we cite in the text, ESAT-6 treated with nonionic detergents ASB-14 or LDAO, but not other common detergents, undergoes a conformational change that increases its cytotoxicity in cell assays, hemolytic activity, and ability to dimerize with CFP-10. What is not known at this point is how similar the ASB-bound conformation is to anything seen physiologically.

      (2) Building on the progress in making anti-ESAT-6 nanobodies and their anti-Mtb effects in the cells, it could have been tested in human or mouse primary macrophages infected with Mtb and a mouse model of Mtb infection for its anti-Mtb efficiency.

      We thank the reviewer for this suggestion, and we agree that these would be very informative next steps for determining the therapeutic potential of anti-ESAT-6 nanobodies.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      Line 133: "It is well established that Mm-induced hemolysis is ESX-1 dependent, but our results suggest that Mtb must lack one or more factors necessary for efficient hemolysis.". I would tone this down a bit, as it is also known that M. tuberculosis escapes much later than M. marinum from the phagosome, which could indicate different kinetics.

      We thank the reviewer for their insightful comments. We agree that the kinetics of Mtb and Mm infection are quite different and that this may impact the hemolysis assay. As described by Augenstreich et al., some hemolysis by Mtb is observed at 48 hours, though the method of normalization makes it impossible to determine the absolute amount of hemolysis that occurred in their experiment. Our findings show only that the absolute amount of Mtb hemolysis in 2 hours is negligible, setting it apart from Mm. We have edited the wording of this statement in the manuscript to avoid any confusion.

      Line 155: "Because Mtb often exists in an acidified compartment". First, the reference used here does not discuss anything about Mtb; second, papers that do measure the acidification of Mtb-loaded phagosomes indicate that this acidification is very mild (typically to pH 6.2).

      We agree that this point should be articulated more precisely. We have added additional clarification that the pH of Mtb-containing compartments in macrophages can fall in a broad range depending on the activation state of the macrophages, and that non-activated macrophages are typically only mildly acidic. We have updated our references to better describe the current state of knowledge on this topic.

      Line 339: "Whereas most of these functions rely only on the secretion of ESAT-6 into the cytoplasm, the ability of E11rv to access Mtb suggests that this communication is likely two-way." No, not necessarily; there are many processes in which ESX-1 substrates affect the macrophage. This nanobody could affect EsxA functioning only once the bacteria reach the cytoplasm. I think checking phagosomal escape in these cells is therefore crucial.

      We agree that phagosomal escape and subsequent direct secretion of ESAT-6 into the cytoplasm is a reasonable alternative hypothesis. We have added this point to our discussion, and we agree that looking directly at phagosomal escape is an important next step.

      Figure 7 is not mentioned in the text (mistake for Fig 6).

      This has been corrected.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      As a reviewer for this manuscript, I recognize its significant contribution to understanding the immune response to saprophytic Leptospira exposure and its implications for leptospirosis prevention strategies. The study is well-conceived, addressing an innovative hypothesis with potentially high impact. However, to fully realize its contribution to the field, the manuscript would benefit greatly from a more detailed elucidation of immune mechanisms at play, including specific cytokine profiles, antigen specificity of the antibody responses, and long-term immunity. Additionally, expanding on the methodological details, such as immunophenotyping panels, qPCR normalization methods, and the rationale behind animal model choice, would enhance the manuscript's clarity and reproducibility. Implementing functional assays to characterize effector T-cell responses and possibly investigating the microbiota's role could offer novel insights into the protective immunity mechanisms. These revisions would not only bolster the current findings but also provide a more comprehensive understanding of the potential for saprophytic Leptospira exposure in leptospirosis vaccine development. Given these considerations, I believe that after substantial revisions, this manuscript could represent a valuable addition to the literature and potentially inform future research and vaccine strategy development in the field of infectious diseases. 

      We have been interested in understanding how pathogenic and non-pathogenic Leptospira species affect each other in a mammalian reservoir host. With the current study we continue to elucidate the immune mechanisms engaged by pathogenic Leptospira interrogans versus non-pathogenic L. biflexa, as a follow-up to our previous work (Shetty et al., 2021, PMID: 34249775; Kundu et al., 2022, PMID: 35392072). We found that both species engaged partially overlapping myeloid immune cells and inflammatory signatures of infection. For example, some chemokines were increased, and macrophages and dendritic cells were engaged at 24h post inoculation with both species of Leptospira (PMID: 34249775). Thus, we asked whether this robust innate immune response, raised to eliminate an immunogenic but non-pathogenic bacterium, could also help restrain L. interrogans pathogenesis. In this study we show that exposure to L. biflexa before L. interrogans challenge improves kidney homeostasis, mitigates leptospirosis severity and leads to increased shedding of L. interrogans in urine. This suggests an interspecies symbiotic commensalistic process that facilitates survival of the pathogenic species. These findings are highly relevant to the millions of people in areas endemic for leptospirosis who are naturally exposed to non-pathogenic Leptospira species.

      We will expand on the methodological details and will update the introduction and discussion to include answers to questions raised by the three reviewers to further clarify the importance and impact of our study.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors try to achieve a method of protection against pathogenic strains using saprophytic species. It is undeniable that the saprophytic species, despite not causing the disease, activates an immune response. However, based on these results, using the saprophytic species does not significantly impact the animal's infection by a virulent species. 

      We distinguish exposure to a non-virulent bacterium that establishes a brief infection and engages an immune response (L. biflexa) from infection by a virulent species of Leptospira that leads to pathogenesis (L. interrogans). While trying to understand how pathogenic and non-pathogenic Leptospira species affect each other in a mammalian reservoir host, we previously found that L. biflexa induces immune responses that should affect the immunity of populations naturally exposed to this spirochete. Thus, we designed this study to answer that question.

      Strengths: 

      Exposure to the saprophytic strain before the virulent strain reduces animal weight loss, reduces tissue kidney damage, and increases cellular response in mice.

      Weaknesses: 

      Even after the challenge with the saprophyte strain, kidney colonization and the release of bacteria through urine continue. Moreover, the authors need to determine the impact on survival if the experiment ends on the 15th. 

      Another novel and unexpected aspect of our findings in the single exposure experiment was that L. biflexa pre-exposure mediated a homeostatic environment in the kidney (lower ColA1, healthier renal physiology) that restrained pathogenesis of L. interrogans after challenge, which resulted in better health outcomes and increased shedding of L. interrogans in urine; in contrast, when the kidney was compromised (high ColA1) by L. interrogans (without L. biflexa pre-exposure), there was lower shedding of L. interrogans in urine. Interestingly, this suggests an interspecies symbiotic commensalistic process that facilitates survival of the pathogenic species. Thus, these data suggest that higher shedding of L. interrogans in urine may not be a hallmark of increased disease; it could rather indicate the opposite.

      We will include these concepts in the updated discussion.

      We don’t think that extending this experiment to d21 or d28 would add relevant data to our findings. We provide survival curves for both experiments up to d15 post infection.

      Reviewer #3 (Public Review): 

      Summary: 

      Kundu et al. investigated the effects of pre-exposure to a non-pathogenic Leptospira strain in the prevention of severe disease following subsequent infection by a pathogenic strain. They utilized a single or double exposure method to the non-pathogen prior to challenge with a pathogenic strain. They found that prior exposure to a non-pathogen prevented many of the disease manifestations of the pathogen. Bacteria, however, were able to disseminate, colonize the kidneys, and be shed in the urine. This is an important foundational work to describe a novel method of vaccination against leptospirosis. Numerous studies have attempted to use recombinant proteins to vaccinate against leptospirosis, with limited success. The authors provide a new approach that takes advantage of the homology between a non-pathogen and a pathogen to provide heterologous protection. This will provide a new direction in which we can approach creating vaccines against this re-emerging disease. 

      Strengths: 

      The major strength of this paper is that it is one of the first studies utilizing a live non-pathogenic strain of Leptospira to immunize against severe disease associated with leptospirosis. They utilize two independent experiments (a single and double vaccination) to define this strategy. This represents a very interesting and novel approach to vaccine development. This is of clear importance to the field. 

      The authors use a variety of experiments to show the protection imparted by pre-exposure to the non-pathogen. They look at disease manifestations such as death and weight loss. They define the ability of Leptospira to disseminate and colonize the kidney. They show the effects infection has on kidney architecture and a marker of fibrosis. They also begin to define the immune response in both of these exposure methods. This provides evidence of the numerous advantages this vaccination strategy may have. Thus, this study provides an important foundation for future studies utilizing this method to protect against leptospirosis. 

      Weaknesses: 

      Although they provide some evidence of the utility of pretreatment with a non-pathogen, there are some areas in which the paper needs to be clarified and expanded. 

      The authors draw their conclusions based on the data presented. However, they state the graphs only represent one of two independent experiments. Each experiment utilized 3-4 mice per group. In order to be confident in the conclusions, a power analysis needs to be done to show that there is sufficient power with 3-4 mice per group. In addition, it would be important to show both experiments in one graph which would inherently increase the power by doubling the group size, while also providing evidence that this is a reproducible phenotype between experiments. Overall, this weakens the strength of the conclusions drawn and would require additional statistical analysis or additional replicates to provide confidence in these conclusions. 

      We will take these suggestions into consideration and will address as many of these issues as possible in the revised manuscript.
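
      For illustration only, the kind of power calculation the Reviewer asks for can be sketched with the standard normal-approximation formula for comparing two group means. The effect sizes below (Cohen's d) are hypothetical placeholders, not values from the study, and the normal approximation is optimistic for very small groups, where a t-based calculation would be more appropriate.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate animals per group for a two-sided, two-group comparison.

    effect_size is Cohen's d (difference in means / pooled SD); the values
    used below are hypothetical and not taken from the manuscript.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

def approx_power(n, effect_size, alpha=0.05):
    """Approximate power achieved with n animals per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n / 2) - z_alpha)

if __name__ == "__main__":
    for d in (1.0, 2.0):  # hypothetical effect sizes
        print(f"d={d}: n per group ~ {n_per_group(d)}, "
              f"power with n=4 ~ {approx_power(4, d):.2f}")
```

      Under these assumptions, groups of 3-4 animals only reach conventional power when the effect is very large, which is the concern behind the request to pool the two independent experiments.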

      A direct comparison between single and double exposure to the non-pathogen cannot be made. The ages of mice infected were different between the single (8 weeks) and double (10 weeks) exposure methods, thus the phenotypes associated with LIC infection are different at these two ages. The authors state that this is expected, but do not provide a rationale for this drastic difference in phenotypes. It is therefore difficult to compare the two exposure methods, and thus determine if one approach provides advantages over the other. An experiment directly comparing the two exposure methods while infecting mice at the same age would be of great relevance to and strengthen this work. 

      Both experiments need to be analyzed as separate but complementary, as they provide different insights into L. interrogans pathogenesis and potential solutions to the problem. Optimal measurements of disease progression (weight loss, survival curves) require infection of mice at 8 weeks. Based on this, a new L. biflexa double exposure experiment would have to start when mice are 4 weeks old, which is just after weaning and before the mouse immune system is fully developed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable contribution to the electric fish community, and to studies of active sensing more generally, in that it provides evidence that a well-studied behavior (chirping) may serve in active sensing rather than communication. For the most part, the evidence is solid. In particular, the evidence showing increased chirping in more cluttered environments and the relationship between chirping and movement are convincing. Nevertheless, evidence to support the argument that chirps are mostly used for navigation rather than communication is incomplete.

      Thank you for the comment. In response to what seemed to be a generalized need for more evidence to support our hypothesis, we have extensively reviewed the manuscript, changed the existing figures and added new ones (3 new figures in the main text and 4 in the supplementary information section). Our edits include:

      (1) changes to the written text to remove categorical statements ruling out the possible communication function of chirps. When necessary, we have also added details on why we believe a social communication function of chirps could interfere with a role in electrolocation.

      (2) new experiments (and related figures) adding details on the behavioral correlates of chirping, on the effects of chirps on electric images (which are a way to represent current flow on the fish skin), and behavioral responses to ramp frequency playback EODs (used to test a continuous range of beat frequencies and fill the sampling gaps left by our experiments using real fish).

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      We thank the Reviewer for the extensive feedback. Below we respond to each of the points raised.

      We have clarified that our intention is not to propose chirps as tools for “conspecific localization”, understood as pinpointing a conspecific’s exact location. Instead, based on our observation of chirps being employed at very close ranges, we suggest that chirps may serve to assess other parameters related to “conspecific positioning” (which, in a broad sense, is still “electrolocation”) that could be derived from the beat. These parameters might include size, relative orientation, or subtle changes in position during movement. While the experiments discussed in the manuscript do not provide a conclusive answer in this regard, we prioritize here the presentation of broader evidence for a different use of chirping. We are actively working on another manuscript that explores this aspect in more detail; due to space limitations, additional results had to be excluded here.

      In the abstract we mention a role of chirps in the enhancement of “electrolocation”, but - as mentioned above - it is meant here only in a broad sense. In the introduction (at the very end) we propose chirps as self-directed signals (homeoactive sensing). In the Results paragraph dedicated to the novel environment exploration experiment the following lines were added: “Most chirps (90%) in fact are produced within a distance corresponding to 1% of the maximum field intensity (i.e. roughly 30 cm; Figure S12B), indicating that chirping occurs way below the threshold range for beat detection (i.e. roughly in the range of 60-120 cm, depending on the study; see appendix 1: Detecting beats at a distance) and likely does not represent a way to improve it”. We conclude this paragraph by stating: “This further corroborates the hypothesized role of chirps in beat processing.” The last Results paragraph (on chirping in cluttered environments) ends with: “This supports the notion of chirps as self-referenced probing cues, potentially employed to optimize short-range aspects of conspecific electrolocation, such as conspecific size, orientation, and swimming direction - a hypothesis that will certainly be explored in future studies.” In the Discussion paragraph entitled “probing with chirps”, we hint at possible mechanisms underlying the role of chirps in beat processing. As mentioned, we plan to add further details in another manuscript, currently in preparation.

      The study provides a wealth of interesting observation of behavior and much of this data constitute a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raises interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth being considered and explored further. However, the data they provide does not support strong conclusion statements arguing that these chirps are used for localization purposes and is even less convincing at rejecting previously established hypotheses on the communication purpose of the chirps.

      We intentionally framed our aims a bit provocatively to underscore that, to date, the role of chirps in social communication has been supported solely by correlative evidence. While the evidence we provide to support the role of chirps as probes is also correlative, it raises critical questions about the long-assumed role of chirps in social communication. In fact, chirping is strongly dependent on the reciprocal positioning of the fish, highly constrained by beat frequency, and patterned in ways that, in our opinion, make the links between chirp types and internal states suggested by the current view less likely. Moreover, the use of different chirp types does not appear specific to any of the social contexts analyzed but is primarily explained by DF (beat frequency). This observation, coupled with the analysis of chirp transitions (more self-referenced than reflecting an actual exchange between subjects), leads us to hypothesize with greater confidence that chirp production may be more related to sensing the environment than to transmitting information about a specific behavioral state.

      Nevertheless, the Reviewer's comment is valid. We've tempered the study's conclusions by introducing the possibility of chirps serving both communication and electrolocation functions, as stated in the conclusion paragraph: "While our results do not completely dismiss the possibility of chirps serving a role in electrocommunication—probing cues could, for instance, function as proximity signals to signal presence, deter approaches, or coordinate behaviors like spawning (Henninger et al., 2018).". Nonetheless, we do emphasize that our hypothesis is more likely to apply - based on our data. We refrain from categorically excluding a communicative function for chirps (between subjects), but we hypothesize that this communication - if occurring - may contain the same type of information as the self-directed signaling implied by the “chirps as probes” idea (i.e. spatial information).

      In response to the Reviewer's feedback, we've revised the end of the introduction, removing suggestions of conclusiveness: "Finally, by recording fish in different conditions of electrical 'visibility,' we provide evidence supporting a previously neglected role of chirps: homeoactive sensing." (edit: the word “validating” has been removed to give a less “conclusive” answer to the open functional questions about chirping).

      I would suggest thoroughly revising the manuscript to provide a neutral description of the results and leaving any speculations and interpretations for the discussion where the authors should be careful to separate strongly supported hypotheses from more preliminary speculations. I detail below several instances where the argumentation and/or the analysis are flawed.

      Following the Reviewer’s comment, we have revised the manuscript to emphasize the following points: 1) the need for a revision of the current view on chirping, 2) our proposal of an alternative hypothesis based on correlations between chirping and behavior, which were previously unexplored, and 3) our acknowledgment that while we offer evidence supporting a probing role of chirps (e.g., lack of behavioral correlation, DF-dependency, stereotypy in repeated trials, modulation by clutter and distance), we do not present here conclusive evidence for chirps detecting specific details of conspecific positioning. Nor do we categorically exclude a role of chirps in social communication.

      They analyze chirp patterning and show that, most likely, a chirp by an individual is followed by a chirp in the same individual. They argue that it is rare that a chirp elicits a "response" in the other fish. Even if there are clearly stronger correlations between chirps in the same individual, they provide no statistical analysis that discards the existence of occasional "response" patterns. The fact that these are rare, and that the authors don't do an appropriate analysis of probabilities, leads to this unsupported conclusion.

      We employed cross-correlation indices, calculated and assessed with a 3 standard deviation symmetrical boundary (which is a statistically sound and strict criterion). Median values were utilized to depict trends in each group/pair. To support our findings, we added new experiments and new figures: 1) a correlation analysis between chirps and behaviors, providing more convincing evidence of how chirps are employed during "scanning" swimming activity (backward swimming); 2) a text mining approach to underscore chirp-behavior correlations, employing alternative and statistically more robust methods.
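
      As a rough illustration of this kind of analysis, the sketch below builds a cross-correlogram of chirp times from two fish and flags lag bins that fall outside a symmetrical mean ± 3 SD boundary. It is not the authors' code: the bin width, lag window, use of the correlogram's own mean and SD as the reference, and the synthetic chirp times are all assumptions made for the example.

```python
from statistics import mean, pstdev

def cross_correlogram(times_a, times_b, max_lag=2.0, bin_width=0.1):
    """Count chirp pairs by lag (t_b - t_a), binned over [-max_lag, +max_lag)."""
    n_bins = int(round(2 * max_lag / bin_width))
    counts = [0] * n_bins
    for t_a in times_a:
        for t_b in times_b:
            lag = t_b - t_a
            if -max_lag <= lag < max_lag:
                idx = min(int((lag + max_lag) / bin_width), n_bins - 1)
                counts[idx] += 1
    centers = [-max_lag + (i + 0.5) * bin_width for i in range(n_bins)]
    return centers, counts

def flag_bins(counts, n_sd=3.0):
    """Mark bins whose count deviates from the mean by more than n_sd SDs."""
    mu = mean(counts)
    sd = pstdev(counts)
    return [abs(c - mu) > n_sd * sd for c in counts]

# Synthetic example (hypothetical data): fish B chirps 0.55 s after fish A.
times_a = [5.0 * k for k in range(200)]
times_b = [t + 0.55 for t in times_a]
centers, counts = cross_correlogram(times_a, times_b)
flags = flag_bins(counts)
peak = counts.index(max(counts))
print(f"peak lag ~ {centers[peak]:.2f} s, flagged: {flags[peak]}")
```

      In this toy case a consistent "response" lag produces a single bin beyond the boundary; a flat correlogram would produce none. How rare response patterns should be tested against chance remains the statistical question raised by the Reviewer.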

      One of the main pieces of evidence that chirps can be used to enhance conspecific localization is based on their "interference" measure. The measure is based on an analysis of "inter-peak-intervals". This in itself is a questionable choice. The nervous system encodes all parts of the stimulus, not just the peak, and disruption occurring at other phases of the beat might be as relevant. The interference will be mostly affected by the summed duration of intervals between peaks in the chirp AM. They do not explain why this varies with beat frequency. It is likely that the changes they see are simply an artifact of the simplistic measure. A clear demonstration that this measure is not adequate comes from the observation in Fig7E-H. They show that the interference value changes as the signal is weaker. This measure should be independent of the strength of the signal. The method is based on detecting peaks and quantifying the time between peaks. The only reason this measure could be affected by signal strength is if noisy recordings affect how the peak detection occurs. There is no way to argue that this phenomenon would happen the same way in the nervous system. Furthermore, they qualitatively argue that patterns of chirp production follow patterns of interference strength. No statistical demonstration is done. Even the qualitative appraisal is questionable. For example, they argue that there are relatively few chirps being produced for DFs of 60 or -60 Hz. But these are DF where they have only a very small sample size. The single pair of fish that they recorded at some of these frequencies might not have chirped by chance and a rigorous statistical analysis is necessary. Similarly, in Fig 5C they argue that the position of the chirps fall on areas of the graph where the interferences are strongest (darker blue) but this is far from obvious and, again, not proven.

      We would like to clarify that the estimation of the effects of chirps on the beat (referred to as “beat interference”) was not intended to serve as the primary evidence supporting a different use of chirping. In fact, all the experiments conducted prior to that calculation already provide substantial evidence supporting the hypothesis we have proposed. To address the Reviewer’s concern and to avoid misleading interpretations, we have moved this part to the Supplementary Information (now Figures S8 and S9), in keeping with its non-essential role in our argument. We also added the following statement to the result paragraph entitled “Chirps significantly interfere with the beat and enhance electric image contrast”: “Obviously, measuring chirp-triggered beat interferences by using an elementary outlier detection algorithm on the distribution of beat cycles does not reflect any physiological process carried out by the electrosensory system and can be therefore used only as an oversimplified estimate.”.

      Regarding the meaning of “beat interference” (as here estimated) from a perspective of brain physiology: chirp interference was calculated using the beat cycles as a reference. Beat peaks were used only to estimate beat cycle duration. Regardless of whether or not a beat peak is represented in the brain, beat cycle duration (estimated using the peaks) is the main determinant of p-unit rhythmic response to a beat. Regarding the effect of signal amplitude, this is also not very relevant. It is obvious that a chirp creates more - or less - interference based on the chirp FM and its duration (but also the sign of the DF and the magnitude of the amplitude modulation). If electroreceptor responses are entrained in waves of beat AMs and if “interference” is a measure of how such waves are scrambled, then “interference” is a measure of how chirps scramble waves of electroreceptor activity by affecting beat AMs.

      The interference fades with the signal (previous Figure 7, now Figure S12) because it is weighted by signal strength (the signals used as carriers for chirps are recalculated based on real measurements of signal strength at different distances). Nonetheless, the Reviewer is right: mathematically speaking, interference would not change at all, because it is just the result of an outlier detection algorithm. This outlier detection is set to a 1% threshold (percent of beat contrast).
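
      To make the idea of outlier detection on beat cycles concrete, here is a minimal sketch under stated assumptions: peaks of a sampled beat envelope are used only to estimate cycle durations, and cycles deviating from the median duration by more than a relative tolerance are flagged. The function names, the `rel_tol` parameter and the synthetic phase jump are hypothetical; the sketch does not implement the 1% beat-contrast threshold mentioned above and does not model any physiological process.

```python
import math
from statistics import median

def beat_peak_times(envelope, dt):
    """Times of local maxima in a sampled beat envelope (three-point test)."""
    return [i * dt for i in range(1, len(envelope) - 1)
            if envelope[i - 1] < envelope[i] >= envelope[i + 1]]

def cycle_durations(peak_times):
    """Beat cycle durations estimated as intervals between successive peaks."""
    return [t1 - t0 for t0, t1 in zip(peak_times, peak_times[1:])]

def outlier_cycles(durations, rel_tol=0.1):
    """Flag cycles deviating from the median duration by more than rel_tol."""
    ref = median(durations)
    return [abs(d - ref) > rel_tol * ref for d in durations]

# Synthetic 10 Hz beat sampled at 1 kHz; a phase jump of 0.3 cycles at
# sample 450 stands in for a chirp (hypothetical values).
dt = 0.001
def phase_cycles(i):
    return i / 100 + (0.3 if i >= 450 else 0.0)

envelope = [math.cos(2 * math.pi * phase_cycles(i)) for i in range(1000)]
peaks = beat_peak_times(envelope, dt)
durations = cycle_durations(peaks)
flags = outlier_cycles(durations)
print(f"{sum(flags)} of {len(durations)} beat cycles flagged as perturbed")
```

      Because this measure depends only on peak timing, scaling the envelope amplitude leaves the flags unchanged, which matches the Reviewer's point that such a measure is mathematically independent of signal strength unless it is explicitly weighted.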

      Regarding the comparison “chirps vs interference”, we did not perform a statistical analysis because we only intended to show a qualitative observation. Similar results can be obtained for slightly shorter or longer time windows, within certain limits of course (see added Figure S9, in the Supplementary Information). We hope that moving this analysis to the Supplementary Information makes it clear that this approach is not central to our argument.

      The Reviewer’s point on the DF sampling is correct, we have reconsidered the low chirping at 60Hz as potentially the result of sampling bias and edited the respective result paragraph.

      They relate the angle at which one fish produces chirps relative to the orientation of the mesh enclosing. They argue that this is related to the orientation of electric field lines by doing a qualitative comparison with a simplified estimate of field lines. To be convincing this analysis should include a quantitative comparison using the exact same body position of the two fish when the chirps are emitted.

      We agree with the Reviewer: this type of experiment would be much better suited to illustrate the correlations between chirping and reciprocal positioning in fish. What we can see is that chirping occurs at certain orientations more often than others. This could have something to do with either field geometry or with locomotion in the particular test environment we have used. As mentioned earlier, we are currently editing a second manuscript which will include the type of analysis/experiment the Reviewer is thinking of. In this first study we preferred to focus on the broader behavioral correlates of chirping. We removed the mention of the field current lines because, we agree, the argument is vague as presented here.

      They show that the very vast majority of chirps in Fig 6 occur when the fish are within a few centimeters (e.g. very large first bin in Fig6E-Type2). This is a situation when the other fish signal will be strongest and localization will be the easiest. It is hard to understand why the fish would need a mechanism to enhance localization in these conditions (this is the opposite of difficult conditions e.g. the "cluttered" environment).

      Agreed, in fact we do not explicitly propose chirps as means to improve “electrolocation” (this word is used only broadly in the abstract) but instead as probes to extract spatial information (e.g. shape, motion, orientation) from a beat source. In a broader sense, all these spatial parameters contribute to any given instance of "localization." Because we were unable to explore all these aspects in greater detail, we chose to maintain a broader perspective. If chirps contribute to a better resolution of fine spatial attributes of conspecific locations, it is reasonable to expect higher chirping rates in proximity to the target fish.

      The argumentation aimed at rejecting the well-established role of chirp in communication is weak at best. First, they ignored some existing data when they argue that there is no correlation between chirping and behavioral interactions. Particularly, Hupe and Lewis (2008) showed a clear temporal correlation between chirps and a decrease in bites during aggressive encounters. It could be argued that this is "causal evidence" (to reuse their wording) that chirps cause a decrease in attacks by the receiver fish (see Fig 8B of the Hupe paper and associated significant statistics). Also, Oboti et al. argue that social interactions involve "higher levels of locomotion" which would explain the use of chirps since they are used to localize. But chirps are frequent in "chirp chamber" paradigms where no movement is involved. They also point out that social context covaries with beat frequency and thus that it is hard to distinguish which one is linked to chirping propensity and then say that it is hard to disentangle this from "biophysical features of EOD fields affecting detection and localization of conspecific fish". But they don't provide any proof that beat frequency affects detection and localization so their argument is not clear. Last, they argue that tests in one species shouldn't be extrapolated to other species. But many of the studies arguing for the role of chirps in communication were done on brown ghosts. In conclusion of this point, they do not provide any strong argument that rejects the role of chirps as a communication signal. A perspective that would be better supported by their data and consistent with past research would be to argue that, in addition to a role in communication, chirps could sometimes be used to help localize conspecifics.

      We did not intend to disregard the extensive body of literature supporting a role of chirps in social communication. Rather, the primary goal of this study was to present a valid alternative perspective to this prevailing view. The existence of a well-established hypothesis does not imply that new evidence cannot change it; it simply indicates that changing it may be challenging either because it's genuinely difficult or because the idea has not been thoroughly explored. Whatever the case may be, proposing new hypotheses, whether complementary or alternative to established theories, is a challenging undertaking for a single study. We judged that starting from broad correlations would be the most desirable approach.

      We did not ignore data from Hupé and Lewis 2008. We cited this study repeatedly and compared their findings to those of others, not only for the correlation chirp-behaviors but also for chirping distance considerations. However, following the Reviewer’s comment, we now cite this study in the context of the behavioral analysis recently added (data from the PSTH plots could possibly confirm the observation of lower chirps during attacks). We also cited the study by Triefenbach and Zakon 2008, which reports something along the same lines. See the statement: “Overall, these results provided mutually reinforcing evidence indicating that chirps are produced more often during locomotion or scanning-related motor activity and confirm previous reports of a lower occurrence of chirping during more direct aggressive contact (as shown also by Triefenbach and Zakon, 2008; Hupé and Lewis, 2008).”, in the result paragraph related to the behavioral correlates of chirping.

      In our study we make it clear how we distinguish causal evidence (i.e. providing evidence that A is required for B) from correlation (i.e. evidence for A simply occurring together with B). We also make it clear that we are not going to provide causal evidence but we are going to provide new evidence for correlations that were so far not considered, in order to propose a new unexplored function of chirps.

      The Reviewer's point on chirping during motion and while caged in a chirp chamber is valid. Indeed, at first we were also puzzled by this finding. However, under the “chirps as probes” paradigm, chirping in a chirp chamber can be explained by the need to obtain spatial information from an otherwise unreachable beat source (brown ghosts typically explore new environmental objects or conspecifics by actively swimming around them - something caged fish can’t do). So, ultimately, the observation of chirping under conditions of limited movement (such as in a chirp chamber experiment) does not contradict our hypothesis; rather, it can be used to support it. Further experiments are required - as rightfully pointed out - to evaluate the effects of beat frequency on beat detection. We added a note about this in the “probing with chirps” discussion paragraph.

      The Reviewer's comment regarding generalization is unclear. We acknowledge that most studies are conducted in brown ghosts, as stated in the abstract. Our intention was to highlight that insights gained from this species have been applied to broaden the understanding of chirps in other species. Specifically, the "behavioral meaning idea" of chirping has been extended to other gymnotiform species producing EOD frequency modulations.

      Our study's aim is not to dismiss the idea of chirps being used for communication but to present an alternative hypothesis and to provide supporting evidence. While our results may not align well with the communication theory, our intention is not dismissal but rather engaging in a discussion and exploration of alternative perspectives.

      The discussion they provide on the possible mechanism by which chirps could help with localization of the conspecific is problematic. They imply that chirps cause a stronger response in the receptors. For most chirps considered here, this is not true. For a large portion of the beat frequencies shown in this paper, chirps will cause a de-synchronization of the receptors with no increase in firing rate. They cannot argue that this represents an enhanced response. They also discuss a role for having a broader frequency spectrum -during the chirp- in localization by making a parallel with pulse fish. There is no evidence that a similar mechanism could even work in wave-type fish.

      We have already commented on the “localization” idea in our previous responses. The Reviewer is right in saying that we have provided only vague descriptions of the potential mechanisms implied by our hypothesis. The studies by Benda and others (2005, 2006) demonstrate a clear synchronizing effect of chirps on p-unit firing rates, especially at low DFs (at ranges similar to those considered in this study). This synchronization could lead to an enhanced response at the electroreceptor level, as described in these very studies, which in turn would result in a higher probability of firing in downstream neurons (E-cells in the ELL).

      As also reported within the same works, chirps may also exert an opposite effect on p-units (i.e. desynchronization). This is what happens for large chirps at high DFs. Desynchronization may cause temporary lapses of p-unit firing, which in turn may lead to increased activity of I-cells in the ELL (which are indeed specifically tuned to p-unit lack of activity).

      So, in general, if we consider both ON and OFF pyramidal cells (in the ELL) and small and large chirps, we could state that chirps can be potentially used to enhance the activity of peripheral electrosensory circuits through different mechanisms, contingent on the chirp type and beat frequency. Unfortunately, space constraints limited our ability to dig into these details in the present study.

      However, to address the Reviewer’s rightful point, we now mention this in the manuscript: “Since the beat AMs generated by the chirps always trigger reliable responses in primary electrosensory circuits (pyramidal cells in the ELL respond to both increases and decreases in beat AM), any chirp-triggered AM causing a sudden change in p-unit firing could potentially amplify the downstream signal (Marsat and Maler, 2010) and thus enhance EI contrast.” (see result paragraph on beat interference and electric images).

      They write the whole paper as if males and females had been identified in their experiments. Although EOD frequency can provide some guess of the sex the method is unreliable. We can expect a non-negligible percentage of error in assigning sex.

      We agree and in fact, in the method section we state:

      “The limitation of this approach is that females cannot be distinguished from immature males with absolute certainty, since no post-mortem gonadal inspection was carried out.”

      to this we added:

      “Although a more accurate way to determine the sex of brown ghosts would be to consider other morphological features such as the shape of the snout, the body size, the occurrence of developing eggs, EOD frequency has been extensively used for this purpose.”

      Moreover, the consistent behavioral differences observed in low frequency fish, measured with those behavioral experiments aimed at assessing responses to playback stimuli and swimming behavior in novel environments, could also be caused by a younger age (as opposed to femaleness). However, the size ranges of our fish (an admittedly unreliable proxy of age) were all comparable, making this possibility perhaps less likely.

      Reviewer #2 (Public Review):

      Studying the weakly electric brown ghost knifefish, the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. This is a behavior that has been very well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation. Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that could have a great impact on the field. The authors do provide convincing evidence that chirps may function in homeoactive sensing. However, their evidence arguing against a role for chirps in communication is not as strong, and neglects a large body of research. Ultimately, the manuscript has great potential but suffers from framing these two possibilities as mutually exclusive and dismissing evidence in favor of a communicative function.

      We thank the Reviewer for the comment. Overall, we have edited the manuscript to soften our conclusions and avoid any strong categorical statement excluding the widely accepted role of chirps in social communication. We have added some new experiments to add more detail to the behavioral correlates of chirping and to the DF dependency of the production of different types of chirps. Nonetheless, based on our results, we are inclined to conclude that the communication idea - although widely accepted - is not as well substantiated as it should be.

      Although we do not dismiss the bulk of literature supporting a role of chirps in social communication, we think that our hypothesis (i.e. decoding of spatial parameters from the beat) may be not fully compatible with the social communication hypothesis for the following reasons:

      (1) Chirp type dependency on DF makes chirps likely to be adaptive responses to beat frequency. While this idea is compatible with a role of chirps in the detection of beat parameters, their concurrent role in social communication would imply that chirps interacting at given beat frequencies (DFs) would communicate only (or mainly) by delivering a very limited range of “messages”. For instance, assuming type 2 chirps are related to aggression (as widely suggested), are female-male pairs - with larger DFs - interacting less aggressively than same sex pairs? Our experiments often suggested this is not the case. In addition, large DFs are not always indicative of opposite sex interactions, while they are very often characterized by the emission of large chirps. Not to mention that, despite the fact that opposite sex interactions in absence of breeding-like conditions, cannot be considered truly courtship-related, large chirps are often considered courtship signals, regardless of the reproductive state of the emitting fish.

      (2) Chirping is highly affected by locomotion (consider female/male pairs with or without mesh divider) and distance (as shown in the novel environment exploration experiment). While the involvement of both parameters is compatible with a role of chirps in active sensing, a role of chirps in social communication implies that such signaling would occur only when fish are in very close proximity to each other. In this case, the beat is therefore heavily distorted not only by fish position/locomotion but also by chirps. Which means that when fish are close to each other, the 2 different types of information relayed by the beat (electrolocation and electrocommunication) would certainly interfere (this idea has been better phrased in the Introduction paragraph).

      (3) In our playback experiments we could not see any meaningful matching (e.g. angry-chirp → angry-chirp or sexy-chirp → approach) between playback chirps and evoked chirps, raising doubts on the meaning associated so far with the different types. Considering that playback experiments are typically used to assess signal meaning based on how animals respond to them, this result is suggesting quite strongly that such meaning cannot be assigned to chirps.

      (4) In playback experiments in which the same stimulus is provided multiple times, chirp type transitions (i.e. emission of a different chirp type after a given chirp) become predictable (as shown in the added playback experiments using ramping signals). This confirms that the choice to emit a given chirp type has something to do with beat frequency (or a change in this parameter) and not a communication of internal states. It would be otherwise unclear how a fish could change its internal state so quickly - and so reliably - even in the span of a few seconds.

      Despite this evidence against a semantic content of chirps in the context of social communication, we conclude the manuscript by noting that we are not providing strong evidence dismissing the communication hypothesis, and that both could coexist (see the example of “proximity signals” in the mating context given in the concluding paragraph).

      (1) The specific underlying question of this study is not made clear in the abstract or introduction. It becomes apparent in reading through the manuscript that the authors seek to test the hypothesis that chirps function in active sensing (specifically homeoactive sensing). This should be made explicitly clear in both the abstract and introduction, along with the rationale for this hypothesis.

      In the abstract we state “Despite the success of this model in neuroethology over the past seven decades, the underlying logic of their electric communication remains unclear. This study re-evaluates this view, aiming to offer an alternative, and possibly complementary, explanation for why these freshwater bottom dwellers emit electric chirps.”. This statement is meant as a summary of our aims. However, in order to convey a clearer message, we have revised the whole manuscript to more explicitly articulate our objectives. In particular we stress that with our experiments we intend to provide correlative evidence for a different role of chirps (previously unexplored) with the idea to stimulate a discussion and possibly a revision of the current theory about the functional role of chirps.

      In the introduction we have added a paragraph explaining our aim and also why we think that communicating through chirps could potentially interfere with efficient electrolocation: “Since both chirps and positional parameters (such as size, orientation or motion) can only be detected as perturbations of the beat (Petzold et al., 2016; Yu et al., 2012; Fotowat et al., 2013), and via the same electroreceptors, the inputs relaying both types of information are inevitably interfering. Moreover, as the majority of chirps are produced within a short range (< 50 cm; Zupanc et al., 2006; Hupé and Lewis 2008; Henninger et al., 2018; see appendix 1) this interference is likely to occur consistently during social interactions.

      Under the communication-hypothesis, the assumption that chirps and beats are conveying different types of information (i.e. semantic value as opposed to position and related geometrical parameters) is therefore leaving this issue unresolved.”.

      (2) My biggest issue with this manuscript is that it is much too strong in dismissing evidence that chirping correlates with context. This is captured in this sentence in the introduction, "We first show that the choice of different chirp types does not significantly correlate with any particular behavioral or social context." This very strong conclusion comes up repeatedly, and I disagree with it, for the following reasons:

      In your behavioral observations, you found sex differences in chirping as well as differences between freely interacting and physically separated fish. Your model of chirp variability found that environmental experience, social experience, and beat frequency (DF) are the most important factors explaining chirp variability. Are these not all considered "behavioral or social context"? Beat frequency (DF) in particular is heavily downplayed as being a part of "context" but it is a crucial part of the context, as it provides information about the identity of the fish you're interacting with.

      In your playback experiments, fish responded differently to small vs. large DFs, males chirped more than females, type 2 chirps became more frequent throughout a playback, and rises tended to occur at the end of a playback. These are all examples of context-dependent behavior.

      We agree with the Reviewer’s comment and we think that probably we have been unclear in what the meaning of that statement was. We also agree with the Reviewer about what is defined as “context”, and that a given beat frequency (DF) can in the end represent a “behavioral context” as well. In order to make it clearer, we have rephrased this statement and changed it to: “We first show that the relative number of different chirp types in a given recording does not significantly correlate with any particular behavioral or social context.”. This new form refers specifically to the observation that - in all different social conditions examined - the relative amounts of different types of chirps is unchanged (see Figure S2). We thought the Reviewer maybe interpreted our statement as if we suggested that chirp type choice is random or unaffected by any social variable. We agree with the Reviewer that this is not the case. We also reported that sex differences in chirping are present, but we have emphasized they may have something to do with the propensity of the brown ghosts of either sex to swim/explore as opposed to seek refuge and wait (as suggested by our experiments in which FM pairs were either divided or freely interacting and our novel environment exploration experiments).

      We agree DF is important, in fact it is the 3rd most important factor explaining chirp variance in our model. In our fish pair recordings, we see a strong correlation of chirp total variance with tank experience (one naïve, one experienced, both fish equally experienced) and social context (novel to each other/familiar to each other, subordinate/dominant, breeding/non breeding, accessible/not accessible) although data clustering seems to better distinguish “divided” vs “freely moving” conditions (and sex may also play a role as well because of the reversal of sexual dimorphism in chirp rates in precisely this case) more than other variables. However, we do not see a specific effect of these variables on the proportion of different types of chirps in any recording (see Figure S2).

      We also edited the beginning of the first result paragraph and changed it to “Thus, if behavioral meaning can be attributed to different types of chirps, as posed by the prevailing view (e.g., Hagedorn and Heiligenberg, 1985; Larimer and MacDonald, 1968; Rose, 2004), one should be able to identify clear correlations between behavioral contexts characterizing different internal states and the relative amounts of different types of chirp”, to emphasize we are here assessing the meaning of different types of chirps (not of the total amount of chirping in general).

      Further, you only considered the identity of interacting fish or stimulated fish, not their behavior during the interaction or during playback. Such an analysis is likely beyond the scope of this study, but several other studies have shown correlations between social behavior and chirping. In the absence of such data here, it is too strong to claim that chirping is unrelated to context.

      We agree with the Reviewer, in fact this analysis was previously carried out but purposely left out in an attempt to limit the manuscript length. We have now made space for this experimental work which is now added (see the new Figure 6).

      In summary, it is simply too strong to say that chirping does not correlate with context. Importantly, however, this does not detract from your hypothesis that chirping functions in homeoactive sensing. A given EOD behavior could serve both communication and homeoactive sensing. I actually suspect that this is quite common in electric fish. The two are not mutually exclusive, and there is no reason for you to present them as such. I recommend focusing more on the positive evidence for a homeoactive function and less on the negative evidence against a communication function.

      We aimed to clarify that our reference was to the lack of correlation between "chirp type relative numbers" and the analyzed context. Regarding the communication function, we tempered negative statements. However, as this study stems from evidence within the established paradigm of "chirps as communication signals", and aims at proposing an alternative hypothesis, eliminating all references to it could undermine the study's purpose.

      (3) The results were generally challenging to follow. In the first 4 sections, it is not made clear what the specific question is, what the approach to addressing that question is, and what specific experiment was carried out (the last two sections of the results were much clearer). The independent variables (contexts) are not clearly established before presenting the results. Instead they are often mentioned in passing when describing the results. They come across as an unbalanced hodgepodge of multiple factors, and it is not made clear why they were chosen. This makes it challenging to understand why you did what you did, the results, and their implications. For each set of major results, I recommend: First, pose a clear question. Then, describe the general approach to answering that question. Next, describe the specifics of the experimental design, with a rationale that appeals to the general approach described. Finally, describe the specific results.

      The introductory sentences of the first result paragraphs have been edited, rendering the aim of the experiments more explicit.

      (4) Results: "We thus predicted that, if behavioral meaning can be attributed to different types of chirps, as posed by the prevailing view (e.g., Hagedorn and Heiligenberg, 1985; Larimer and MacDonald, 1968; Rose, 2004)..." It should be made clear why this is the prevailing view, and this description should likely be moved to the introduction. There is a large body of evidence supporting this view and it is important to be complete in describing it, especially since the authors seem to seek to refute it.

      We understand the Reviewer’s question and we tried to express in the introduction the main reasons for why this is the current view. We state “Different types of chirps are thought to carry different semantic content based on their occurrence during either affiliative or agonistic encounters (Larimer and MacDonald 1968; Bullock 1969; Hopkins 1974; Hagedorn and Heiligenberg 1985; Zupanc and Maler 1993; Engler et al. 2000; Engler and Zupanc 2001; Bastian et al., 2001).”. To this we added: “Although supported mainly by correlative evidence, this idea gained popularity because it is intuitive and because it matches well enough with the numerous behavioral observations of interacting brown ghosts.”.

      We believe the prevailing view is based on intuition and a series of basic observed correlations repeated throughout the years. The crystallization of this idea is not due to negligence but mainly to technical limitations existing at the time of the first recordings. To assess the role of chirps in behaving fish, tight and precise temporal control over synchronized video-EOD recordings is most likely necessary, a technical capability that probably became available only well after the 1950s and 1960s, when electric communication was first described.

      (5) I am not convinced of the conclusion drawn by the analysis of chirp transitions. The transition matrices show plenty of 1-2 and 2-1 transitions occurring. Further, the cross-correlation analysis only shows that chirp timing between individuals is not phase-locked at these small timescales. It is entirely possible that chirp rates are correlated between interacting individuals, even if their precise timing is not.

      We agree with the Reviewer: chirp repertoires recorded in different social contexts are not devoid of reciprocal chirp transitions (i.e. fish 1 chirp - to - fish 2 chirp, or vice versa). Yet our point is to emphasize that their abundance is far more limited than that of the self-referenced ones (i.e. 1-1 and 2-2). This is a fair concern and, in order to further address this point, we have added a whole new set of analyses and new experiments (see chirp-behavior correlations, PSTHs and more analysis based on more solid statistical methods; see Figure 6).
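
      The distinction between self-referenced (1-1, 2-2) and reciprocal (1-2, 2-1) transitions can be illustrated with a minimal sketch. This is not the analysis code used in the study: the function names and the example sequence are hypothetical, and the sketch simply assumes that chirps are available as a time-ordered list of emitter labels.

```python
from collections import Counter


def transition_counts(emitters):
    """Count transitions between consecutive chirps.

    `emitters` is a time-ordered sequence of emitter labels (e.g. 1 or 2).
    Keys of the result are (previous_emitter, next_emitter) pairs.
    """
    return Counter(zip(emitters[:-1], emitters[1:]))


def self_vs_reciprocal(emitters):
    """Return (self-referenced, reciprocal) transition totals."""
    counts = transition_counts(emitters)
    self_ref = sum(n for (a, b), n in counts.items() if a == b)
    reciprocal = sum(n for (a, b), n in counts.items() if a != b)
    return self_ref, reciprocal


# Hypothetical sequence of chirp emitters, for illustration only.
sequence = [1, 1, 1, 2, 2, 2, 2, 1, 1, 2]
counts = transition_counts(sequence)
self_ref, reciprocal = self_vs_reciprocal(sequence)
print(dict(counts), self_ref, reciprocal)
```

      Comparing the two totals gives only the relative abundance of each transition class; it says nothing about whether chirp rates covary between fish over longer timescales, which is the separate point raised by the Reviewer.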

      Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, as well as with playback experiments. It applies state-of-the-art methods for reducing dimensionality and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The exceptional strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that a number of commonly accepted truths about which variable affects chirping must be carefully rewritten or nuanced. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats and objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a communication goal for most chirps. Rather, the key determinants of chirping are the difference frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. These conclusions by themselves will be hugely useful to the field. They will also allow scientists working on other "communication" systems to at least reconsider, and perhaps expand the precise goal of the probes used in those senses. There are a lot of data summarized in this paper, and thorough referencing to past work. For example, the paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-received chirp transitions beyond the known increase in chirp frequency during an interaction.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization.

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water.

      Weaknesses:

      My main criticism is that the alternative putative role for chirps as probe signals that optimize beat detection could be better developed. The paper could be clearer as to what that means precisely.

      We appreciate the Reviewer's kind comments. While we acknowledge that our exploration of chirp function in this study may be limited and not entirely satisfying, we made this decision due to space constraints, opting for a broader and diversified approach. We hope that future studies will build on these data and start filling the gaps. We are also working on another manuscript which is addressing this point more in detail.

      Nonetheless, we considered the Reviewer’s criticism and added not only a new figure (to show more explicitly what chirps can do to the perceived electric fields, as simulated by electric images) but also more descriptive parts explaining how we think chirps may act to improve the spatial resolution of beat processing (see the discussion paragraph “probing with chirps”). In this paragraph we rendered more clearly how chirps could improve beat processing by phase shifting EODs and recovering eventual blind-spots on the fish skin caused by disruptive EOD interferences (resulting in lower beat contrast). We also mention that enhancement of electrosensory input triggered by chirps, could be localized not only at the level of electroreceptors (consider the synchronizing effects small chirps have on p-units at low frequency beats) but also at the level of ON and OFF pyramidal cells in the ELL. Looked at from the perspective of these neurons, any chirp would enhance the activity of these input lines, yet in opposite ways.

      And there is an egg-and-chicken type issue as well, namely, that one needs a beat in order to "chirp" the beating pattern, but then how does chirping optimize the detection of the said beat? Perhaps the authors mean (as they wrote elsewhere in the paper) that the chirps could enhance electrosensory responses to the beat.

      Following the Reviewer’s comment, we have now revised several instances of the misleading phrasing identified:

      In the results on novel environment exploration: “If chirps enhance beat processing, for instance, chirping should occur within beat detection range but at a certain distance.”.

      “This, in turn, could be used to validate our beat-interference estimates as meaningfully related to beat processing.” and “In all this, rises may represent an exception as their locations are spread over larger distances and even in presence of obstacles potentially occluding the beat source (such as shelters, plants, or walls), all of which are conditions in which beat detection or beat processing could be more difficult (this, could be coherent with the production of rises right at the end of EOD playbacks; Figure S5).”

      Last result paragraph (clutter experiment): “Overall, these results indicate that chirping is significantly affected by the presence of environmental clutter partially disrupting - or simply obstructing - the processing of beat related information during locomotion”.

      In the probing with chirps discussion paragraph “In theory, chirps could also be used to improve electrolocation of objects as well (as opposed to the processing of the beat).”.

      In the conclusions: “optimizing the otherwise passive responses to the beat”.

      A second criticism is that the study links the beat detection to underwater object localization. I did not see a sufficiently developed argument in this direction, nor how the data provided support for this argument. It is certainly possible that the image on the fish's body of an object in the environment will be slightly modified by introducing a chirp on the waveform, as this may enhance certain heterogeneities of the object in relation to its environment. The thrust of this argument seems to derive more from the notion of Fourier analysis with pulse type fish (and radar theory more generally) that the higher temporal frequencies in the beat waveform induced by the chirp will enable a better spatial resolution of objects. It remains to be seen whether this is significant.

      The Reviewer is correct in noting that this point is not addressed in the manuscript. We introduced it as a speculative discussion point to mention alternative possibilities. These could be subject to further testing in future studies.

      I would also have liked to see a proposal for new experiments that could test these possible new roles.

      We have added clearer suggestions for future experiments throughout the discussion: these may be aimed at 1) improving playback experiments using more realistic copies of the brown ghost’s EODs (including harmonics), 2) assessing fish reciprocal positioning during chirping in better detail and 3) testing the use of chirping during target-reaching tasks in order to better assess the probing function of chirps.

      The authors should recall for the readers the gist of Bastian's 2001 argument that the chirp "can adjust the beat frequency to levels that are better detectable" in the light of their current. Further, at the beginning of the "Probing with chirps" section, the 3rd way in which chirps could improve conspecific localization mentions the phase-shifting of the EOD. The authors should clarify whether they mean that the tuberous receptors and associated ELL/toral circuitry could deal with that cue, or that the T_unit pathway would be needed?

      We thank the Reviewer for identifying this unclear point. We added reference to the p-units “Yet, this does not exclude the possibility that chirps could be used to briefly shift the EOD phase in order to avoid disruptive interferences caused by phase opposition (at the level of p-units)” in the above mentioned paragraph. We would prefer to omit a more detailed reference to t-units in order to avoid lengthy descriptions required to discuss the different electroreceptor types.

      On p.17 I don't understand what is meant by most chirps being produced, possibly aligned with the field lines, since field lines are everywhere. And what is one to conclude from the comparison of Fig.6D and 7A? Likewise it was not clear what is meant by chirps having a detectable effect on randomly generated beats.

      We agree on the valid point raised by the Reviewer and we have removed reference to current lines from the text.

      In the section on Inconsistencies between behaviour and hypothesized signal meaning, the authors could perhaps nuance the interpretation of the results further in the context of the unrealistic copy of natural stimuli using EOD mimics. In particular, Kelly et al. 2008 argued that electrode placement mattered in terms of representation of a mimic fish onto the body of a real fish, and thus, if I properly understand the set up here, the movement would cause the mimic to vary in quality. This may nevertheless be a small confounding issue.

      We agree with the Reviewer and added a comment at the beginning of the paragraph mentioned. “Nonetheless, it's plausible that playback stimuli, as employed in our study and others, may not faithfully replicate natural signals, thus potentially influencing the reliability of the observed behaviors. Future studies might consider replicating these findings using either natural signals or improved mimics, which could include harmonic components (excluded in this study).”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Abstract: "...is probably the most intensely studied species..." is a weak, unsupported, and unnecessary statement. Just state that it has been heavily studied, or is one of the most well-studied,...

      rephrased

      (2) Abstract: "...are thus used as references to specific internal states during recordings - of either the brain or the electric organ..." This was not clear to me.

      rephrased

      (3) Abstract: "...the logic underlying this electric communication..." It is not clear to me what the authors mean here by "logic".

      rephrased

      (4) I strongly recommend clearly defining homeoactive sensing and distinguishing it from allocative sensing when this term is first introduced in the introduction. This is not a commonly used term. Most readers likely think they understand what is meant by the term active sensing, however I recommend first defining it, and then distinguishing amongst these two different types of active sensing.

      rephrased

      (5) Introduction: "Together with a few other species (Rose, 2004),..." More than a few. There are hundreds of species with electric organs. It is certainly not a "unique" capability.

      rephrased

      (6) Introduction: "But the real advantage of active electrolocation can be appreciated in the context of social interaction." This is unclear. Why is this the "real advantage" of active electrolocation when an electrically silent fish could detect an electrically communicating fish just fine without interference? Active electrolocation is needed to detect objects that are not actively emitting an electric field. It is not needed to detect signaling individuals.

      rephrased

      (7) Introduction: why is active sensing using EODs limited to distances of 6-12 cm? Why does it not work at closer range?

      Here we meant to give a range based on published data. We rephrased it to “up to 12”.

      (8) Introduction: electric fields decay with the cube of distance, as you show in appendix 1.

      rephrased

      (9) Introduction: it is not clear what is meant by "blurred EOD amplitude".

      rephrased (“noisy”)

      (10) Figure 2C is very challenging to interpret. I recommend spending more time in the manuscript walking the reader through this analysis and its presentation.

      We are grateful for the comment as we probably overlooked this point. We now added a small paragraph to explain these data in better detail.

      (11) Results: "This was done by calculating the ratio between the duration of the beat cycles affected by the chirp (beat interpeak intervals) and the total duration of the beat cycles detected within a fixed time window (roughly double the size of the maximum chirp duration, 700 ms)." This was not clear to me.

      We now rephrased to “Estimates of beat interference were made by calculating the ratio between the cumulative duration of the beat cycles affected by a given chirp (1 beat cycle corresponding to the beat comprised by two consecutive beat peaks, or - more simply - the beat inter-peak interval) over the cumulative duration of all the beat cycles within the time window used as a reference (700 ms; other analysis windows were tested Figure S9)” to clarify this method.
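The ratio described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function name `beat_interference`, the millisecond units, and the rule that a beat cycle counts as "affected" whenever it overlaps the chirp interval are all assumptions made for the example.

```python
def beat_interference(beat_peaks, chirp_start, chirp_end, window_start, window=700):
    """Share of beat-cycle time (within the window) spent in cycles overlapping the chirp.

    All times are in milliseconds. `beat_peaks` is a sorted list of beat peak times.
    """
    window_end = window_start + window
    affected = 0
    total = 0
    # One beat cycle is the interval between two consecutive beat peaks.
    for t0, t1 in zip(beat_peaks, beat_peaks[1:]):
        if t0 < window_start or t1 > window_end:
            continue  # keep only cycles lying fully inside the reference window
        total += t1 - t0
        # Assumption: a cycle is "affected" if it overlaps the chirp interval at all.
        if t0 < chirp_end and t1 > chirp_start:
            affected += t1 - t0
    return affected / total if total > 0 else 0.0


# Toy example: a 10 Hz beat (one peak every 100 ms) and a chirp from 250 to 350 ms.
peaks = [0, 100, 200, 300, 400, 500, 600, 700]
ratio = beat_interference(peaks, chirp_start=250, chirp_end=350, window_start=0)
```

In this toy case two of the seven beat cycles overlap the chirp, so the ratio is 2/7.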

      (12) Results: "For each chirp, the interference values obtained for 4 different phases (90° steps) were averaged." Why was this done?

      To consider an average effect across phases. Although it is true that chirp parameters may have a different impact on the beat, depending on EOD phase, including this parameter in our figure/s would have considerably increased the volume of data reported giving too much emphasis to an analysis we judged not crucially important. In addition, since we did not consider EOD phase in our recordings, we opted for an average estimate encompassing different phase values.

      (13) Discussion: "Third, observations in a few species are generalized to all other gymnotiforms without testing for species differences (Turner et al., 2007; Smith et al., 2013; Petzold et al., 2016)." I strongly disagree with this statement. First, the studies referenced here do explicitly compare chirps across species. Second, you only studied one species here, so it is not clear to me how this is a relevant concern in interpreting your findings.

      Here we have probably been unclear in the writing: the point we wanted to make is that the idea of chirps having semantic content has been generalized to other species without investigating the nature of their chirping with as much detail as done for brown ghosts.

      We have now rephrased the statement and changed it to: “Second, observations in a few species are generalized to all other gymnotiforms without testing whether chirping may have similar functions in other species (Turner et al., 2007; Smith et al., 2013; Petzold et al., 2016)”

      (14) Discussion: "The two beats could be indistinguishable (assuming that the mechanism underlying the discrimination of the sign of DF at low DFs, and thought to be the basis of the so called jamming avoidance response (JAR; Metzner, 1999), is not functional at higher DFs)." Why would you assume this?

      What we meant here is that it is unlikely that the two DFs are not discriminated by the same mechanisms implied in the JAR, even if the DF is higher than the levels at which usually JARs are detected (i.e. DF = 1-10 Hz?). To improve clarity, we rephrased this statement. “The two beats could be indistinguishable (assuming - perhaps not realistically - that the same mechanism involved in DF discrimination at lower DF values would not work in this case; Metzner, 1999)”.

      (15) Discussion: "...an idea which seems congruent with published electrophysiological studies..." How so?

      Rephrased to “Based on our beat interference estimates, we propose that the occurrence of the different types of chirps at more positive DFs (such as in male-to-female chirping) may be explained by their different effect on the beat (Figure 5D; Benda et al., 2006; Walz et al., 2013).”

      Reviewer #3 (Recommendations For The Authors):

      On p.2 there is a discrepancy between the quoted ranges for active sensing of objects, first 10-12 cm, and then 6-12 cm further down. And in the following paragraph right below this passage, electric fields are said to decay with the squared distance (appendix 1). That expression has a cos(theta) which is inversely proportional to the distance, and so one is really dealing, as expected for dipolar fields, with a drop-off that decays with the distance cubed.

      We thank the Reviewer for the comment. We have now corrected the mistake and added “cubed”. We also removed the imprecise reference to the range 6-12 cm, which we rephrased to “up to 12 cm”.

      At the end of the section on Inconsistencies..., it is not clear what "activity levels" refers to. It should also be made clearer at the outset, and reminded in this section too, that for the authors, behavioural context does not include social experience, which is somewhat counter-intuitive.

      We now specified that we meant “locomotor activity levels”. Regarding social experience, we did include it as part of the “behavioral context”, and we have now made this clearer in the first result paragraph. We hope this resolves the confusion.

      The caption of Fig.8 could use more clarity in terms of what is being compared in (C) (and is "1*2p" a typo?)

      We corrected the typo and edited the figure to make the references more clear.

      The concept of "high self-correlation of chirp time series" is presented only in the Conclusion using those words. The word self-correlation is not used beforehand. This needs to be fixed so the reader knows clearly what is being referred to.

      Thank you for noting this. We have now changed the wording using the term “auto-correlation” and changed a statement at the beginning of the “interference” result paragraph accordingly, removing references to self-correlation.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their thorough re-evaluation of our revised manuscript. Addressing final issues they raised has improved the manuscript further. We sincerely appreciate the detailed explanations that the reviewers provided in the "recommendations for authors" section. This comprehensive feedback helped us identify the sources of ambiguity within the analysis descriptions and in the discussion where we interpreted the results. Below, you will find our responses to the specific comments and recommendations.

      Reviewer #1 (Recommendations):

      (1) I find that the manuscript has improved significantly from the last version, especially in terms of making explicit the assumptions of this work and competing models. I think the response letter makes a good case that the existence of other research makes it more likely that oscillators are at play in the study at hand (though the authors might consider incorporating this argumentation a bit more into the paper too). Furthermore, the authors' response that the harmonic analysis is valid even when including x=y because standard correlation analysis were not significant is a helpful response. The key issue that remains for me is that I have confusions about the additional analyses prompted by my review to a point where I find it hard to evaluate how and whether they demonstrate entrainment or not. 

      First, I don't fully understand Figure 2B and how it confirms the Arnold tongue slice prediction. In the response letter the authors write: "...indicating that accuracy increased towards the preferred rate at fast rates and decreased as the stimulus rate diverged from the preferred rate at slow rates". The figure shows that, but also more. The green line (IOI < preferred rate) indeed increases toward the preferred rate (which is IOI = 0 on the x-axis; as I get it), but then it continues to go up in accuracy even after the preferred rate. And for the blue line, performance also continues to go up beyond preferred rate. Wouldn't the Arnold tongue and thus entrainment prediction be that accuracy goes down again after the preferred rate has passed? That is to say, shouldn't the pattern look like this (https://i.imgur.com/GPlt38F.png) which with linear regression should turn to a line with a slope of 0?

      This was my confusion at first, but then I thought longer about how e.g. the blue line is predicted only using trials with IOI larger than the preferred rate. If that is so, then shouldn't the plot look like this? (https://i.imgur.com/SmU6X73.png). But if those are the only data and the rest of the regression line is extrapolation, why does the regression error vary in the extrapolated region? It would be helpful if the authors could clarify this plot a bit better. Ideally, they might want to include the average datapoints so it becomes easier to understand what is being fitted. As a side note, colours blue/green have a different meaning in 2B than 2D and E, which might be confusing. 

      We thank the reviewer for their recommendation to clarify the additional analyses we ran in the previous revision to assess whether accuracy systematically increased toward the preferred rate estimate. We realized that the description of the regression analysis led to misunderstandings. In particular, we think that the reviewer interpreted (1) our analysis as linear regression (based on the request to plot raw data rather than fits), whereas, in fact, we used logistic regression, and (2) the regression lines in Figure 2B as being plotted against raw IOI values, while, in fact, they were plotted against z-scored IOI values (from trials where stimulus IOI was faster than an individual’s preferred rate, IOI < preferred rate, in green; and from trials where stimulus IOI was slower than an individual’s preferred rate, IOI > preferred rate, in blue), as the x-axis label indicated. We are happy to have the opportunity to clarify these points in the manuscript. We have also revised Figure 2B, which was admittedly a bit opaque, to more clearly show the “Arnold tongue slice”.

      The logic for using (1) logistic regression with (2) z-scored IOI values as the predictor is as follows. Since the response variable in this analysis, accuracy, was binary (correct response = 1, incorrect response = 0), we used a logistic regression. The goal was to quantify an across-subjects effect (increase in accuracy toward preferred rate), so we aggregated datasets across all participants into the model. The crucial point here is that each participant had a different preferred rate estimate. Let’s say participant A had the estimate at IOI = 400 ms, and participant B had an estimate at IOI = 600 ms. The trials where IOI was faster than participant A’s estimate would then be those ranging from 200 ms to 398 ms, and those that were slower would range from 402 ms to 998 ms. For participant B, the situation would be different: trials where IOI was faster than their estimate would range from 200 ms to 598 ms, and slower trials would range from 602 ms to 998 ms. For a fair analysis that assesses the accuracy increase, regardless of a participant’s actual preferred rate, we normalized these IOI values (faster or slower than the preferred rate). Z-score normalization is a common method of normalizing predictors in regression models, and was especially important here since we were aggregating predictors across participants, and the predictor ranges varied across participants. Z-scoring ensured that the scale of the sample (that differs between participant A and B, in this example) was comparable across the datasets. This is also important for the interpretation of Figure 2B. Since z-scoring involves mean subtraction, the zero point on the z-scaled IOI axis corresponds to the mean of the sample prior to normalization (for participant A: 299 ms, for participant B: 399 ms) and not the preferred rate estimate. We have now revised Figure 2B in a way that we think makes this much clearer.
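The participant A / participant B example can be checked with a short Python sketch. It is an illustration only: the 2 ms spacing of IOIs between 200 and 998 ms is inferred from the ranges quoted above, the helper name `zscore_by_direction` is hypothetical, and the sample standard deviation (N - 1) is an assumption about how the z-scores were computed.

```python
from statistics import mean, stdev


def zscore_by_direction(iois, preferred):
    """Z-score IOIs separately for trials faster and slower than the preferred rate."""
    faster = [x for x in iois if x < preferred]
    slower = [x for x in iois if x > preferred]

    def zscore(values):
        m, s = mean(values), stdev(values)  # assumption: sample SD (N - 1)
        return [(v - m) / s for v in values]

    return {
        "faster_z": zscore(faster),
        "slower_z": zscore(slower),
        "faster_mean": mean(faster),  # the IOI that maps onto z = 0 on the "faster" side
        "slower_mean": mean(slower),  # the IOI that maps onto z = 0 on the "slower" side
    }


iois = list(range(200, 1000, 2))  # assumed stimulus set: 200, 202, ..., 998 ms
participant_a = zscore_by_direction(iois, preferred=400)
participant_b = zscore_by_direction(iois, preferred=600)
```

Here `participant_a["faster_mean"]` is 299 ms and `participant_b["faster_mean"]` is 399 ms, matching the zero points given in the response, while both participants' z-scored predictors end up on the same scale.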

      The manuscript text includes clarification that the analyses included logistic regression and stimulus IOI was z-scored: 

      “In addition to estimating the preferred rate as stimulus rates with peak performance, we investigated whether accuracy increased as a function of detuning, namely, the difference between stimulus rate and preferred rate, as predicted by the entrainment models (Large, 1994; McAuley, 1995; Jones, 2018). We tested this prediction by assessing the slopes of mixed-effects logistic regression models, where accuracy was regressed on the IOI condition, separately for stimulus rates that were faster or slower than an individual’s preferred rate estimate. To do so, we first z-scored IOIs that were faster and slower than the participant’s preferred rate estimates, separately to render IOI scales comparable across participants.” (p. 7)

      While thinking through the reviewer’s comment, we realized we could improve this analysis by fitting mixed effects models separately to sessions’ data. In these models, fixed effects were z-scored IOI and ‘detuning direction’ (i.e., whether IOI was faster or slower than the participant’s preferred rate estimate). To control for variability across participants in the predicted interaction between z-scored IOI and direction, this interaction was added as a random effect. 

      “Ideally, they might want to include the average datapoints so it becomes easier to understand what is being fitted.”

      Although we agree with the reviewer that including average datapoints in a figure in addition to model predictions usually better illustrates what is being fitted than the fits alone, this does not work well for logistic regression, since the dependent variable is binary. To better illustrate single-participant data, we instead fitted logistic models to each participant’s single-session datasets, separately for conditions where z-scored IOIs from faster-than-preferred-rate trials, and those from slower-than-preferred-rate trials, predicted accuracy. From these single-participant models, we obtained slope values, which we refer to as ‘relative detuning slopes’, for each condition and session type. This analysis allowed us to illustrate the effect of relative detuning on accuracy for each participant. Figure 2B now shows each participant’s best-fit lines from each detuning direction condition and session.

      Since we now had relative detuning slopes for each individual (which we did not before), we took advantage of this to assess the relationship between oscillator flexibility and the oscillator’s behavior in different detuning situations (how strongly leaving the preferred rate hurt accuracy, as a proxy for the width of the Arnold tongue slice). Theoretically, flexible oscillators should be able to synchronize to a wide range of rates, not suffering in conditions where detuning is large (Pikovsky et al., 2003). Conversely, synchronization of inflexible oscillators should depend strongly on detuning. To test whether our flexibility measure predicted this dependence on detuning, which is a different angle on oscillator flexibility, we first averaged each participant’s detuning slopes across detuning directions (after sign-flipping one of them). Then, we assessed the correlation between the average detuning slopes and flexibility estimates, separately for conditions where |-𝚫IOI| or |+𝚫IOI| predicted accuracy. The results revealed significant negative correlations (Fig. 2F), suggesting that performance of individuals with less flexible oscillators suffered more as detuning increased. Note that flexibility estimates quantified how much accuracy decreased as a function of trial-to-trial changes in stimulus rate (±𝚫IOI). Thus, these results show that oscillators that were robust to changes in stimulus rate were also less dependent on detuning to be able to synchronize across a wide range of stimulus rates. We are excited to be able to provide this extra validation of predictions made by entrainment models.

      To revise the manuscript with the updated analysis on detuning:

      • We added the descriptions of the analyses to the Experiment 1 Methods section.

      Calculation of detuning slopes and their averaging procedure are in Preferred rate estimates:

      “In addition to estimating the preferred rate as stimulus rates with peak performance, we investigated whether accuracy increased as a function of detuning, namely, the difference between stimulus rate and preferred rate, as predicted by the entrainment models (Large, 1994; McAuley, 1995; Jones, 2018). We tested this prediction by assessing the slopes of mixed-effects logistic regression models, where accuracy was regressed on the IOI condition, separately for stimulus rates that were faster or slower than an individual’s preferred rate estimate. To do so, we first z-scored IOIs that were faster and slower than the participant’s preferred rate estimates, separately to render IOI scales comparable across participants. The detuning direction (i.e., whether stimulus IOI was faster or slower than the preferred rate estimate) was coded categorically. Accuracy (binary) was predicted by these variables (z-scored IOI, detuning direction), and their interaction. The model was fitted separately to datasets from random-order and linear-order sessions, using the fitglme function in MATLAB. Fixed effects were z-scored IOI and detuning direction and random effect was their interaction. We expected a systematic increase in performance toward the preferred rate, which would result in a significant interaction between stimulus rate and detuning direction. To decompose the significant interaction and to visualize the effects of detuning, we fitted separate models to each participant’s single-session datasets, and obtained slopes from each direction condition, hereafter denoted as the ‘relative-detuning slope’. We treated relative-detuning slope as an index of the magnitude of relative detuning effects on accuracy. We then evaluated these models, using the glmval function in MATLAB to obtain predicted accuracy values for each participant and session. 
To visualize the relative-detuning curves, we averaged the predicted accuracies across participants within each session, separately for each direction condition (faster or slower than the preferred rate). To obtain a single value of relative-detuning magnitude for each participant, we averaged relative detuning slopes across direction conditions. However, since slopes from IOI > preferred rate conditions quantified an accuracy decrease as a function of detuning, we sign-flipped these slopes before averaging. The resulting average relative detuning slopes, obtained from each participant’s single-session datasets, quantified how much the accuracy increase towards preferred rate was dependent on, in other words, sensitive to, relative detuning.” (p. 7-8)
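The single-participant step (one logistic slope per detuning direction, then sign-flipping and averaging) can be illustrated with a minimal Python sketch. The authors report using MATLAB's fitglme and glmval; the code below is not their implementation. It uses a hand-written Newton-Raphson fit (`fit_logistic`, a hypothetical helper) on made-up toy data, and it covers only the fixed-effect, single-participant case, not the mixed-effects model.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to the intercept
            g1 += (yi - p) * xi     # gradient with respect to the slope
            h00 += w                # entries of the (negative) Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system H * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy single-participant data (made up): z-scored IOI and binary accuracy.
z_ioi = [-1, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 1]
acc_faster = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0]  # accuracy rises toward the preferred rate
acc_slower = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # accuracy falls away from the preferred rate

_, slope_faster = fit_logistic(z_ioi, acc_faster)
_, slope_slower = fit_logistic(z_ioi, acc_slower)

# Slopes from the IOI > preferred rate condition are sign-flipped before averaging.
avg_slope = (slope_faster + (-slope_slower)) / 2
```

With these toy data the two slopes have equal magnitude and opposite sign, so the sign flip makes them add rather than cancel, which is the reason for flipping before averaging.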

      • We added the information on the correlation analyses between average detuning slopes in Flexibility estimates.

      “We further tested the relationship between the flexibility estimates (𝛽 from models where |-𝚫IOI| or |+𝚫IOI| predicted accuracy) and average detuning slopes (see Preferred rate estimates) from random-order sessions. We predicted that flexible oscillators (larger 𝛽) would be less severely affected by detuning, and thus have smaller detuning slopes. Conversely, inflexible oscillators (smaller 𝛽) should have more difficulty in adapting to a large range of stimulus rates, and their adaptive abilities should be constrained around the preferred rate, as indexed by steeper relative detuning slopes.” (p. 8)

      • We provided the results in Experiment 1 Results section.

      “Logistic models assessing a systematic increase in accuracy toward the preferred rate estimate in each session type revealed significant main effects of IOI (linear-order session: 𝛽 = 0.264, p < .001; random-order session: 𝛽 = 0.175, p < .001), and significant interactions between IOI and direction (linear-order session: 𝛽 = -0.444, p < .001; random-order session: 𝛽 = -0.364, p < .001), indicating that accuracy increased as fast rates slowed toward the preferred rate (positive slopes) and decreased again as slow rates slowed further past the preferred rate (negative slopes), regardless of the session type. Fig. 2B illustrates the preferred rate estimation method for an example participant’s dataset and shows the predicted accuracy values from models fitted to each participant’s single-session datasets. Note that the main effect and interaction were obtained from mixed effects models that included aggregated datasets from all participants, whereas the slopes quantifying the accuracy increase as a function of detuning (i.e., relative detuning slopes) were from models fitted to single-participant datasets.” (p. 9-10)

      “We tested the relationship between the flexibility estimates and single-participant relative detuning slopes from random-order sessions (Fig. 2B). The results revealed negative correlations between the relative detuning slopes and flexibility estimates, both with 𝛽 (r(23) = -0.529, p = 0.007) from models where |-𝚫IOI| predicted accuracy (adapting to speeding-up trials), and 𝛽 (r(23) = -0.580, p = 0.002) from models where |+𝚫IOI| predicted accuracy (adapting to slowing-down trials). That is, the performance of individuals with less flexible oscillators suffered more as detuning increased. These results are shown in Fig. 2F.” (p. 10)
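For readers who want to see how a correlation of this kind and its test statistic are related, here is a small Python sketch. It assumes a standard Pearson correlation with n - 2 degrees of freedom (which is how the notation r(23) is usually read); the function names and toy data are hypothetical and the code is not taken from the authors' analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, df):
    """t statistic for testing r against zero, with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


# Toy data (made up): steeper detuning slopes paired with lower flexibility estimates.
detuning_slopes = [0.2, 0.4, 0.5, 0.7, 0.9]
flexibility = [-0.1, -0.3, -0.2, -0.5, -0.6]
r_toy = pearson_r(detuning_slopes, flexibility)

# t statistic implied by a reported value of r = -0.580 with 23 degrees of freedom.
t_reported = t_from_r(-0.580, 23)
```

A negative `r_toy` in this sketch corresponds to the pattern described in the response: larger (steeper) detuning slopes go with smaller flexibility estimates.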

      • We modified Figure 2. In Figure 2B, there are now separate subfigures with the z-scored IOI faster (left) or slower (right) than the preferred rate predicting accuracy. We illustrated the correlations between average relative detuning slopes and flexibility estimates in Figure 2F. 

      Author response image 1.

      Main findings of Experiment 1. A Left: Each circle represents a single participant’s preferred rate estimate from the random-order session (x axis) and linear-order session (y axis). The histograms along the top and right of the plot show the distributions of estimates for each session type. The dotted and dashed lines respectively represent 1:2 and 2:1 ratio between the axes, and the solid line represents one-to-one correspondence. Right: permutation test results. The distribution of summed residuals (distance of data points to the closest y=x, y=2*x and y=x/2 lines) of shuffled data over 1000 iterations, and the summed residual from original data (dashed line) that fell below .008 of the permutation distribution. B Top: Illustration of the preferred rate estimation method from an example participant’s linear-order session dataset. Estimates were the stimulus rates (IOI) where smoothed accuracy (orange line) was maximum (arrow). The dotted lines originating from the IOI axis delineate the stimulus rates that were faster (left, IOI < preferred rate) and slower (right, IOI > preferred rate) than the preferred rate estimate and expand those separate axes, the values of which were Z-scored for the relative-detuning analysis. Bottom: Predicted accuracy, calculated from single-participant models where accuracy in random-order (purple) and linear-order (orange) sessions was predicted by z-scored IOIs that were faster than a participant’s preferred rate estimate (left), and by those that were slower (right). Thin lines show predicted accuracy from single-participant models, solid lines show the averages across participants and the shaded areas represent standard error of the mean. Predicted accuracy is maximal at the preferred rate and decreases as a function of detuning. C Average accuracy from random-order (left, purple) and linear-order (right, orange) sessions. Each circle represents a participant’s average accuracy. D Flexibility estimates. 
Each circle represents an individual’s slope (𝛽) obtained from logistic models, fitted separately to conditions where |-𝚫IOI| (left, green) or |+𝚫IOI| (right, blue) predicted accuracy, with greater values (arrow’s direction) indicating better oscillator flexibility. The means of the distributions of 𝛽 from both conditions were smaller than zero (dashed line), indicating a negative effect of between-trial absolute rate change on accuracy. E Participants’ average bias from |-𝚫IOI| (green), and |+𝚫IOI| (blue) conditions in random-order (left) and linear-order (right) sessions. Negative bias indicates underestimation of the comparison intervals, positive bias indicates the opposite. Box plots in C-E show median (black vertical line), 25th and 75th percentiles (box edges) and extreme datapoints (whiskers). In C and E, empty circles show outlier values that remained after data cleaning procedures. F Correlations between participants’ average relative detuning slopes, indexing the steepness of the increase in accuracy towards the preferred rate estimate (from panel B), and flexibility estimates from |-𝚫IOI| (top, green), and |+𝚫IOI| (bottom, blue) conditions (from panel D). Solid black lines represent the best-fit line, dashed lines represent 95% confidence intervals.

      • We discussed the results in General Discussion and emphasized that only entrainment models, compared to timekeeper models, predict a relationship between detuning and accuracy that is amplified by oscillator’s inflexibility: “we observed systematic increases in task accuracy (Experiment 1) toward the best-performance rates (i.e., preferred rate estimates), with the steepness of this increase being closely related to the effects of rate change (i.e., oscillator flexibility). Two interdependent properties of an underlying system together modulating an individual’s timing responses show strong support for the entrainment approach” (p. 24)

      “As a side note, colours blue/green have a different meaning in 2B than 2D and E, which might be confusing.” 

      Upon the reviewer’s recommendation, we changed the color scale across Figure 2, such that colors refer to the same set of conditions across all panels. 

      (2) Second, I don't understand the additional harmonic relationship analyses in the appendix, and I suspect other readers will not either. As with the previous point, it is not my view that the analyses are faulty or inadequate, it is rather that the lack of clarity makes it challenging to evaluate whether they support an entrainment model or not. 

      We decided to remove the analysis that was based on a circular approach, and we have clarified the analysis that was based on a modular approach by giving example cases: 

      “We first calculated how much the slower estimate (larger IOI value) diverts, proportionally from the faster estimate (smaller IOI value) or its multiples (i.e., harmonics) by normalizing the estimates from both sessions by the faster estimate. The outcome measure was the modulus of the slower, with respect to the faster estimate, divided by the faster estimate, described as mod(max(X), min(X))/min(X) where X = [session1_estimate session2_estimate]. An example case would be a preferred rate estimate of IOI = 603 ms from the linear-order session and an estimate of IOI = 295 ms from the random-order session. In this case, the slower estimate (603 ms) diverts from the multiple of the faster estimate (295*2 = 590 ms) by 13 ms, a proportional deviation of 4% of the faster estimate (295 ms). The outcome measure in this example is calculated as mod(603,295)/295 = 0.04.” (Supplementary Information, p. 2)
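The outcome measure mod(max(X), min(X))/min(X) is simple enough to reproduce directly. The Python sketch below mirrors the formula quoted above; the function name `harmonic_divergence` is hypothetical, and Python's `%` operator is used in place of MATLAB's mod (the two agree for positive inputs).

```python
def harmonic_divergence(estimate_1, estimate_2):
    """Proportional deviation of the slower estimate from a multiple of the faster one."""
    faster = min(estimate_1, estimate_2)  # smaller IOI = faster rate
    slower = max(estimate_1, estimate_2)  # larger IOI = slower rate
    return (slower % faster) / faster


# Example from the text: 603 ms (linear-order) and 295 ms (random-order).
divergence = harmonic_divergence(603, 295)  # remainder of 13 ms relative to 295 ms
```

For the quoted example the remainder is 603 - 2 × 295 = 13 ms, and 13/295 rounds to 0.04, in line with the 4% reported in the Supplementary Information.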

      Crucially, the ability of oscillators to respond to harmonically-related stimulus rates is a main distinction between entrainment and interval (timekeeper) models. In the current study, we found that each participant’s best-performance rates, the preferred rate estimates, had harmonic relationships. The additional analyses further showed that these harmonic relationships were not due to chance. This finding speaks against the interval (timekeeper) approaches and is maximally compatible with the entrainment framework. 

      Here are a number of questions I would like to list to sketch my confusion: 

      • The authors write: "We first normalized each participant's estimates by rescaling the slower estimate with respect to the faster one and converting the values to radians". Does slower estimate mean: "task accuracy in those trials in which IOI was slower than a participant's preferred frequency"? 

      Preferred rate estimates were stimulus rates (IOI) with best performance, as described in Experiment 1 Methods section. 

      “We conceptualized individuals' preferred rates as the stimulus rates where duration-discrimination accuracy was highest. To estimate preferred rate on an individual basis, we smoothed response accuracy across the stimulus-rate (IOI) dimension for each session type, using the smoothdata function in MATLAB. Estimates of preferred rate were taken as the smoothed IOI that yielded maximum accuracy” (p. 7). 
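The estimation idea (smooth accuracy along the IOI axis, then take the IOI at the maximum) can be sketched as follows. This is not the authors' code: they used MATLAB's smoothdata, whereas the sketch uses a plain centered moving average whose window width and edge handling are assumptions chosen for illustration, and the data are made up.

```python
def moving_mean(values, width=3):
    """Centered moving average; the window shrinks at the edges of the sequence."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def preferred_rate(iois, accuracy, width=3):
    """Return the IOI at which smoothed accuracy is highest."""
    smoothed = moving_mean(accuracy, width)
    best_index = max(range(len(iois)), key=lambda i: smoothed[i])
    return iois[best_index]


# Toy data (made up): mean accuracy per stimulus rate (IOI in ms).
iois = [200, 300, 400, 500, 600, 700, 800]
accuracy = [0.5, 0.7, 0.9, 0.7, 0.5, 0.5, 0.4]
estimate = preferred_rate(iois, accuracy)
```

In the toy data the smoothed accuracy peaks at an IOI of 400 ms, which would be taken as this participant's preferred rate estimate.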

      The estimation method and the resulting estimate for an example participant was provided in Figure 2B. The updated figure in the current revision has this illustration only for linear-order session. 

      “Estimates were the stimulus rates (IOI) where smoothed accuracy (orange line) was maximum (arrow)” (Figure caption, p. 9).

      • "We reasoned that values with integer-ratio relationships should correspond to the same phase on a unit circle". What is values here; IOI, or accuracy values for certain IOIs? And why should this correspond to the same phase? 

      We removed the analysis on integer-ratio relationships that was based on a circular approach that the reviewer is referring to here. We clarified the analysis that was based on a modular approach and avoided using the term ‘values’ without specifying what values corresponded to.

      • Do "integer-ratio relationships" have to do with the y=x, y=x*2 and y=x/2 relationships of the other analyses?

      Integer-ratio relationships indeed refer to y=x, y=x*2 and y=x/2 relationships. For example, if a number y is double of another number x (y = x*2), these values have an integer-ratio relationship, since 2 is an integer. This holds true also for the case where y = x/2 since x = y*2. 
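A permutation test built on these three lines, like the one described in the Figure 2A caption, might look like the Python sketch below. Several details are assumptions made for illustration and are not confirmed by the source: the residual is taken as the vertical distance to the nearest line, the shuffle re-pairs one session's estimates across participants, and the function names, seed, and toy data are hypothetical.

```python
import random


def summed_residual(xs, ys, ratios=(1.0, 2.0, 0.5)):
    """Sum over participants of the distance to the closest of y = x, y = 2*x, y = x/2.

    Assumption: distance is measured vertically, as |y - k * x|.
    """
    return sum(min(abs(y - k * x) for k in ratios) for x, y in zip(xs, ys))


def permutation_test(xs, ys, iterations=1000, seed=0):
    """Share of shuffled pairings whose summed residual is at most the observed one."""
    rng = random.Random(seed)
    observed = summed_residual(xs, ys)
    shuffled = list(ys)
    at_most_observed = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # assumption: break the within-participant pairing
        if summed_residual(xs, shuffled) <= observed:
            at_most_observed += 1
    return observed, at_most_observed / iterations


# Toy estimates (made up, in ms) lying exactly on the three integer-ratio lines.
random_order = [300, 350, 400, 450, 500, 550]
linear_order = [600, 350, 200, 900, 500, 275]
observed, p_value = permutation_test(random_order, linear_order)
```

A small `p_value` here means that randomly re-paired estimates rarely fall as close to the integer-ratio lines as the original pairs do.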

      • Supplementary Figure S2c shows a distribution of median divergences resulting from the modular approach. The p-value is 0.004 but the dashed line appears to be at a much higher percentile of the distribution. I find this hard to understand. 

      We thank the reviewer for a detailed inspection of all figures and information in the manuscript. The reviewer’s comment led us to realize that this figure had an error. We updated the figure in Supplementary Information (Supplementary Figure S2). 

      Reviewer #2 (Public Review):

      To get a better understanding of the mechanisms underlying the behavioral observations, it would have been useful to compare the observed pattern of results with simulations done with existing biophysical models. However, this point is addressed if the current study is read along with this other publication of the same research group: Kaya, E., & Henry, M. J. (2024, February 5). Modeling rhythm perception and temporal adaptation: top-down influences on a gradually decaying oscillator. https://doi.org/10.31234/osf.io/q9uvr 

      We agree with the reviewer that the mechanisms underlying behavioral responses can be better understood by modeling approaches. We thank the reviewer for acknowledging our computational modeling study that addressed this concern. 

      Reviewer #2 (Recommendations):

      I very much appreciate the thorough work done by the authors in assessing all reviewers' concerns. In this new version they clearly state the assumptions to be tested by their experiments, add extra analyses that further strengthen the conclusions, and point the reader to a neurocomputational model compatible with the current observations.

      I only regret that the authors misunderstood the take home message of our Essay (Doelling & Assaneo 2021). Despite this being obviously out of the scope of the current work, I would like to take this opportunity to clarify this point. In that paper, we adopted a Stuart-Landau model not to determine how an oscillator should behave, but as an example to show that some behaviors usually used to prove or refute an underlying "oscillator like" mechanism can be falsified. We obviously acknowledge that some of the examples presented in that work are attainable by specific biophysical models, as explicitly stated in the essay: "There may well be certain conditions, equations, or parameters under which some of these commonly held beliefs are true. In that case, the authors who put forth these claims must clearly state what these conditions are to clarify exactly what hypotheses are being tested." 

      This work did not mean to delineate what an oscillator is (or is not), but to stress the importance of explicitly introducing biophysical models to be tested instead of relying on vague definitions sometimes reflecting the researchers' own beliefs. The take home message that we wanted to deliver to the reader appears explicitly in the last paragraph of that essay: "We believe that rather than concerning ourselves with supporting or refuting neural oscillators, a more useful framework would be to focus our attention on the specific neural dynamics we hope to explain and to develop candidate quantitative models that are constrained by these dynamics. Furthermore, such models should be able to predict future recordings or be falsified by them. That is to say that it should no longer be sufficient to claim that a particular mechanism is or is not an oscillator but instead to choose specific dynamical systems to test. In so doing, we expect to overcome our looping debate and to ultimately develop-by means of testing many model types in many different experimental conditions-a fundamental understanding of cognitive processes and the general organization of neural behavior."

      We appreciate the reviewer’s clarification of the take-home message from Doelling and Assaneo (2021). We concur with the assertions made in this essay, particularly regarding the benefits of employing computational modeling approaches. Such methodologies provide a nuanced and well-structured foundation for theoretical predictions, thereby minimizing the potential for reductionist interpretations of behavioral or neural data.

      In addition, we would like to underscore the significance of delineating the level of analysis when investigating the mechanisms underlying behavioral or neural observations. The current study or Kaya & Henry (2024) involved no electrophysiological measures. Thus, we would argue that the appropriate level of analysis across our studies concerns the theoretical mechanisms rather than how these mechanisms are implemented on the neural (physical) level. In both studies, we aimed to explore or approximate the theoretical oscillator that guides dynamic attention rather than the neural dynamics underlying these theoretical processes. That is, theoretical (attentional) entrainment may not necessarily correspond to neural entrainment, and differentiating these levels could be informative about the parallels and differences between these levels. 

      References

      Doelling, K. B., & Assaneo, M. F. (2021). Neural oscillations are a start toward understanding brain activity rather than the end. PLoS Biol, 19(5), e3001234. https://doi.org/10.1371/journal.pbio.3001234

      Jones, M. R. (2018). Time will tell: A theory of dynamic attending. Oxford University Press.

      Kaya, E., & Henry, M. J. (2024). Modeling rhythm perception and temporal adaptation: top-down influences on a gradually decaying oscillator. PsyArXiv. https://doi.org/10.31234/osf.io/q9uvr

      Large, E. W. (1994). Dynamic representation of musical structure. The Ohio State University. 

      McAuley, J. D. (1995). Perception of time as phase: Toward an adaptive-oscillator model of rhythmic pattern processing. Indiana University Bloomington.

      Pikovsky, A., Rosenblum, M., & Kurths, J. (2003). Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge University Press.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank you for the time you took to review our work and for your feedback! 

      The major changes to the manuscript are:

      (1) We have added visual flow speed and locomotion velocity traces to Figure 5 as suggested.

      (2) We have rephrased the abstract to more clearly indicate that our statement regarding acetylcholine enabling faster switching of internal representations in layer 5 is speculative.

      (3) We have further clarified the positioning of our findings regarding the basal forebrain cholinergic signal in visual cortex in the introduction.

      (4) We have added a video (Video S1) to illustrate different mouse running speeds covered by our data.

      A detailed point-by-point response to all reviewer concerns is provided below.

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed most of the concerns raised in the initial review. While the paper has been improved, there are still some points of concern in the revised version. 

      Major comments

      (1) Page 1, Line 21: The authors claim, "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, enabling faster switching between internal representations during locomotion." However, it is not clear which specific data or results support the claim of "switching between internal representations." ... 

      Authors' response: "... That acetylcholine enables a faster switching between internal representations in layer 5 is a speculation. We have attempted to make this clearer in the discussion. ..." 

      In the revised version, there is no new data added to directly support the claim - "Our results suggest acetylcholine ..., enabling faster switching between internal representations during locomotion" (in the abstract). The authors themselves acknowledge that this statement is speculative. The present data only demonstrate that ACh reduces the response latency of L5 neurons to visual stimuli, but not that ACh facilitates quicker transitions in neuronal responses from one visual stimulus to another. To maintain scientific rigor and clarity, I recommend the authors amend this sentence to more accurately reflect the findings. 

      This might be a semantic disagreement? We would argue both a gray screen and a grating are visual stimuli. Hence, we are not sure we understand what the reviewer means by “but not that ACh facilitates quicker transitions in neuronal responses from one visual stimulus to another”. We concur, our data only address one of many possible transitions, but it is a switch between distinct visual stimuli that is sped up by ACh. Nevertheless, we have rephrased the sentence in question by changing “our data suggest” to “based on this we speculate” - but are not sure whether this addresses the reviewer’s concern.  

      (2) Page 4, Line 103: "..., a direct measurement of the activity of cholinergic projection from basal forebrain to the visual cortex during locomotion has not been made." This statement is incorrect. An earlier study by Reimer et al. indeed imaged cholinergic axons in the visual cortex of mice running on a wheel. 

      Authors' response: "We have clarified this as suggested. However, we disagree slightly with the reviewer here. The key question is whether the cholinergic axons imaged originate in basal forebrain. While Reimer et al. 2016 did set out to do this, we believe a number of methodological considerations prevent this conclusion: ... Collins et al. 2023 inject more laterally and thus characterize cholinergic input to S1 and A1, ..."

      The authors pointed out some methodological caveats in previous studies that measured the BF input in V1, and I agree with them on several points. Nonetheless, the statement that "a direct measurement of the activity of cholinergic projection from basal forebrain to visual cortex during locomotion has not been made. ... Prior measurements of the activity of cholinergic axons in visual cortex have all relied on data from a cross of ChAT-Cre mice with a reporter line ..." (Page 4, Line 103) seems to be an oversimplification. In fact, contrary to what the authors noted, Collins et al. (2023) conducted direct imaging of BF cholinergic axons in V1 (Fig. 1) - "Selected axon segments were chosen from putative retrosplenial, somatosensory, primary and secondary motor, and visual cortices". They used a viral approach to express GCaMP in BF axons to bypass the limitations associated with the use of a GCaMP reporter mouse line - "Viral injections were used for BF- ACh studies to avoid imaging axons or dendrites from cholinergic projections not arising from the BF (e.g. cortical cholinergic interneurons)." The authors should reconsider the text. 

      The reason we think that our statement here was, while simplified, accurate is that Collins et al. do record from cholinergic axons in V1, but they don’t show these data (they only show pooled data across all recording sites). By superimposing the recording locations of the Collins paper on the Allen mouse brain atlas (Figure R1), we estimate that of the approximately 50 recording sites, most are in somatosensory and somatomotor areas of cortex, and only 1 appears to be in V1, something that is often missed as it is not really highlighted in that paper. If this is indeed correct, we would argue that the data in the Collins et al. paper are not representative of cholinergic activity in visual cortex (we fear only the authors would know for sure). Nevertheless, we have rephrased again.

      Author response image 1.

      Overlay of the Collins et al. imaging sites (red dots, black outline and dashed circle) on the Allen mouse brain atlas (green shading). Very few (we estimate that it was only 1) of the recording sites appear to be in V1 (the lightest green area), and maybe an additional 4 appear to be in secondary visual areas.  

      Minor comments

      (1) It is unclear which BF subregion(s) were targeted in this study. 

      Authors' response: Thanks for pointing this out. We targeted the entire basal forebrain (medial septum, vertical and horizontal limbs of the diagonal band, and nucleus basalis) with our viral injections. ... We have now added the labels for basal forebrain subregions targeted next to the injection coordinates in the manuscript. 

      The authors provided the coordinates for their virus injections targeting the BF subregions - "(AP, ML, DV (in mm): ... ; +0.6, +0.6, -4.9 (nucleus basalis) ..." Are these the right coordinates for the nucleus basalis?

      Thank you for catching this - this was indeed incorrect. The coordinates were correct, but our annotation of brain region was not (as the reviewer correctly points out, these coordinates are in the horizontal limb of the diagonal band, not the nucleus basalis). We have corrected this.

      Reviewer #2 (Recommendations For The Authors):

      Thank you for addressing most of the points raised in my original review. I still have some concerns relating to the analysis of the data.

      (1) I appreciate the authors' point that getting mice to run reliably during head-fixed recordings can require training. Since mice in this study were not trained to run, their low speed of locomotion limits the interpretation of the results. I think this is an important potential caveat and I have retained it in the public review.

      This might be a misunderstanding. The Jordan paper was a bit of an outlier in that we needed mice to run at very high rates due to the fact that our recording times were only minutes. Mice were chosen such that they would more or less continuously run, to maximize the likelihood that they would run during the intracellular recordings. This was what we tried to convey in our previous response. The speed range covered by the analysis in this paper is 0 cm/s to 36 cm/s. 36 cm/s is not far away from the top speed mice can reach on this treadmill (30 cm/s is 1 revolution of the treadmill per second). In our data, the top speed we measured across all mice was 36 cm/s. In the Jordan paper, the peak running speed across the entire dataset was 44 cm/s. Based on the reviewer’s comment, we suspect that the reviewer may be under the impression that 30 cm/s is a relatively slow running speed. To illustrate what this looks like, we have added a video (Video S1) showing different running speeds.
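The speeds quoted above can be put on a common scale with a small arithmetic sketch. It assumes a treadmill circumference of 30 cm, inferred from the statement that 30 cm/s corresponds to 1 revolution per second. The constant and function names are hypothetical.

```python
# Assumption: 30 cm/s equals 1 revolution per second, so one revolution is 30 cm.
TREADMILL_CIRCUMFERENCE_CM = 30.0


def revolutions_per_second(speed_cm_per_s, circumference_cm=TREADMILL_CIRCUMFERENCE_CM):
    """Convert a running speed in cm/s to treadmill revolutions per second."""
    return speed_cm_per_s / circumference_cm


# Top speed in this dataset (36 cm/s) and peak speed in the Jordan paper (44 cm/s).
top_speed_revs = revolutions_per_second(36.0)
jordan_peak_revs = revolutions_per_second(44.0)
```

Under this assumption the top speed in the present data is about 1.2 revolutions per second, against roughly 1.5 for the peak speed in the Jordan dataset.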

      (2) The majority of the analyses in the revised manuscript focus on grand average responses, which may mask heterogeneity in the underlying neural populations. This could be addressed by analysing the magnitude and latency of responses for individual neurons. For example, if I understand correctly, the analyses include all neurons, whether or not they are activated, inhibited, or unaffected by visual stimulation and locomotion. For example, while on average layer 2/3 neurons are suppressed by the grating stimulus (Figure 4A), presumably a subset are activated. Evaluating the effects of optogenetic stimulation and locomotion without analyzing them at the level of individual neurons could result in misleading conclusions. This could be presented in the form of a scatter plot, depicting the magnitude of neuronal responses in locomotion vs stationary condition, and opto+ vs no opto conditions.

      We might be misunderstanding. The first part of the comment is a bit too unspecific to address directly. In cases in which we find the variability is relevant to our conclusions, we do show this for individual cells (e.g., the latencies to running onset are shown as histograms for all cells and axons in Figure S1). It is also unclear to us what the reviewer means by “Evaluating the effects of optogenetic stimulation and locomotion without analyzing them at the level of individual neurons could result in misleading conclusions”. Our conclusions relate to the average responses in L2/3, consistent with the analysis shown. All data will be freely available for anyone to perform follow-up analysis of things we may have missed. For example, the specific suggestion of presenting the data shown in Figure 4 as a scatter plot is shown below (Figure R2). This is something we had looked at but found not to be relevant to our conclusions. The problem with this analysis is that it is difficult to estimate how much the different sources of variability contribute to the total variability observed in the data, and no interesting pattern is clearly apparent. All relevant and clear conclusions are already captured by the mean differences shown in Figure 4.

      Author response image 2.

      Optogenetic activation of cholinergic axons in visual cortex primarily enhances responses of layer 5, but not layer 2/3 neurons. Related to Figure 4. (A) Average calcium response of layer 2/3 neurons in visual cortex to full field drifting grating in the absence or presence of locomotion. Each dot is the average calcium activity of an individual neuron during the two conditions. (B) As in A, but for layer 5 neurons. (C) As in A, but comparing the average response while the mice were stationary, to that while cholinergic axons were optogenetically stimulated. (D) As in C, but for layer 5 neurons. (E) Average calcium response of layer 2/3 neurons in visual cortex to visuomotor mismatch, without and with optogenetic stimulation of cholinergic axons in visual cortex. (F) As in E, but for layer 5 neurons. (G) Average calcium response of layer 2/3 neurons in visual cortex to locomotion onset in closed loop, without and with optogenetic stimulation of cholinergic axons in visual cortex. (H) As in G, but for layer 5 neurons.

      (3) To help the reader understand the experimental conditions in open loop experiments, please include average visual flow speed traces for each condition in Figure 5. 

      We have added the locomotion velocity and visual flow speeds to the corresponding conditions in Figure

    1. Author response:

      eLife assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets makes this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. We wish to emphasize that both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although we failed to sufficiently emphasize this in the main text.

      We will extend the benchmarking to more TI methods and we will improve the results and discussion sections to present those facts more clearly to the reader.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for feedback and suggestions that we will accommodate. We respond point-by-point below.

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary note section 8.2: Comparison to state-of-the-art methods, namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019).) We will add thorough and systematic comparisons to the other algorithms mentioned by reviewers. We will include extended evaluation on publicly available datasets.

      Also, we have successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (manuscript in revisions) and double-negative T-cell development in ALPS (Autoimmune Lymphoproliferative Syndrome) by mass cytometry (project in progress).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      In order to satisfy both reader types: the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We will improve the Results text to better point the reader to the mathematical foundations in the Supplementary Note.

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript we tested the framework on the scRNA-seq data from Park et al. 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we applied the framework successfully to imputed 2000-dimensional data.

      The idea behind tviblindi is to be able to work without the necessity to use non-linear dimensionality reduction techniques, which reduce the dimensionality to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand the use of (linear) dimensionality reduction techniques which effectively suppress noise in the data such as PCA is a good practice (see also response to reviewer 2). We will emphasize this in the revised version and add the results of the corresponding analysis.

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      A detailed discussion of the topic is presented in the Supplementary Note, section 8.1, where the Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997. The number k does, however, affect the number of candidate endpoints. But even when a larger k causes spurious connections between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We will expand the “sensitivity to hyperparameters” section, also in response to reviewer 2.
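The kind of sensitivity check described above can be sketched as follows. This is an illustrative, dependency-free implementation of the Spearman rank correlation and not the authors' code. The inputs would be two pseudotime vectors over the same cells, for example one computed with k=10 and one with k=50. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

Because the coefficient depends only on ranks, a value close to 1 indicates that the two settings of k order the cells almost identically, even if the raw pseudotime values differ in scale.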

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.
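The first two steps of the summary above can be sketched on a toy graph. This is not tviblindi's implementation. It assumes that pseudotime is the expected number of steps a simple random walk started at the origin needs to first reach each node, that the graph is connected, and that walks may only follow edges toward strictly larger pseudotime. All function names are hypothetical, and the persistent-homology clustering step is not shown.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def hitting_time_pseudotime(neighbors, origin):
    """Expected steps for a simple random walk from origin to first reach each node."""
    nodes = sorted(neighbors)
    pseudotime = {origin: 0.0}
    for target in nodes:
        if target == origin:
            continue
        # For every node v != target: h(v) = 1 + mean of h over v's neighbors, h(target) = 0.
        others = [v for v in nodes if v != target]
        idx = {v: i for i, v in enumerate(others)}
        a = [[0.0] * len(others) for _ in others]
        for v in others:
            a[idx[v]][idx[v]] = 1.0
            for w in neighbors[v]:
                if w != target:
                    a[idx[v]][idx[w]] -= 1.0 / len(neighbors[v])
        h = solve_linear(a, [1.0] * len(others))
        pseudotime[target] = h[idx[origin]]
    return pseudotime


def simulate_walk(neighbors, pseudotime, origin, rng):
    """Random walk on the graph oriented by pseudotime; stops at a node with no forward edge."""
    walk = [origin]
    while True:
        here = walk[-1]
        forward = [w for w in neighbors[here] if pseudotime[w] > pseudotime[here]]
        if not forward:
            return walk
        walk.append(rng.choice(forward))


# Toy Y-shaped graph: origin 0, branch point 1, two endpoints 2 and 3.
graph = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
toy_pseudotime = hitting_time_pseudotime(graph, origin=0)
toy_walks = [simulate_walk(graph, toy_pseudotime, 0, random.Random(seed)) for seed in range(20)]
```

Orienting every edge toward larger pseudotime makes the graph acyclic, so each simulated walk terminates at a candidate endpoint. Solving one linear system per target is only practical for toy graphs; the manuscript describes a more efficient approach for large datasets.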

      We thank the reviewer for feedback and suggestions that we will accommodate. We respond point-by-point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data.

      This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, as it allows for user bias, possibly leading to overfitting to a specific dataset.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data driven framework for exploratory analysis of unknown data. There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models.

      tviblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution on a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in Figure 4 of the manuscript) while relying on the topological structure of the high-dimensional point cloud. At all times tviblindi allows the user to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the k-NN graph, etc.

      The main weakness of the method is the lack of benchmarking on real data and comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data and comparison to other methods than the very many trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work, especially due to its inability to handle the noise typical in real-world data (as was even demonstrated in the little bit of application to real-world data provided).

      We provide comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We use two datasets: the artificial Dyntoy dataset and a real mass cytometry thymus + peripheral blood dataset. We thank the reviewer for suggesting specific methods. CellRank was excluded from the benchmarking as it was originally designed for RNA-velocity data (not available in mass cytometry data), but we will include the recent upgrade CellRank2 (preprint at doi.org/10.1101/2023.07.19.549685), which offers more flexibility.

      We will add further benchmarking as suggested by the reviewer in the course of revisions.

      Beyond the general lack of benchmarking, there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias driven than data-driven. So a rigorous and quantitative discussion of the robustness of the method, as well as how to ensure data-driven inference and avoid overfitting, would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved. And we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but they are an (unbiased) representation of what the data look like based on the diffusion model. This is a property of the data (or of the panel design), which has to be interpreted properly, rather than a mistake. Of note, except for Monocle3 (which does not provide the directionality), the other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see if there is an additional structure of interest and/or relevance. This investigation is still data driven (although not fully automated). Anecdotally, in this particular case this branching was discovered by a bioinformatician who knew nothing about the presence of beta-selection in the data.

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected at a later moment (immediately from the point of view of the beta-selection failure trajectory), is a challenge for TI algorithms, and none of the tested methods gave a correct result. More importantly, there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default).
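The default visibility threshold described above can be illustrated with a minimal sketch. It assumes each simulated random walk has already been assigned a branch label; the function name `visible_branches`, the flat (non-hierarchical) labels, and the toy counts are hypothetical and are not taken from the tviblindi code, which works on a dendrogram rather than a flat list.

```python
from collections import Counter

def visible_branches(walk_labels, min_fraction=0.10):
    """Return {branch: fraction of walks} for branches at or above the threshold.

    walk_labels  -- one (hypothetical) branch label per simulated random walk
    min_fraction -- minimum share of walks a branch needs in order to be shown
    """
    counts = Counter(walk_labels)
    total = len(walk_labels)
    return {
        label: count / total
        for label, count in counts.items()
        if count / total >= min_fraction
    }

# Toy example: branch "C" holds only 5% of the walks, so it is hidden by default.
labels = ["A"] * 60 + ["B"] * 35 + ["C"] * 5
shown = visible_branches(labels)
```

Lowering `min_fraction` in this sketch plays the role of the "zoom in": a rare branch such as "C" becomes visible without any new class being created.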

      In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We will improve the discussion of the robustness in the revised version.

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data, where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as a high degree of noise (e.g. ambient RNA), introduces a very large degree of noise, so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason: otherwise one is learning noise rather than signal. It would be great to have a discussion of the feasibility of the method as is for such noisy data and to provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can directly operate in high-dimensional space (thousands of dimensions), and we used MAGIC imputation for this purpose. This was not ideal. A more standard approach, which uses 30-50 PCs as input to the algorithm, resulted in equivalent trajectories. We will add this analysis to the study.

      In summary, the fact that tviblindi scales well with dimensionality of the data and is able to work in the original space does not mean that it is always the best option. We will emphasize in the revised paper that we aim to avoid the non-linear dimensional reduction techniques as a data preprocessing tool, as the effect of the reduction is difficult to predict. We will also discuss the preprocessing of scRNA-seq data in greater detail.
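The linear reduction to 30-50 principal components mentioned above can be sketched as follows. This is only an illustration of PCA via singular value decomposition on a random toy matrix; the function name `pca_reduce`, the matrix sizes, and the choice of 30 components are assumptions and do not reproduce the authors' preprocessing.

```python
import numpy as np

def pca_reduce(X, n_components=30):
    """Project the rows of X (cells x features) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # cells x n_components scores

# Toy stand-in for an expression matrix: 200 cells, 1000 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
Z = pca_reduce(X, n_components=30)               # input for a downstream TI method
```

Because the projection is linear, distances in `Z` are a predictable approximation of distances in the original space, which is the property the response contrasts with non-linear reductions.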

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantages of the method more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline" subsection).

      We thank the reviewer for the suggestion and we will accommodate it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provide comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare them to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We use two datasets: the artificial Dyntoy dataset and a real mass cytometry thymus+peripheral blood dataset. We will also add CellRank2 to the comparisons and we will strengthen the message of the benchmarking results in the Discussion section.

      (2) Regarding the discussion of Figure 4, where the trajectory goes through the apoptotic stage and reconnects to the canonical trajectory with counterintuitive directionality: it can be a checkpoint, as the authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe the authors can consider running other algorithms on those cells to see which tracks they identify and whether the directionality matches that of tviblindi.

      We have indeed used the thymus dataset for comparison of all TI algorithms listed above. Except for Monocle3, they failed to discover the negative selection branch (Monocle3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by other algorithms.

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet.

    2. Reviewer #2 (Public Review):

      Summary: In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.
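The first two steps of this summary (hitting-time pseudotime, then pseudotime-directed random walks) can be sketched on a toy graph. This is an assumption-laden illustration, not the tviblindi implementation: the function names, the dense adjacency matrix, and the one-linear-solve-per-target approach are chosen for readability and would not scale to real data, and the persistent homology clustering step is not shown.

```python
import numpy as np

def hitting_time_pseudotime(adj, origin):
    """Expected number of steps for a simple random walk started at `origin`
    to first reach each node of a connected graph (toy version)."""
    n = adj.shape[0]
    P = adj / adj.sum(axis=1, keepdims=True)      # row-stochastic transitions
    pseudotime = np.zeros(n)
    for target in range(n):
        if target == origin:
            continue
        keep = [i for i in range(n) if i != target]
        # For every node except the target: h = 1 + P_sub @ h, with h[target] = 0.
        A = np.eye(n - 1) - P[np.ix_(keep, keep)]
        h = np.linalg.solve(A, np.ones(n - 1))
        pseudotime[target] = h[keep.index(origin)]
    return pseudotime

def simulate_walk(adj, pseudotime, origin, rng):
    """Random walk that may only step to neighbors with larger pseudotime."""
    path = [origin]
    while True:
        cur = path[-1]
        forward = [j for j in np.flatnonzero(adj[cur])
                   if pseudotime[j] > pseudotime[cur]]
        if not forward:                           # terminal node: nowhere to go
            return path
        path.append(int(rng.choice(forward)))

# Toy path graph 0 - 1 - 2 - 3 with node 0 as the user-supplied start cell.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
pt = hitting_time_pseudotime(adj, origin=0)
walk = simulate_walk(adj, pt, origin=0, rng=np.random.default_rng(0))
```

On this path graph the pseudotime is 0, 1, 4, 9 for nodes 0 to 3, and the directed walk can only proceed 0, 1, 2, 3. In a branching graph, many such walks would end at different terminal nodes, and those walks are what the method subsequently groups.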

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data. This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, as it allows for user bias, leading to possible overfitting to a specific dataset.

      The main weakness of the method is the lack of benchmarking on real data and of comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data, and comparison to other methods than the very many trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work, especially due to its inability to handle the noise typical in real-world data (as was even demonstrated in the little bit of application to real-world data provided).

      Beyond the general lack of benchmarking, there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias driven than data-driven. So a rigorous and quantitative discussion of the robustness of the method, as well as of how to ensure data-driven inference and avoid overfitting, would be useful.

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data, where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as a high degree of noise (e.g. ambient RNA), introduces a very large degree of noise, so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason: otherwise one is learning noise rather than signal. It would be great to have a discussion of the feasibility of the method as is for such noisy data and to provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

bingPixelId=_satellite.getVar("processed:BingPixelJSON"),pageType=_satellite.getVar("processed:PageType"),template=_satellite.getVar("processed:Template");try{bingPixelId&&"x"!==bingPixelId.accountId&&function(e,t,a,n,i){var o,c,l;e[i]=e[i]||[],o=function(){var t={ti:bingPixelId.accountId};t.q=e[i],e[i]=new UET(t),e[i].push("pageLoad")},(c=t.createElement(a)).src=n,c.async=1,c.onload=c.onreadystatechange=function(){var e=this.readyState;e&&"loaded"!==e&&"complete"!==e||(o(),c.onload=c.onreadystatechange=null)},(l=t.getElementsByTagName(a)[0]).parentNode.insertBefore(c,l)}(window,document,"script","//bat.bing.com/bat.js","uetq")}catch(e){} });_satellite["_runScript9"](function(event, target, Promise) { var pinterestPixelId=_satellite.getVar("processed:PinterestPixelJSON"),pageType=_satellite.getVar("processed:PageType"),template=_satellite.getVar("processed:Template");try{pinterestPixelId&&"x"!==pinterestPixelId.accountId&&(!function(e){if(!window.pintrk){window.pintrk=function(){window.pintrk.queue.push(Array.prototype.slice.call(arguments))};var t=window.pintrk;t.queue=[],t.version="3.0";var r=document.createElement("script");r.async=!0,r.src=e;var i=document.getElementsByTagName("script")[0];i.parentNode.insertBefore(r,i)}}("https://s.pinimg.com/ct/core.js"),pintrk("load",pinterestPixelId.accountId),pintrk("page"))}catch(e){} }); var janrainUUID=_satellite.getVar("processed:UserScreenNameJanrainUUID"),loggedIn=_satellite.getVar("processed:UserLoggedInState"),entitled=_satellite.getVar("processed:Entitlement"),siteLevelUserId=_satellite.getVar("processed:SiteLevelUserId"),hubLevelUserId=_satellite.getVar("processed:HubLevelUserId"),scrollIncrement=0,AMCID=_satellite.getVar("processed:VisitorID"),wordCount=_satellite.getVar("var:WordCount"),plan="";"yes"===loggedIn&&(plan="no"===entitled?"registered":"subscribed"),function(e,t,o){var 
r=o.location.protocol,i=t+"-"+e,d=o.getElementById(i),c=o.getElementById(t+"-root"),l="https:"===r?"d1z2jf7jlzjs58.cloudfront.net":"static."+t+".com";d||((d=o.createElement(e)).id=i,d.async=!0,d.src=r+"//"+l+"/p.js",c.appendChild(d))}("script","parsely",document);try{function trackScroll(e,t){PARSELY.beacon&&PARSELY.beacon.trackPageView({action:"_scroll",data:{_scrollIncrement:e,_scrollMethod:t,_y:Math.round(window.scrollY),_bodyHeight:window.document.body.clientHeight,_articleTop:window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking')?Math.round(window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking').getBoundingClientRect().top+window.scrollY):void 0,_articleBottom:window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking')?Math.round(window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking').getBoundingClientRect().bottom+window.scrollY):void 0,_articleMidway:window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking')?Math.round(window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking').getBoundingClientRect().top+window.scrollY+window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking').clientHeight/2):void 0}})}window.PARSELY=window.PARSELY||{autotrack:!1,video:{autotrack:!1},onload:function(){PARSELY.updateDefaults({data:{plan:plan,janrain_uuid:janrainUUID,site_level_uuid:siteLevelUserId,hub_level_uuid:hubLevelUserId,adobe_mcid:AMCID,word_count:wordCount}}),PARSELY.beacon.trackPageView({url:window.location.href,urlref:document.referrer,data:{_scrollIncrement:0,_scrollMethod:"pageview",_y:Math.round(window.scrollY),_bodyHeight:window.document.body.clientHeight,_articleTop:window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking')?Math.round(window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking').getBoundingClientRect().top+window.scrollY):void 
0,_articleBottom:window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking')?Math.round(window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking').getBoundingClientRect().bottom+window.scrollY):void 0,_articleMidway:window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking')?Math.round(window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking').getBoundingClientRect().top+window.scrollY+window.document.querySelector('div[class*="asset-body"],div#SA_article_tracking').clientHeight/2):void 0},js:1})},onHeartbeat:function(){scrollIncrement++,scrollMethod="heartbeat",trackScroll(scrollIncrement,scrollMethod)}},window.setInterval((function(){scrollIncrement++,scrollMethod="setinterval",trackScroll(scrollIncrement,scrollMethod)}),1e4)}catch(e){} _satellite["_runScript10"](function(event, target, Promise) { setTimeout((function(){if("true"===sessionStorage.getItem("createAccountSubmittedP")&&("thestar|page|create-account-traditional"!==_satellite.getVar("processed:PageName")||!window.document.querySelector("#system_errors"))){function e(t){window.PARSELY&&window.PARSELY.beacon?(PARSELY.conversions.trackLeadCapture("registration-success"),sessionStorage.removeItem("createAccountSubmittedP")):t<20&&setTimeout((function(){e(++t)}),300)}e(1)}}),500); });_satellite["_runScript11"](function(event, target, Promise) { var ele,elelist,pageType=_satellite.getVar("processed:PageType"),subPageType=_satellite.getVar("processed:SubPageType"),channel=_satellite.getVar("processed:Channel");if(window.document.querySelector("#site-top-nav-container")&&(ele=window.document.querySelector("#site-top-nav-container")).setAttribute("data-lpos","header"),window.document.querySelector("#site-header-container")&&(ele=window.document.querySelector("#site-header-container")).setAttribute("data-lpos","header"),window.document.querySelector("#main-navigation 
.navbar-brand")&&(ele=window.document.querySelector("#main-navigation .navbar-brand")).setAttribute("data-lpos","header"),window.document.querySelector("#main-navigation .navbar-brand #torstar-user-mobile")&&(ele=window.document.querySelector("#main-navigation .navbar-brand #torstar-user-mobile")).setAttribute("data-lpos","header|user-dropdown"),window.document.querySelector("#main-navigation")&&(ele=window.document.querySelector("#main-navigation")).setAttribute("data-lpos","main-menu"),window.document.querySelector(".offcanvas-drawer")&&(ele=window.document.querySelector(".offcanvas-drawer")).setAttribute("data-lpos","left-drawer"),window.document.querySelector("#tncms-region-nav-mobile-nav-left")&&(ele=window.document.querySelector("#tncms-region-nav-mobile-nav-left")).setAttribute("data-lpos","left-drawer|menu"),window.document.querySelectorAll(".tsAlertCarousel div.item"))for(elelist=window.document.querySelectorAll(".tsAlertCarousel div.item"),x=0;x<elelist.length;x++){if(titleEle=elelist[x].querySelector(".alertType")){var title=titleEle.innerText.trim().replace(/[^a-zA-Z0-9]/g,"-").replace(/(-)\1+/g,"$1").toLowerCase();elelist[x].setAttribute("data-lpos","alert|"+title)}}if(window.document.querySelector('div[class~="weather-alert"]')){var 
eleParent=(ele=window.document.querySelector('div[class~="weather-alert"]')).closest("div.tncms-block");eleParent.setAttribute("data-lpos","alert|weather-alert")}if(window.document.querySelector("#main-content")&&(ele=window.document.querySelector("#main-content")).setAttribute("data-lpos","main-content"),window.document.querySelector("#main-body-container")&&(ele=window.document.querySelector("#main-body-container")).setAttribute("data-lpos","main-content"),window.document.querySelector(".asset-masthead")&&"asset"===subPageType&&(ele=window.document.querySelector(".asset-masthead")).setAttribute("data-lpos","asset|header"),window.document.querySelector(".main-content-wrap")&&"asset"===subPageType&&(ele=window.document.querySelector(".main-content-wrap")).setAttribute("data-lpos","asset|body"),window.document.querySelector(".tsArticleContainer")&&"asset"===subPageType&&(ele=window.document.querySelector(".tsArticleContainer")).setAttribute("data-lpos","asset|body"),window.document.querySelector(".asset-photo")&&"asset"===subPageType&&(ele=window.document.querySelector(".asset-photo")).setAttribute("data-lpos","asset|main-multimedia"),window.document.querySelector(".articleMainArt")&&"asset"===subPageType&&(ele=window.document.querySelector(".articleMainArt")).setAttribute("data-lpos","asset|main-multimedia"),window.document.querySelectorAll("#main-body-container .social-share-links"))if(elelist=window.document.querySelectorAll("#main-body-container .social-share-links"),"asset"===subPageType)for(x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","asset|share-toolbar");else for(x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","share-toolbar");if(window.document.querySelectorAll("#main-body-container div.photo-share .social-share-links"))if(elelist=window.document.querySelectorAll("#main-body-container div.photo-share 
.social-share-links"),"asset"===subPageType)for(x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","asset|multimedia|share-toolbar");else for(x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","multimedia|share-toolbar");if(window.document.querySelector("#asset-below")&&"asset"===subPageType&&(ele=window.document.querySelector("#asset-below")).setAttribute("data-lpos","asset|footer"),window.document.querySelector(".related-sidebar")&&"asset"===subPageType&&(ele=window.document.querySelector(".related-sidebar")).setAttribute("data-lpos","asset|related-links"),window.document.querySelector(".articleRelatedSiblings")&&"asset"===subPageType&&(ele=window.document.querySelector(".articleRelatedSiblings")).setAttribute("data-lpos","asset|related-links"),window.document.querySelector(".asset-comments")&&"asset"===subPageType&&(ele=window.document.querySelector(".asset-comments")).setAttribute("data-lpos","asset|conversation"),window.document.querySelector(".asset-paging .prev")&&(ele=window.document.querySelector(".asset-paging .prev")).setAttribute("data-lpos","asset|previous"),window.document.querySelector(".asset-paging .next")&&(ele=window.document.querySelector(".asset-paging 
.next")).setAttribute("data-lpos","asset|next"),window.document.querySelector(".access-offers-in-page")&&"asset"===subPageType&&(ele=window.document.querySelector(".access-offers-in-page")).setAttribute("data-lpos","asset|wall"),window.document.querySelector(".breadcrumb")&&(ele=window.document.querySelector(".breadcrumb")).setAttribute("data-lpos","breadcrumbs"),window.document.querySelectorAll(".newsletterSignup"))for(elelist=window.document.querySelectorAll(".newsletterSignup"),x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","newsletter-signup");if(window.document.querySelector(".newsletterAnonymousSignup")&&(ele=window.document.querySelector(".newsletterAnonymousSignup")).setAttribute("data-lpos","newsletter|signup-form"),window.document.querySelectorAll("#main-body-container .tncms-block")){elelist=window.document.querySelectorAll("#main-body-container .tncms-block");var category=_satellite.getVar("processed:PrimaryCategory");for(category=category.trim().replace(/[^a-zA-Z0-9]/g,"-").replace(/(-)\1+/g,"$1").toLowerCase(),x=0;x<elelist.length;x++){if(titleEle=elelist[x].querySelector(".block-title-inner"))(title=titleEle.innerText.trim().replace(/[^a-zA-Z0-9]/g,"-").replace(/(-)\1+/g,"$1").toLowerCase()).indexOf("recommended-for-")>-1?elelist[x].setAttribute("data-lpos","recommended-content"):elelist[x].setAttribute("data-lpos",title);else 
elelist[x].className.indexOf("news-promo")>-1?elelist[x].innerText.toLowerCase().indexOf("newsletter")>-1||elelist[x].innerText.toLowerCase().indexOf("inbox")>-1?elelist[x].setAttribute("data-lpos","newsletter-promo"):elelist[x].setAttribute("data-lpos","promo-container-"+x):"home"===pageType?elelist[x].setAttribute("data-lpos","untitled-container-"+x):"section"===pageType&&(channel.indexOf("events")>-1?elelist[x].querySelector(".citySparkNavCategories")&&elelist[x].setAttribute("data-lpos","events|categories-filter"):elelist[x].setAttribute("data-lpos",category+"-"+x),elelist[x].className.indexOf("page-heading-breadcrumbs")>-1&&elelist[x].setAttribute("data-lpos","breadcrumbs"))}}(window.document.querySelector("#CitySpark")&&(ele=window.document.querySelector("#CitySpark")).setAttribute("data-lpos","events"),window.document.querySelector(".csTwoWrap"))&&(ele=window.document.querySelector(".csTwoWrap"),channel=(channel=_satellite.getVar("processed:Channel")).trim().replace(/[^a-zA-Z0-9]/g,"-").replace(/(-)\1+/g,"$1").toLowerCase(),ele.setAttribute("data-lpos",channel));if(window.document.querySelector("#CitySpark .csRoutingDetails")&&(ele=window.document.querySelector("#CitySpark .csRoutingDetails")).setAttribute("data-lpos","events|body"),"topic"===pageType&&window.document.querySelector("#main-page-container")){ele=window.document.querySelector("#main-page-container");var 
topicName=_satellite.getVar("processed:Channel");topicName=topicName.trim().substr(topicName.lastIndexOf("|")+1).replace(/[^a-zA-Z0-9]/g,"-").replace(/(-)\1+/g,"$1").toLowerCase(),ele.setAttribute("data-lpos",topicName)}if(window.document.querySelector(".poll-panel")&&(ele=window.document.querySelector(".poll-panel")).setAttribute("data-lpos","poll"),window.document.querySelector("#weatherLocationSelector")&&(ele=window.document.querySelector("#weatherLocationSelector")).setAttribute("data-lpos","weather|change-location"),window.document.querySelector(".weather-container")&&(ele=window.document.querySelector(".weather-container")).setAttribute("data-lpos","weather"),window.document.querySelector("#site-footer-container")&&(ele=window.document.querySelector("#site-footer-container")).setAttribute("data-lpos","footer"),window.document.querySelector('#site-footer-container div[class*="footer-right-icons"]')&&(ele=window.document.querySelector('#site-footer-container div[class*="footer-right-icons"]')).setAttribute("data-lpos","footer|apps"),window.document.querySelector('#site-footer-container div[class*="follow-links"]')&&(ele=window.document.querySelector('#site-footer-container 
div[class*="follow-links"]')).setAttribute("data-lpos","footer|social-links"),window.document.querySelector("#site-copyright-container")&&(ele=window.document.querySelector("#site-copyright-container")).setAttribute("data-lpos","footer|corporate-links"),window.document.querySelector(".results-container")&&(ele=window.document.querySelector(".results-container")).setAttribute("data-lpos","search|results"),window.document.querySelector("#tnt-search-url-results")&&(ele=window.document.querySelector("#tnt-search-url-results")).setAttribute("data-lpos","search|url-results"),window.document.querySelector(".pagination-container")&&(ele=window.document.querySelector(".pagination-container")).setAttribute("data-lpos","search|pagination"),window.document.querySelector(".search-page-container")&&(ele=window.document.querySelector(".search-page-container")).setAttribute("data-lpos","search|refine-search"),window.document.querySelector("#search-form-collapse")&&(ele=window.document.querySelector("#search-form-collapse")).setAttribute("data-lpos","search|refine-search"),window.document.querySelectorAll(".promotion-service.subscription-service"))if(elelist=window.document.querySelectorAll(".promotion-service.subscription-service"),"asset"===subPageType)for(x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","asset|wall|subscription|card");else for(x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","subscription|card");if(window.document.querySelector("#user-main-menu-wrapper")&&(ele=window.document.querySelector("#user-main-menu-wrapper")).setAttribute("data-lpos","users|account-info"),window.document.querySelector(".users-sidebar")&&(ele=window.document.querySelector(".users-sidebar")).setAttribute("data-lpos","users|sidebar"),window.document.querySelector("#promo-designer-modal-custom-pop")){var subscriptionOverlay=!1;if((ele=window.document.querySelector("#promo-designer-modal-custom-pop")).querySelector(".promo-design-button")){var 
overlayAction=ele.querySelector(".promo-design-button").innerHTML;overlayAction.indexOf("subscribe")>-1&&(subscriptionOverlay=!0)}!0===subscriptionOverlay?ele.setAttribute("data-lpos","subscription|overlay"):ele.setAttribute("data-lpos","promo|overlay")}if(window.document.querySelector("#onboardingModal")&&(ele=window.document.querySelector("#onboardingModal")).setAttribute("data-lpos","onboarding|modal"),window.document.querySelector("#onboardingNewsletters")&&(ele=window.document.querySelector("#onboardingNewsletters")).setAttribute("data-lpos","onboarding|newsletters"),window.document.querySelector('#onboardingModal #onboardingSlides a[href*="apps.apple.com"]')){ele=window.document.querySelector('#onboardingModal #onboardingSlides a[href*="apps.apple.com"]');try{var parentEle=ele.parentNode.parentNode;parentEle.setAttribute("data-lpos","onboarding|apps")}catch(e){}}if(window.document.querySelectorAll(".ad-placeholder-container"))for(elelist=window.document.querySelectorAll(".ad-placeholder-container"),x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","gamp");if(window.document.querySelectorAll(".tnt-ads"))for(elelist=window.document.querySelectorAll(".tnt-ads"),x=0;x<elelist.length;x++)(ele=elelist[x]).setAttribute("data-lpos","gamp");if(window.document.querySelectorAll(".card-panel.volunteerOpportunity"))for(elelist=window.document.querySelectorAll(".card-panel.volunteerOpportunity"),x=0;x<elelist.length;x++){var titleEle=elelist[x].querySelector("div.orgHeadline"),cardOrg=elelist[x].querySelector("div.organization"),org=(title="unknown","unknown|");titleEle&&(title=titleEle.innerText.trim().replace(/[^a-zA-Z0-9]/g,"-").replace(/(-)\1+/g,"$1").toLowerCase()),cardOrg&&(0===(org=cardOrg.innerText.trim().replace(/[^a-zA-Z0-9]/g,"-").replace(/(-)\1+/g,"$1").toLowerCase()).indexOf("with-")&&(org=org.replace("with-","")),org+="|"),elelist[x].setAttribute("data-lpos","volunteer-card|"+org+title)} }); var 
_comscore=_comscore||[];_comscore.push({c1:"2",c2:"3005674"}),function(){var c=document.createElement("script"),e=document.getElementsByTagName("script")[0];c.async=!0,c.src=("https:"==document.location.protocol?"https://sb":"http://b")+".scorecardresearch.com/beacon.js",e.parentNode.insertBefore(c,e)}();

      When the browser window is resized, the layout does not change (the site is unresponsive), which means it is not robust.

  3. classroom.google.com classroom.google.com
    1. According to all known laws of aviation,

      there is no way a bee should be able to fly.

      Its wings are too small to get its fat little body off the ground.

      The bee, of course, flies anyway

      because bees don't care what humans think is impossible.

      Yellow, black. Yellow, black. Yellow, black. Yellow, black.

      Ooh, black and yellow! Let's shake it up a little.

      Barry! Breakfast is ready!

      Coming!

      Hang on a second.

      Hello?

      Barry?

      Adam?

      Can you believe this is happening?

      I can't. I'll pick you up.

      Looking sharp.

      Use the stairs. Your father paid good money for those.

      Sorry. I'm excited.

      Here's the graduate. We're very proud of you, son.

      A perfect report card, all B's.

      Very proud.

      Ma! I got a thing going here.

      You got lint on your fuzz.

      Ow! That's me!

      Wave to us! We'll be in row 118,000.

      Bye!

      Barry, I told you, stop flying in the house!

      Hey, Adam.

      Hey, Barry.

      Is that fuzz gel?

      A little. Special day, graduation.

      Never thought I'd make it.

      Three days grade school, three days high school.

      Those were awkward.

      Three days college. I'm glad I took a day and hitchhiked around the hive.

      You did come back different.

      Hi, Barry.

      Artie, growing a mustache? Looks good.

      Hear about Frankie?

      Yeah.

      You going to the funeral?

      No, I'm not going.

      Everybody knows, sting someone, you die.

      Don't waste it on a squirrel. Such a hothead.

      I guess he could have just gotten out of the way.

      I love this incorporating an amusement park into our day.

      That's why we don't need vacations.

      Boy, quite a bit of pomp… under the circumstances.

      Well, Adam, today we are men.

      We are!

      Bee-men.

      Amen!

      Hallelujah!

      Students, faculty, distinguished bees,

      please welcome Dean Buzzwell.

      Welcome, New Hive City graduating class of…

      …9:15.

      That concludes our ceremonies.

      And begins your career at Honex Industries!

      Will we pick our job today?

      I heard it's just orientation.

      Heads up! Here we go.

      Keep your hands and antennas inside the tram at all times.

      Wonder what it'll be like? A little scary. Welcome to Honex, a division of Honesco

      and a part of the Hexagon Group.

      This is it!

      Wow.

      Wow.

      We know that you, as a bee, have worked your whole life

      to get to the point where you can work for your whole life.

      Honey begins when our valiant Pollen Jocks bring the nectar to the hive.

      Our top-secret formula

      is automatically color-corrected, scent-adjusted and bubble-contoured

      into this soothing sweet syrup

      with its distinctive golden glow you know as…

      Honey!

      That girl was hot.

      She's my cousin!

      She is?

      Yes, we're all cousins.

      Right. You're right.

      At Honex, we constantly strive

      to improve every aspect of bee existence.

      These bees are stress-testing a new helmet technology.

      What do you think he makes? Not enough. Here we have our latest advancement, the Krelman.

      What does that do? Catches that little strand of honey that hangs after you pour it. Saves us millions.

      Can anyone work on the Krelman?

      Of course. Most bee jobs are small ones. But bees know

      that every small job, if it's done well, means a lot.

      But choose carefully

      because you'll stay in the job you pick for the rest of your life.

      The same job the rest of your life? I didn't know that.

      What's the difference?

      You'll be happy to know that bees, as a species, haven't had one day off

      in 27 million years.

      So you'll just work us to death?

      We'll sure try.

      Wow! That blew my mind!

      "What's the difference?" How can you say that?

      One job forever? That's an insane choice to have to make.

      I'm relieved. Now we only have to make one decision in life.

      But, Adam, how could they never have told us that?

      Why would you question anything? We're bees.

      We're the most perfectly functioning society on Earth.

      You ever think maybe things work a little too well here?

      Like what? Give me one example.

      I don't know. But you know what I'm talking about.

      Please clear the gate. Royal Nectar Force on approach.

      Wait a second. Check it out.

      Hey, those are Pollen Jocks! Wow. I've never seen them this close.

      They know what it's like outside the hive.

      Yeah, but some don't come back.

      Hey, Jocks! Hi, Jocks! You guys did great!

      You're monsters! You're sky freaks! I love it! I love it!

      I wonder where they were. I don't know. Their day's not planned.

      Outside the hive, flying who knows where, doing who knows what.

      You can't just decide to be a Pollen Jock. You have to be bred for that.

      Right.

      Look. That's more pollen than you and I will see in a lifetime.

      It's just a status symbol. Bees make too much of it.

      Perhaps. Unless you're wearing it and the ladies see you wearing it.

      Those ladies? Aren't they our cousins too?

      Distant. Distant.

      Look at these two.

      Couple of Hive Harrys. Let's have fun with them. It must be dangerous being a Pollen Jock.

      Yeah. Once a bear pinned me against a mushroom!

      He had a paw on my throat, and with the other, he was slapping me!

      Oh, my! I never thought I'd knock him out. What were you doing during this?

      Trying to alert the authorities.

      I can autograph that.

      A little gusty out there today, wasn't it, comrades?

      Yeah. Gusty.

      We're hitting a sunflower patch six miles from here tomorrow.

      Six miles, huh? Barry! A puddle jump for us, but maybe you're not up for it.

      Maybe I am. You are not! We're going 0900 at J-Gate.

      What do you think, buzzy-boy? Are you bee enough?

      I might be. It all depends on what 0900 means.

      Hey, Honex!

      Dad, you surprised me.

      You decide what you're interested in?

      Well, there's a lot of choices. But you only get one. Do you ever get bored doing the same job every day?

      Son, let me tell you about stirring.

      You grab that stick, and you just move it around, and you stir it around.

      You get yourself into a rhythm. It's a beautiful thing.

      You know, Dad, the more I think about it,

      maybe the honey field just isn't right for me.

      You were thinking of what, making balloon animals?

      That's a bad job for a guy with a stinger.

      Janet, your son's not sure he wants to go into honey!

      Barry, you are so funny sometimes. I'm not trying to be funny. You're not funny! You're going into honey. Our son, the stirrer!

      You're gonna be a stirrer? No one's listening to me! Wait till you see the sticks I have.

      I could say anything right now. I'm gonna get an ant tattoo!

      Let's open some honey and celebrate!

      Maybe I'll pierce my thorax. Shave my antennae.

      Shack up with a grasshopper. Get a gold tooth and call everybody "dawg"!

      I'm so proud.

      We're starting work today! Today's the day. Come on! All the good jobs will be gone.

      Yeah, right.

      Pollen counting, stunt bee, pouring, stirrer, front desk, hair removal…

      Is it still available? Hang on. Two left! One of them's yours! Congratulations! Step to the side.

      What'd you get? Picking crud out. Stellar! Wow!

      Couple of newbies?

      Yes, sir! Our first day! We are ready!

      Make your choice.

      You want to go first? No, you go. Oh, my. What's available?

      Restroom attendant's open, not for the reason you think.

      Any chance of getting the Krelman? Sure, you're on. I'm sorry, the Krelman just closed out.

      Wax monkey's always open.

      The Krelman opened up again.

      What happened?

      A bee died. Makes an opening. See? He's dead. Another dead one.

      Deady. Deadified. Two more dead.

      Dead from the neck up. Dead from the neck down. That's life!

      Oh, this is so hard!

      Heating, cooling, stunt bee, pourer, stirrer,

      humming, inspector number seven, lint coordinator, stripe supervisor,

      mite wrangler. Barry, what do you think I should… Barry?

      Barry!

      All right, we've got the sunflower patch in quadrant nine…

      What happened to you? Where are you?

      I'm going out.

      Out? Out where?

      Out there.

      Oh, no!

      I have to, before I go to work for the rest of my life.

      You're gonna die! You're crazy! Hello?

      Another call coming in.

      If anyone's feeling brave, there's a Korean deli on 83rd

      that gets their roses today.

      Hey, guys.

      Look at that. Isn't that the kid we saw yesterday? Hold it, son, flight deck's restricted.

      It's OK, Lou. We're gonna take him up.

      Really? Feeling lucky, are you?

      Sign here, here. Just initial that.

      Thank you. OK. You got a rain advisory today,

      and as you all know, bees cannot fly in rain.

      So be careful. As always, watch your brooms,

      hockey sticks, dogs, birds, bears and bats.

      Also, I got a couple of reports of root beer being poured on us.

      Murphy's in a home because of it, babbling like a cicada!

      That's awful. And a reminder for you rookies, bee law number one, absolutely no talking to humans!

      All right, launch positions!

      Buzz, buzz, buzz, buzz! Buzz, buzz, buzz, buzz! Buzz, buzz, buzz, buzz!

      Black and yellow!

      Hello!

      You ready for this, hot shot?

      Yeah. Yeah, bring it on.

      Wind, check.

      Antennae, check.

      Nectar pack, check.

      Wings, check.

      Stinger, check.

      Scared out of my shorts, check.

      OK, ladies,

      let's move it out!

      Pound those petunias, you striped stem-suckers!

      All of you, drain those flowers!

      Wow! I'm out!

      I can't believe I'm out!

      So blue.

      I feel so fast and free!

      Box kite!

      Wow!

      Flowers!

      This is Blue Leader. We have roses visual.

      Bring it around 30 degrees and hold.

      Roses!

      30 degrees, roger. Bringing it around.

      Stand to the side, kid. It's got a bit of a kick.

      That is one nectar collector!

      Ever see pollination up close? No, sir. I pick up some pollen here, sprinkle it over here. Maybe a dash over there,

      a pinch on that one. See that? It's a little bit of magic.

      That's amazing. Why do we do that?

      That's pollen power. More pollen, more flowers, more nectar, more honey for us.

      Cool.

      I'm picking up a lot of bright yellow. Could be daisies. Don't we need those?

      Copy that visual.

      Wait. One of these flowers seems to be on the move.

      Say again? You're reporting a moving flower?

      Affirmative.

      That was on the line!

      This is the coolest. What is it?

      I don't know, but I'm loving this color.

      It smells good. Not like a flower, but I like it.

      Yeah, fuzzy.

      Chemical-y.

      Careful, guys. It's a little grabby.

      My sweet lord of bees!

      Candy-brain, get off there!

      Problem!

      Guys! This could be bad. Affirmative.

      Very close.

      Gonna hurt.

      Mama's little boy.

      You are way out of position, rookie!

      Coming in at you like a missile!

      Help me!

      I don't think these are flowers.

      Should we tell him? I think he knows. What is this?!

      Match point!

      You can start packing up, honey, because you're about to eat it!

      Yowser!

      Gross.

      There's a bee in the car!

      Do something!

      I'm driving!

      Hi, bee.

      He's back here!

      He's going to sting me!

      Nobody move. If you don't move, he won't sting you. Freeze!

      He blinked!

      Spray him, Granny!

      What are you doing?!

      Wow… the tension level out here is unbelievable.

      I gotta get home.

      Can't fly in rain.

      Can't fly in rain.

      Can't fly in rain.

      Mayday! Mayday! Bee going down!

      Ken, could you close the window please?

      Ken, could you close the window please?

      Check out my new resume. I made it into a fold-out brochure.

      You see? Folds out.

      Oh, no. More humans. I don't need this.

      What was that?

      Maybe this time. This time. This time. This time! This time! This…

      Drapes!

      That is diabolical.

      It's fantastic. It's got all my special skills, even my top-ten favorite movies.

      What's number one? Star Wars?

      Nah, I don't go for that…

      …kind of stuff.

      No wonder we shouldn't talk to them. They're out of their minds.

      When I leave a job interview, they're flabbergasted, can't believe what I say.

      There's the sun. Maybe that's a way out.

      I don't remember the sun having a big 75 on it.

      I predicted global warming.

      I could feel it getting hotter. At first I thought it was just me.

      Wait! Stop! Bee!

      Stand back. These are winter boots.

      Wait!

      Don't kill him!

      You know I'm allergic to them! This thing could kill me!

      Why does his life have less value than yours?

      Why does his life have any less value than mine? Is that your statement?

      I'm just saying all life has value. You don't know what he's capable of feeling.

      My brochure!

      There you go, little guy.

      I'm not scared of him. It's an allergic thing.

      Put that on your resume brochure.

      My whole face could puff up.

      Make it one of your special skills.

      Knocking someone out is also a special skill.

      Right. Bye, Vanessa. Thanks.

      Vanessa, next week? Yogurt night?

      Sure, Ken. You know, whatever.

      You could put carob chips on there.

      Bye.

      Supposed to be less calories.

      Bye.

      I gotta say something.

      She saved my life. I gotta say something.

      All right, here it goes.

      Nah.

      What would I say?

      I could really get in trouble.

      It's a bee law. You're not supposed to talk to a human.

      I can't believe I'm doing this.

      I've got to.

      Oh, I can't do it. Come on!

      No. Yes. No.

      Do it. I can't.

      How should I start it? "You like jazz?" No, that's no good.

      Here she comes! Speak, you fool!

      Hi!

      I'm sorry.

      You're talking. Yes, I know. You're talking!

      I'm so sorry.

      No, it's OK. It's fine. I know I'm dreaming.

      But I don't recall going to bed.

      Well, I'm sure this is very disconcerting.

      This is a bit of a surprise to me. I mean, you're a bee!

      I am. And I'm not supposed to be doing this,

      but they were all trying to kill me.

      And if it wasn't for you…

      I had to thank you. It's just how I was raised.

      That was a little weird.

      I'm talking with a bee. Yeah. I'm talking to a bee. And the bee is talking to me!

      I just want to say I'm grateful. I'll leave now.

      Wait! How did you learn to do that? What? The talking thing.

      Same way you did, I guess. "Mama, Dada, honey." You pick it up.

      That's very funny. Yeah. Bees are funny. If we didn't laugh, we'd cry with what we have to deal with.

      Anyway…

      Can I…

      …get you something?

      Like what? I don't know. I mean… I don't know. Coffee?

      I don't want to put you out.

      It's no trouble. It takes two minutes.

      It's just coffee.

      I hate to impose.

      Don't be ridiculous!

      Actually, I would love a cup.

      Hey, you want rum cake?

      I shouldn't.

      Have some.

      No, I can't.

      Come on!

      I'm trying to lose a couple micrograms.

      Where? These stripes don't help. You look great!

      I don't know if you know anything about fashion.

      Are you all right?

      No.

      He's making the tie in the cab as they're flying up Madison.

      He finally gets there.

      He runs up the steps into the church. The wedding is on.

      And he says, "Watermelon? I thought you said Guatemalan.

      Why would I marry a watermelon?"

      Is that a bee joke?

      That's the kind of stuff we do.

      Yeah, different.

      So, what are you gonna do, Barry?

      About work? I don't know.

      I want to do my part for the hive, but I can't do it the way they want.

      I know how you feel.

      You do? Sure. My parents wanted me to be a lawyer or a doctor, but I wanted to be a florist.

      Really? My only interest is flowers. Our new queen was just elected with that same campaign slogan.

      Anyway, if you look…

      There's my hive right there. See it?

      You're in Sheep Meadow!

      Yes! I'm right off the Turtle Pond!

      No way! I know that area. I lost a toe ring there once.

      Why do girls put rings on their toes?

      Why not?

      It's like putting a hat on your knee.

      Maybe I'll try that.

      You all right, ma'am?

      Oh, yeah. Fine.

      Just having two cups of coffee!

      Anyway, this has been great. Thanks for the coffee.

      Yeah, it's no trouble.

      Sorry I couldn't finish it. If I did, I'd be up the rest of my life.

      Are you…?

      Can I take a piece of this with me?

      Sure! Here, have a crumb.

      Thanks! Yeah. All right. Well, then… I guess I'll see you around.

      Or not.

      OK, Barry.

      And thank you so much again… for before.

      Oh, that? That was nothing.

      Well, not nothing, but… Anyway…

      This can't possibly work.

      He's all set to go. We may as well try it.

      OK, Dave, pull the chute.

      Sounds amazing. It was amazing! It was the scariest, happiest moment of my life.

      Humans! I can't believe you were with humans!

      Giant, scary humans! What were they like?

      Huge and crazy. They talk crazy.

      They eat crazy giant things. They drive crazy.

      Do they try and kill you, like on TV?

      Some of them. But some of them don't.

      How'd you get back?

      Poodle.

      You did it, and I'm glad. You saw whatever you wanted to see.

      You had your "experience." Now you can pick out your job and be normal.

      Well… Well? Well, I met someone.

      You did? Was she Bee-ish?

      A wasp?! Your parents will kill you!

      No, no, no, not a wasp.

      Spider?

      I'm not attracted to spiders.

      I know it's the hottest thing, with the eight legs and all.

      I can't get by that face.

      So who is she?

      She's… human.

      No, no. That's a bee law. You wouldn't break a bee law.

      Her name's Vanessa. Oh, boy. She's so nice. And she's a florist!

      Oh, no! You're dating a human florist!

      We're not dating.

      You're flying outside the hive, talking to humans that attack our homes

      with power washers and M-80s! One-eighth a stick of dynamite!

      She saved my life! And she understands me.

      This is over!

      Eat this.

      This is not over! What was that?

      They call it a crumb. It was so stingin' stripey! And that's not what they eat. That's what falls off what they eat!

      You know what a Cinnabon is? No. It's bread and cinnamon and frosting. They heat it up…

      Sit down!

      …really hot!

      Listen to me! We are not them! We're us. There's us and there's them!

      Yes, but who can deny the heart that is yearning?

      There's no yearning. Stop yearning. Listen to me!

      You have got to start thinking bee, my friend. Thinking bee!

      Thinking bee. Thinking bee. Thinking bee! Thinking bee! Thinking bee! Thinking bee!

      There he is. He's in the pool.

      You know what your problem is, Barry?

      I gotta start thinking bee?

      How much longer will this go on?

      It's been three days! Why aren't you working?

      I've got a lot of big life decisions to think about.

      What life? You have no life! You have no job. You're barely a bee!

      Would it kill you to make a little honey?

      Barry, come out. Your father's talking to you.

      Martin, would you talk to him?

      Barry, I'm talking to you!

      You coming?

      Got everything?

      All set!

      Go ahead. I'll catch up.

      Don't be too long.

      Watch this!

      Vanessa!

      We're still here. I told you not to yell at him. He doesn't respond to yelling!

      Then why yell at me? Because you don't listen! I'm not listening to this.

      Sorry, I've gotta go.

      Where are you going? I'm meeting a friend. A girl? Is this why you can't decide?

      Bye.

      I just hope she's Bee-ish.

      They have a huge parade of flowers every year in Pasadena?

      To be in the Tournament of Roses, that's every florist's dream!

      Up on a float, surrounded by flowers, crowds cheering.

      A tournament. Do the roses compete in athletic events?

      No. All right, I've got one. How come you don't fly everywhere?

      It's exhausting. Why don't you run everywhere? It's faster.

      Yeah, OK, I see, I see. All right, your turn.

      TiVo. You can just freeze live TV? That's insane!

      You don't have that?

      We have Hivo, but it's a disease. It's a horrible, horrible disease.

      Oh, my.

      Dumb bees!

      You must want to sting all those jerks.

      We try not to sting. It's usually fatal for us.

      So you have to watch your temper.

      Very carefully. You kick a wall, take a walk,

      write an angry letter and throw it out. Work through it like any emotion:

      Anger, jealousy, lust.

      Oh, my goodness! Are you OK?

      Yeah.

      What is wrong with you?! It's a bug. He's not bothering anybody. Get out of here, you creep!

      What was that? A Pic 'N' Save circular?

      Yeah, it was. How did you know?

      It felt like about 10 pages. Seventy-five is pretty much our limit.

      You've really got that down to a science.

      I lost a cousin to Italian Vogue. I'll bet. What in the name of Mighty Hercules is this?

      How did this get here? Cute Bee, Golden Blossom,

      Ray Liotta Private Select?

      Is he that actor?

      I never heard of him.

      Why is this here?

      For people. We eat it.

      You don't have enough food of your own?

      Well, yes.

      How do you get it?

      Bees make it.

      I know who makes it!

      And it's hard to make it!

      There's heating, cooling, stirring. You need a whole Krelman thing!

      It's organic. It's our-ganic! It's just honey, Barry.

      Just what?!

      Bees don't know about this! This is stealing! A lot of stealing!

      You've taken our homes, schools, hospitals! This is all we have!

      And it's on sale?! I'm getting to the bottom of this.

      I'm getting to the bottom of all of this!

      Hey, Hector.

      You almost done? Almost. He is here. I sense it.

      Well, I guess I'll go home now

      and just leave this nice honey out, with no one around.

      You're busted, box boy!

      I knew I heard something. So you can talk!

      I can talk. And now you'll start talking!

      Where you getting the sweet stuff? Who's your supplier?

      I don't understand. I thought we were friends.

      The last thing we want to do is upset bees!

      You're too late! It's ours now!

      You, sir, have crossed the wrong sword!

      You, sir, will be lunch for my iguana, Ignacio!

      Where is the honey coming from?

      Tell me where!

      Honey Farms! It comes from Honey Farms!

      Crazy person!

      What horrible thing has happened here?

      These faces, they never knew what hit them. And now

      they're on the road to nowhere!

      Just keep still.

      What? You're not dead?

      Do I look dead? They will wipe anything that moves. Where you headed?

      To Honey Farms. I am onto something huge here.

      I'm going to Alaska. Moose blood, crazy stuff. Blows your head off!

      I'm going to Tacoma.

      And you? He really is dead. All right.

      Uh-oh!

      What is that?!

      Oh, no!

      A wiper! Triple blade!

      Triple blade?

      Jump on! It's your only chance, bee!

      Why does everything have to be so doggone clean?!

      How much do you people need to see?!

      Open your eyes! Stick your head out the window!

      From NPR News in Washington, I'm Carl Kasell.

      But don't kill no more bugs!

      Bee!

      Moose blood guy!!

      You hear something?

      Like what?

      Like tiny screaming.

      Turn off the radio.

      Whassup, bee boy?

      Hey, Blood.

      Just a row of honey jars, as far as the eye could see.

      Wow!

      I assume wherever this truck goes is where they're getting it.

      I mean, that honey's ours.

      Bees hang tight. We're all jammed in. It's a close community.

      Not us, man. We on our own. Every mosquito on his own.

      What if you get in trouble? You a mosquito, you in trouble. Nobody likes us. They just smack. See a mosquito, smack, smack!

      At least you're out in the world. You must meet girls.

      Mosquito girls try to trade up, get with a moth, dragonfly.

      Mosquito girl don't want no mosquito.

      You got to be kidding me!

      Mooseblood's about to leave the building! So long, bee!

      Hey, guys! Mooseblood! I knew I'd catch y'all down here. Did you bring your crazy straw?

      We throw it in jars, slap a label on it, and it's pretty much pure profit.

      What is this place?

      A bee's got a brain the size of a pinhead.

      They are pinheads!

      Pinhead.

      Check out the new smoker. Oh, sweet. That's the one you want. The Thomas 3000!

      Smoker?

      Ninety puffs a minute, semi-automatic. Twice the nicotine, all the tar.

      A couple breaths of this knocks them right out.

      They make the honey, and we make the money.

      "They make the honey, and we make the money"?

      Oh, my!

      What's going on? Are you OK?

      Yeah. It doesn't last too long.

      Do you know you're in a fake hive with fake walls?

      Our queen was moved here. We had no choice.

      This is your queen? That's a man in women's clothes!

      That's a drag queen!

      What is this?

      Oh, no!

      There's hundreds of them!

      Bee honey.

      Our honey is being brazenly stolen on a massive scale!

      This is worse than anything bears have done! I intend to do something.

      Oh, Barry, stop.

      Who told you humans are taking our honey? That's a rumor.

      Do these look like rumors?

      That's a conspiracy theory. These are obviously doctored photos.

      How did you get mixed up in this?

      He's been talking to humans.

      What? Talking to humans?! He has a human girlfriend. And they make out!

      Make out? Barry!

      We do not.

      You wish you could. Whose side are you on? The bees!

      I dated a cricket once in San Antonio. Those crazy legs kept me up all night.

      Barry, this is what you want to do with your life?

      I want to do it for all our lives. Nobody works harder than bees!

      Dad, I remember you coming home so overworked

      your hands were still stirring. You couldn't stop.

      I remember that.

      What right do they have to our honey?

      We live on two cups a year. They put it in lip balm for no reason whatsoever!

      Even if it's true, what can one bee do?

      Sting them where it really hurts.

      In the face! The eye!

      That would hurt. No. Up the nose? That's a killer.

      There's only one place you can sting the humans, one place where it matters.

      Hive at Five, the hive's only full-hour action news source.

      No more bee beards!

      With Bob Bumble at the anchor desk.

      Weather with Storm Stinger.

      Sports with Buzz Larvi.

      And Jeanette Chung.

      Good evening. I'm Bob Bumble. And I'm Jeanette Chung. A tri-county bee, Barry Benson,

      intends to sue the human race for stealing our honey,

      packaging it and profiting from it illegally!

      Tomorrow night on Bee Larry King,

      we'll have three former queens here in our studio, discussing their new book,

      Classy Ladies, out this week on Hexagon.

      Tonight we're talking to Barry Benson.

      Did you ever think, "I'm a kid from the hive. I can't do this"?

      Bees have never been afraid to change the world.

      What about Bee Columbus? Bee Gandhi? Bejesus?

      Where I'm from, we'd never sue humans.

      We were thinking of stickball or candy stores.

      How old are you?

      The bee community is supporting you in this case,

      which will be the trial of the bee century.

      You know, they have a Larry King in the human world too.

      It's a common name. Next week…

      He looks like you and has a show and suspenders and colored dots…

      Next week…

      Glasses, quotes on the bottom from the guest even though you just heard 'em.

      Bear Week next week! They're scary, hairy and here live.

      Always leans forward, pointy shoulders, squinty eyes, very Jewish.

      In tennis, you attack at the point of weakness!

      It was my grandmother, Ken. She's 81.

      Honey, her backhand's a joke! I'm not gonna take advantage of that?

      Quiet, please. Actual work going on here.

      Is that that same bee? Yes, it is! I'm helping him sue the human race.

      Hello. Hello, bee. This is Ken.

      Yeah, I remember you. Timberland, size ten and a half. Vibram sole, I believe.

      Why does he talk again?

      Listen, you better go 'cause we're really busy working.

      But it's our yogurt night!

      Bye-bye.

      Why is yogurt night so difficult?!

      You poor thing. You two have been at this for hours!

      Yes, and Adam here has been a huge help.

      Frosting… How many sugars? Just one. I try not to use the competition.

      So why are you helping me?

      Bees have good qualities.

      And it takes my mind off the shop.

      Instead of flowers, people are giving balloon bouquets now.

      Those are great, if you're three.

      And artificial flowers.

      Oh, those just get me psychotic! Yeah, me too. Bent stingers, pointless pollination.

      Bees must hate those fake things!

      Nothing worse than a daffodil that's had work done.

      Maybe this could make up for it a little bit.

      This lawsuit's a pretty big deal. I guess. You sure you want to go through with it?

      Am I sure? When I'm done with the humans, they won't be able

      to say, "Honey, I'm home," without paying a royalty!

      It's an incredible scene here in downtown Manhattan,

      where the world anxiously waits, because for the first time in history,

      we will hear for ourselves if a honeybee can actually speak.

      What have we gotten into here, Barry?

      It's pretty big, isn't it?

      I can't believe how many humans don't work during the day.

      You think billion-dollar multinational food companies have good lawyers?

      Everybody needs to stay behind the barricade.

      What's the matter? I don't know, I just got a chill. Well, if it isn't the bee team.

      You boys work on this?

      All rise! The Honorable Judge Bumbleton presiding.

      All right. Case number 4475,

      Superior Court of New York, Barry Bee Benson v. the Honey Industry

      is now in session.

      Mr. Montgomery, you're representing the five food companies collectively?

      A privilege.

      Mr. Benson… you're representing all the bees of the world?

      I'm kidding. Yes, Your Honor, we're ready to proceed.

      Mr. Montgomery, your opening statement, please.

      Ladies and gentlemen of the jury,

      my grandmother was a simple woman.

      Born on a farm, she believed it was man's divine right

      to benefit from the bounty of nature God put before us.

      If we lived in the topsy-turvy world Mr. Benson imagines,

      just think of what would it mean.

      I would have to negotiate with the silkworm

      for the elastic in my britches!

      Talking bee!

      How do we know this isn't some sort of

      holographic motion-picture-capture Hollywood wizardry?

      They could be using laser beams!

      Robotics! Ventriloquism! Cloning! For all we know,

      he could be on steroids!

      Mr. Benson?

      Ladies and gentlemen, there's no trickery here.

      I'm just an ordinary bee. Honey's pretty important to me.

      It's important to all bees. We invented it!

      We make it. And we protect it with our lives.

      Unfortunately, there are some people in this room

      who think they can take it from us

      'cause we're the little guys! I'm hoping that, after this is all over,

      you'll see how, by taking our honey, you not only take everything we have

      but everything we are!

      I wish he'd dress like that all the time. So nice!

      Call your first witness.

      So, Mr. Klauss Vanderhayden of Honey Farms, big company you have.

      I suppose so.

      I see you also own Honeyburton and Honron!

      Yes, they provide beekeepers for our farms.

      Beekeeper. I find that to be a very disturbing term.

      I don't imagine you employ any bee-free-ers, do you?

      No.

      I couldn't hear you.

      No.

      No.

      Because you don't free bees. You keep bees. Not only that,

      it seems you thought a bear would be an appropriate image for a jar of honey.

      They're very lovable creatures.

      Yogi Bear, Fozzie Bear, Build-A-Bear.

      You mean like this?

      Bears kill bees!

      How'd you like his head crashing through your living room?!

      Biting into your couch! Spitting out your throw pillows!

      OK, that's enough. Take him away.

      So, Mr. Sting, thank you for being here. Your name intrigues me.

      Where have I heard it before? I was with a band called The Police. But you've never been a police officer, have you?

      No, I haven't.

      No, you haven't. And so here we have yet another example

      of bee culture casually stolen by a human

      for nothing more than a prance-about stage name.

      Oh, please.

      Have you ever been stung, Mr. Sting?

      Because I'm feeling a little stung, Sting.

      Or should I say… Mr. Gordon M. Sumner!

      That's not his real name?! You idiots!

      Mr. Liotta, first, belated congratulations on

      your Emmy win for a guest spot on ER in 2005.

      Thank you. Thank you.

      I see from your resume that you're devilishly handsome

      with a churning inner turmoil that's ready to blow.

      I enjoy what I do. Is that a crime?

      Not yet it isn't. But is this what it's come to for you?

      Exploiting tiny, helpless bees so you don't

      have to rehearse your part and learn your lines, sir?

      Watch it, Benson! I could blow right now!

      This isn't a goodfella. This is a badfella!

      Why doesn't someone just step on this creep, and we can all go home?!

      Order in this court! You're all thinking it! Order! Order, I say!

      Say it! Mr. Liotta, please sit down! I think it was awfully nice of that bear to pitch in like that.

      I think the jury's on our side.

      Are we doing everything right, legally?

      I'm a florist.

      Right. Well, here's to a great team.

      To a great team!

      Well, hello.

      Ken! Hello. I didn't think you were coming.

      No, I was just late. I tried to call, but… the battery.

      I didn't want all this to go to waste, so I called Barry. Luckily, he was free.

      Oh, that was lucky.

      There's a little left. I could heat it up.

      Yeah, heat it up, sure, whatever.

      So I hear you're quite a tennis player.

      I'm not much for the game myself. The ball's a little grabby.

      That's where I usually sit. Right… there.

      Ken, Barry was looking at your resume,

      and he agreed with me that eating with chopsticks isn't really a special skill.

      You think I don't see what you're doing?

      I know how hard it is to find the right job. We have that in common.

      Do we?

      Bees have 100 percent employment, but we do jobs like taking the crud out.

      That's just what I was thinking about doing.

      Ken, I let Barry borrow your razor for his fuzz. I hope that was all right.

      I'm going to drain the old stinger.

      Yeah, you do that.

      Look at that.

      You know, I've just about had it

      with your little mind games.

      What's that? Italian Vogue. Mamma mia, that's a lot of pages.

      A lot of ads.

      Remember what Van said, why is your life more valuable than mine?

      Funny, I just can't seem to recall that!

      I think something stinks in here!

      I love the smell of flowers.

      How do you like the smell of flames?!

      Not as much.

      Water bug! Not taking sides!

      Ken, I'm wearing a Chapstick hat! This is pathetic!

      I've got issues!

      Well, well, well, a royal flush!

      You're bluffing. Am I? Surf's up, dude!

      Poo water!

      That bowl is gnarly.

      Except for those dirty yellow rings!

      Kenneth! What are you doing?!

      You know, I don't even like honey! I don't eat it!

      We need to talk!

      He's just a little bee!

      And he happens to be the nicest bee I've met in a long time!

      Long time? What are you talking about?! Are there other bugs in your life?

      No, but there are other things bugging me in life. And you're one of them!

      Fine! Talking bees, no yogurt night…

      My nerves are fried from riding on this emotional roller coaster!

      Goodbye, Ken.

      And for your information,

      I prefer sugar-free, artificial sweeteners made by man!

      I'm sorry about all that.

      I know it's got an aftertaste! I like it!

      I always felt there was some kind of barrier between Ken and me.

      I couldn't overcome it. Oh, well.

      Are you OK for the trial?

      I believe Mr. Montgomery is about out of ideas.

      We would like to call Mr. Barry Benson Bee to the stand.

      Good idea! You can really see why he's considered one of the best lawyers…

      Yeah.

      Layton, you've gotta weave some magic

      with this jury, or it's gonna be all over.

      Don't worry. The only thing I have to do to turn this jury around

      is to remind them of what they don't like about bees.

      You got the tweezers? Are you allergic? Only to losing, son. Only to losing.

      Mr. Benson Bee, I'll ask you what I think we'd all like to know.

      What exactly is your relationship

      to that woman?

      We're friends.

      Good friends? Yes. How good? Do you live together?

      Wait a minute…

      Are you her little…

      …bedbug?

      I've seen a bee documentary or two. From what I understand,

      doesn't your queen give birth to all the bee children?

      Yeah, but…

      So those aren't your real parents!

      Oh, Barry…

      Yes, they are!

      Hold me back!

      You're an illegitimate bee, aren't you, Benson?

      He's denouncing bees!

      Don't y'all date your cousins?

      Objection! I'm going to pincushion this guy! Adam, don't! It's what he wants!

      Oh, I'm hit!!

      Oh, lordy, I am hit!

      Order! Order!

      The venom! The venom is coursing through my veins!

      I have been felled by a winged beast of destruction!

      You see? You can't treat them like equals! They're striped savages!

      Stinging's the only thing they know! It's their way!

      Adam, stay with me. I can't feel my legs. What angel of mercy will come forward to suck the poison

      from my heaving buttocks?

      I will have order in this court. Order!

      Order, please!

      The case of the honeybees versus the human race

      took a pointed turn against the bees

      yesterday when one of their legal team stung Layton T. Montgomery.

      Hey, buddy.

      Hey.

      Is there much pain?

      Yeah.

      I…

      I blew the whole case, didn't I?

      It doesn't matter. What matters is you're alive. You could have died.

      I'd be better off dead. Look at me.

      They got it from the cafeteria downstairs, in a tuna sandwich.

      Look, there's a little celery still on it.

      What was it like to sting someone?

      I can't explain it. It was all…

      All adrenaline and then… and then ecstasy!

      All right.

      You think it was all a trap?

      Of course. I'm sorry. I flew us right into this.

      What were we thinking? Look at us. We're just a couple of bugs in this world.

      What will the humans do to us if they win?

      I don't know.

      I hear they put the roaches in motels. That doesn't sound so bad.

      Adam, they check in, but they don't check out!

      Oh, my.

      Could you get a nurse to close that window?

      Why? The smoke. Bees don't smoke.

      Right. Bees don't smoke.

      Bees don't smoke! But some bees are smoking.

      That's it! That's our case!

      It is? It's not over?

      Get dressed. I've gotta go somewhere.

      Get back to the court and stall. Stall any way you can.

      And assuming you've done step correctly, you're ready for the tub.

      Mr. Flayman.

      Yes? Yes, Your Honor!

      Where is the rest of your team?

      Well, Your Honor, it's interesting.

      Bees are trained to fly haphazardly,

      and as a result, we don't make very good time.

      I actually heard a funny story about…

      Your Honor, haven't these ridiculous bugs

      taken up enough of this court's valuable time?

      How much longer will we allow these absurd shenanigans to go on?

      They have presented no compelling evidence to support their charges

      against my clients, who run legitimate businesses.

      I move for a complete dismissal of this entire case!

      Mr. Flayman, I'm afraid I'm going

      to have to consider Mr. Montgomery's motion.

      But you can't! We have a terrific case.

      Where is your proof? Where is the evidence?

      Show me the smoking gun!

      Hold it, Your Honor! You want a smoking gun?

      Here is your smoking gun.

      What is that?

      It's a bee smoker!

      What, this? This harmless little contraption?

      This couldn't hurt a fly, let alone a bee.

      Look at what has happened

      to bees who have never been asked, "Smoking or non?"

      Is this what nature intended for us?

      To be forcibly addicted to smoke machines

      and man-made wooden slat work camps?

      Living out our lives as honey slaves to the white man?

      What are we gonna do? He's playing the species card. Ladies and gentlemen, please, free these bees!

      Free the bees! Free the bees!

      Free the bees!

      Free the bees! Free the bees!

      The court finds in favor of the bees!

      Vanessa, we won!

      I knew you could do it! High-five!

      Sorry.

      I'm OK! You know what this means?

      All the honey will finally belong to the bees.

      Now we won't have to work so hard all the time.

      This is an unholy perversion of the balance of nature, Benson.

      You'll regret this.

      Barry, how much honey is out there?

      All right. One at a time.

      Barry, who are you wearing?

      My sweater is Ralph Lauren, and I have no pants.

      What if Montgomery's right? What do you mean? We've been living the bee way a long time, 27 million years.

      Congratulations on your victory. What will you demand as a settlement?

      First, we'll demand a complete shutdown of all bee work camps.

      Then we want back the honey that was ours to begin with,

      every last drop.

      We demand an end to the glorification of the bear as anything more

      than a filthy, smelly, bad-breath stink machine.

      We're all aware of what they do in the woods.

      Wait for my signal.

      Take him out.

      He'll have nausea for a few hours, then he'll be fine.

      And we will no longer tolerate bee-negative nicknames…

      But it's just a prance-about stage name!

      …unnecessary inclusion of honey in bogus health products

      and la-dee-da human tea-time snack garnishments.

      Can't breathe.

      Bring it in, boys!

      Hold it right there! Good.

      Tap it.

      Mr. Buzzwell, we just passed three cups, and there's gallons more coming!

      I think we need to shut down! Shut down? We've never shut down. Shut down honey production!

      Stop making honey!

      Turn your key, sir!

      What do we do now?

      Cannonball!

      We're shutting honey production!

      Mission abort.

      Aborting pollination and nectar detail. Returning to base.

      Adam, you wouldn't believe how much honey was out there.

      Oh, yeah?

      What's going on? Where is everybody?

      Are they out celebrating? They're home. They don't know what to do. Laying out, sleeping in.

      I heard your Uncle Carl was on his way to San Antonio with a cricket.

      At least we got our honey back.

      Sometimes I think, so what if humans liked our honey? Who wouldn't?

      It's the greatest thing in the world! I was excited to be part of making it.

      This was my new desk. This was my new job. I wanted to do it really well.

      And now…

      Now I can't.

      I don't understand why they're not happy.

      I thought their lives would be better!

      They're doing nothing. It's amazing. Honey really changes people.

      You don't have any idea what's going on, do you?

      What did you want to show me? This. What happened here?

      That is not the half of it.

      Oh, no. Oh, my.

      They're all wilting.

      Doesn't look very good, does it?

      No.

      And whose fault do you think that is?

      You know, I'm gonna guess bees.

      Bees?

      Specifically, me.

      I didn't think bees not needing to make honey would affect all these things.

      It's not just flowers. Fruits, vegetables, they all need bees.

      That's our whole SAT test right there.

      Take away produce, that affects the entire animal kingdom.

      And then, of course…

      The human species?

      So if there's no more pollination,

      it could all just go south here, couldn't it?

      I know this is also partly my fault.

      How about a suicide pact?

      How do we do it?

      I'll sting you, you step on me. That just kills you twice. Right, right.

      Listen, Barry… sorry, but I gotta get going.

      I had to open my mouth and talk.

      Vanessa?

      Vanessa? Why are you leaving? Where are you going?

      To the final Tournament of Roses parade in Pasadena.

      They've moved it to this weekend because all the flowers are dying.

      It's the last chance I'll ever have to see it.

      Vanessa, I just wanna say I'm sorry. I never meant it to turn out like this.

      I know. Me neither.

      Tournament of Roses. Roses can't do sports.

      Wait a minute. Roses. Roses?

      Roses!

      Vanessa!

      Roses?!

      Barry?

      Roses are flowers! Yes, they are. Flowers, bees, pollen!

      I know. That's why this is the last parade.

      Maybe not. Could you ask him to slow down?

      Could you slow down?

      Barry!

      OK, I made a huge mistake. This is a total disaster, all my fault.

      Yes, it kind of is.

      I've ruined the planet. I wanted to help you

      with the flower shop. I've made it worse.

      Actually, it's completely closed down.

      I thought maybe you were remodeling.

      But I have another idea, and it's greater than my previous ideas combined.

      I don't want to hear it!

      All right, they have the roses, the roses have the pollen.

      I know every bee, plant and flower bud in this park.

      All we gotta do is get what they've got back here with what we've got.

      Bees.

      Park.

      Pollen!

      Flowers.

      Repollination!

      Across the nation!

      Tournament of Roses, Pasadena, California.

      They've got nothing but flowers, floats and cotton candy.

      Security will be tight.

      I have an idea.

      Vanessa Bloome, FTD.

      Official floral business. It's real.

      Sorry, ma'am. Nice brooch.

      Thank you. It was a gift.

      Once inside, we just pick the right float.

      How about The Princess and the Pea?

      I could be the princess, and you could be the pea!

      Yes, I got it.

      Where should I sit?

      What are you?

      I believe I'm the pea.

      The pea?

      It goes under the mattresses.

      Not in this fairy tale, sweetheart. I'm getting the marshal. You do that! This whole parade is a fiasco!

      Let's see what this baby'll do.

      Hey, what are you doing?!

      Then all we do is blend in with traffic…

      …without arousing suspicion.

      Once at the airport, there's no stopping us.

      Stop! Security.

      You and your insect pack your float? Yes. Has it been in your possession the entire time?

      Would you remove your shoes?

      Remove your stinger. It's part of me. I know. Just having some fun. Enjoy your flight.

      Then if we're lucky, we'll have just enough pollen to do the job.

      Can you believe how lucky we are? We have just enough pollen to do the job!

      I think this is gonna work.

      It's got to work.

      Attention, passengers, this is Captain Scott.

      We have a bit of bad weather in New York.

      It looks like we'll experience a couple hours delay.

      Barry, these are cut flowers with no water. They'll never make it.

      I gotta get up there and talk to them.

      Be careful.

      Can I get help with the Sky Mall magazine?

      I'd like to order the talking inflatable nose and ear hair trimmer.

      Captain, I'm in a real situation.

      What'd you say, Hal? Nothing. Bee!

      Don't freak out! My entire species…

      What are you doing?

      Wait a minute! I'm an attorney! Who's an attorney? Don't move.

      Oh, Barry.

      Good afternoon, passengers. This is your captain.

      Would a Miss Vanessa Bloome in 24B please report to the cockpit?

      And please hurry!

      What happened here?

      There was a DustBuster, a toupee, a life raft exploded.

      One's bald, one's in a boat, they're both unconscious!

      Is that another bee joke? No! No one's flying the plane!

      This is JFK control tower, Flight 356. What's your status?

      This is Vanessa Bloome. I'm a florist from New York.

      Where's the pilot?

      He's unconscious, and so is the copilot.

      Not good. Does anyone onboard have flight experience?

      As a matter of fact, there is.

      Who's that? Barry Benson. From the honey trial?! Oh, great.

      Vanessa, this is nothing more than a big metal bee.

      It's got giant wings, huge engines.

      I can't fly a plane.

      Why not? Isn't John Travolta a pilot? Yes. How hard could it be?

      Wait, Barry! We're headed into some lightning.

      This is Bob Bumble. We have some late-breaking news from JFK Airport,

      where a suspenseful scene is developing.

      Barry Benson, fresh from his legal victory…

      That's Barry!

      …is attempting to land a plane, loaded with people, flowers

      and an incapacitated flight crew.

      Flowers?!

      We have a storm in the area and two individuals at the controls

      with absolutely no flight experience.

      Just a minute. There's a bee on that plane.

      I'm quite familiar with Mr. Benson and his no-account compadres.

      They've done enough damage.

      But isn't he your only hope?

      Technically, a bee shouldn't be able to fly at all.

      Their wings are too small…

      Haven't we heard this a million times?

      "The surface area of the wings and body mass make no sense."

      Get this on the air!

      Got it.

      Stand by.

      We're going live.

      The way we work may be a mystery to you.

      Making honey takes a lot of bees doing a lot of small jobs.

      But let me tell you about a small job.

      If you do it well, it makes a big difference.

      More than we realized. To us, to everyone.

      That's why I want to get bees back to working together.

      That's the bee way! We're not made of Jell-O.

      We get behind a fellow.

      Black and yellow! Hello! Left, right, down, hover.

      Hover? Forget hover. This isn't so hard. Beep-beep! Beep-beep!

      Barry, what happened?!

      Wait, I think we were on autopilot the whole time.

      That may have been helping me. And now we're not! So it turns out I cannot fly a plane.

      All of you, let's get behind this fellow! Move it out!

      Move out!

      Our only chance is if I do what I'd do, you copy me with the wings of the plane!

      Don't have to yell.

      I'm not yelling! We're in a lot of trouble.

      It's very hard to concentrate with that panicky tone in your voice!

      It's not a tone. I'm panicking!

      I can't do this!

      Vanessa, pull yourself together. You have to snap out of it!

      You snap out of it.

      You snap out of it.

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      You snap out of it!

      Hold it!

      Why? Come on, it's my turn.

      How is the plane flying?

      I don't know.

      Hello?

      Benson, got any flowers for a happy occasion in there?

      The Pollen Jocks!

      They do get behind a fellow.

      Black and yellow. Hello. All right, let's drop this tin can on the blacktop.

      Where? I can't see anything. Can you?

      No, nothing. It's all cloudy.

      Come on. You got to think bee, Barry.

      Thinking bee. Thinking bee. Thinking bee! Thinking bee! Thinking bee!

      Wait a minute. I think I'm feeling something.

      What? I don't know. It's strong, pulling me. Like a 27-million-year-old instinct.

      Bring the nose down.

      Thinking bee! Thinking bee! Thinking bee!

      What in the world is on the tarmac? Get some lights on that! Thinking bee! Thinking bee! Thinking bee!

      Vanessa, aim for the flower. OK. Cut the engines. We're going in on bee power. Ready, boys?

      Affirmative!

      Good. Good. Easy, now. That's it.

      Land on that flower!

      Ready? Full reverse!

      Spin it around!

      Not that flower! The other one!

      Which one?

      That flower.

      I'm aiming at the flower!

      That's a fat guy in a flowered shirt. I mean the giant pulsating flower

      made of millions of bees!

      Pull forward. Nose down. Tail up.

      Rotate around it.

      This is insane, Barry! This's the only way I know how to fly. Am I koo-koo-kachoo, or is this plane flying in an insect-like pattern?

      Get your nose in there. Don't be afraid. Smell it. Full reverse!

      Just drop it. Be a part of it.

      Aim for the center!

      Now drop it in! Drop it in, woman!

      Come on, already.

      Barry, we did it! You taught me how to fly!

      Yes. No high-five! Right. Barry, it worked! Did you see the giant flower?

      What giant flower? Where? Of course I saw the flower! That was genius!

      Thank you. But we're not done yet. Listen, everyone!

      This runway is covered with the last pollen

      from the last flowers available anywhere on Earth.

      That means this is our last chance.

      We're the only ones who make honey, pollinate flowers and dress like this.

      If we're gonna survive as a species, this is our moment! What do you say?

      Are we going to be bees, or just Museum of Natural History keychains?

      We're bees!

      Keychain!

      Then follow me! Except Keychain.

      Hold on, Barry. Here.

      You've earned this.

      Yeah!

      I'm a Pollen Jock! And it's a perfect fit. All I gotta do are the sleeves.

      Oh, yeah.

      That's our Barry.

      Mom! The bees are back!

      If anybody needs to make a call, now's the time.

      I got a feeling we'll be working late tonight!

      Here's your change. Have a great afternoon! Can I help who's next?

      Would you like some honey with that? It is bee-approved. Don't forget these.

      Milk, cream, cheese, it's all me. And I don't see a nickel!

      Sometimes I just feel like a piece of meat!

      I had no idea.

      Barry, I'm sorry. Have you got a moment?

      Would you excuse me? My mosquito associate will help you.

      Sorry I'm late.

      He's a lawyer too?

      I was already a blood-sucking parasite. All I needed was a briefcase.

      Have a great afternoon!

      Barry, I just got this huge tulip order, and I can't get them anywhere.

      No problem, Vannie. Just leave it to me.

      You're a lifesaver, Barry. Can I help who's next?

      All right, scramble, jocks! It's time to fly.

      Thank you, Barry!

      That bee is living my life!

      Let it go, Kenny.

      When will this nightmare end?!

      Let it all go.

      Beautiful day to fly.

      Sure is.

      Between you and me, I was dying to get out of that office.

      You have got to start thinking bee, my friend.

      Thinking bee! Me? Hold it. Let's just stop for a second. Hold it.

      I'm sorry. I'm sorry, everyone. Can we stop here?

      I'm not making a major life decision during a production number!

      All right. Take ten, everybody. Wrap it up, guys.

      I had virtually no rehearsal for that.

    1. 39:00 Vannevar Bush misses out on a whole swath of history regarding commonplace books and indexing. In "As We May Think" he presents these older methods to the computer. "Why not imitate?" Aldrich says, instead of trying to reinvent the wheel (or thinking you are doing so).

    1. Author response:

      We would like to thank all the reviewers and editors for their thoughtful and detailed comments, critiques and suggestions. We will revise our manuscript in accordance with all the points raised by the reviewers. Here we summarize some of the main points that we intend to address in our revised manuscript.

      The reviewers noted that we were not sufficiently careful in identifying possible exogenous cues that the mice might be using to locate the food and that we did not consider why such cues might be ineffective. As the reviewers point out, the mice may be ignoring the visual landmarks (and floor scratches) because they are not reliable cues and their relation to the food varies with the entrance the mice have used. In particular, a reviewer refers to papers that show that “in environments with 'unreliable' landmarks, place cells are not controlled by landmarks”. These papers were known to the authors but failed to make the final cut of our extensive discussion. This important point will be thoroughly addressed.

      Another critical point was that the mice often exhibited thigmotaxis. The literature on thigmotaxis was known to us and we will now directly address this point. We do note that the final average start-to-food trajectory (TEV) points directly to the food. In other words, the thigmotactic trajectories and “towards the center” trajectories effectively average out.

      There was a very cogent point about the difficulty of totally eliminating odor cues that we will now address. Finally, based on studies using a virtual reality environment, one reviewer questioned the use of “path integration” as a signal that encodes goal location. The relevance of path integration to spatial learning and performance is a very difficult issue that, to our knowledge, has never been entirely settled in the vast spatial learning literature. We do not think that our data can “settle” this issue but will try to at least be explicit regarding the complexity of the path integration hypothesis as it applies to both our own data and the virtual reality literature. In particular, we will discuss the potential roles of optic flow versus proprioceptive and vestibular inputs to a putative path integration mechanism.

      Finally, the reviewers raised many important technical points re statistics reporting and how the figures are presented. In our revision, we will completely comply with all these helpful critiques.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of Vglut2 in noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice.

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice.

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study fails to document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. This limits the reader's understanding of why conditional Vglut2 knockdown is dispensable for breathing under the conditions tested.

      We thank the reviewers for their positive evaluation of our work. First, we would like to highlight that multiple studies have provided anatomical evidence of innervation of multiple cardio-respiratory nuclei by Vglut2+ noradrenergic fibers. Thus, the anatomical substrates are present for noradrenergic-based Vglut2 signaling to either play a direct role in breathing control or, upon perturbation, to indirectly affect breathing through disrupted metabolic or cardiovascular control. We have included supplemental table 1 that summarizes central noradrenergic Vglut2+ innervations of respiratory and autonomic nuclei. Additionally, ultrastructural evidence shows asymmetric synaptic contacts assuming glutamatergic transmission between C1 neurons and LC, A1, A2 and the dorsal motor nucleus of the vagus (DMV) (Milner et al., 1989; Abbott et al., 2012; Holloway et al., 2013; DePuy et al., 2013).

      Functionally, electrophysiological evidence showed that photostimulating C1 neurons activates LC, A1, A2 noradrenergic neurons monosynaptically by releasing glutamate (Holloway et al., 2013; DePuy et al., 2013) and that optogenetic stimulation of LC neurons excites the downstream parabrachial nucleus (PBN) neurons by releasing glutamate. Thus, at least the glutamatergic signaling from C1 and LC noradrenergic neurons (two noradrenergic nuclei that have been shown to play a role in breathing control) is evident at the cellular level under normal conditions. Other evidence, highlighted in our manuscript, is more circumstantial.

      Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporters (Vglut1, Vglut2 and Vglut3) expression and compared those maps to their real-time expression profiles in central NA neurons by RNA in situ hybridization in adult mice. Authors have revealed a new and intriguing expression pattern for Vglut2, along with an entirely uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not exert any influence on breathing and metabolic control either under normoxic/normocapnic conditions or after chemoreflex stimulation. Also, they showed for the first time the Vglut3-expressing NA population in C2/A2 nuclei. In addition, they were also able to demonstrate Vglut2 expression in anterior NA populations, such as LC neurons, by using more refined techniques, unlike previous studies.

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking down Vglut2 expression in NA neurons using a DBH-Cre; Vglut2cKO mice on breathing and control in unanesthetized mice. Finally, they injected the AAV virus containing Cre-dependent Td tomato into LC of v-Glut2 Cre mice to verify the VGlut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when delving into the exploration of respiratory chemosensitivity. Furthermore, the sample size of the experiments is excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript if the experiments were performed in males and females and if the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring the sex specific differences. The reason I think this is particularly relevant is the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some of the noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC size and more LC-NA neurons than male subjects (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling investigation has unveiled the identities of over 3,000 genes in the LC. This revelation has highlighted significant sexual dimorphisms, with more than 100 genes exhibiting differential expression within LC-NA neurons at the transcript level. Furthermore, this investigation has convincingly showcased that these distinct gene expression patterns have the capacity to elicit disparate behavioral responses between sexes (Mulvey et al., 2018). Therefore, the authors should compare the fate maps, Vglut transporters in males and females, at least considering LC-NA neurons. Even in the absence of identified sex differences, this information retains significant importance.

      All experiments contained both males and females as described in the original submission. In our analysis of breathing and metabolism, sex was included in the analysis and no significant phenotypic difference was observed. For the fate map and in situ experiments, we did not see obvious differences in the expression patterns in the three glutamate transporters between females and males, though the group size is small. Though all the anatomical and phenotypic data in this manuscript are presented as combined graphs, we have differentially labeled our data points by sex. The reviewer does raise important questions regarding possible sexual dimorphisms in the central noradrenergic system and whether such dimorphisms may extend to glutamate transporter co-expression. Our thorough interrogation of respiratory-metabolic parameters fails to reveal any sex specific differences in control or experimental mice. Thus, it is unclear if any of the previously described and cited dimorphisms are functionally relevant in this setting. Given the large differences in the real time expression and cumulative fate maps of Vglut2, a worthwhile interrogation of differential glutamate transporter expression would be best served by longitudinal studies with large group sizes across age as it is not clear what underlies the dynamic VGlut2 expression changes. Such changes may at times be greater in males and other times in females, driven by experience or physiological challenges etc., but resulting in averaged cumulative fatemaps that are similar between sexes. Such a longitudinal quantitative study of real-time and fatemapped cell populations across the central NA system would be of a scale that is beyond the scope of this report, especially when no phenotypic changes have been observed in our respiratory data.

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis.

      As noted, we discuss that we only address requirement, not sufficiency, of NA Vglut2 in breathing. Functional sufficiency experiments usually involve increasing the relevant output. However, these experiments can lead to non-specific, pleiotropic effects that would be difficult to disambiguate, even if done with high cellular specificity. Viral or genetic overexpression of Vglut2 in NA neurons may be a feasible approach. Conditional ablation of TH or DBH with concurrent chemo or optogenetic stimulation may also be informative. These approaches would require significant investments in mouse model generation and suffer additional experimental limitations.

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to body temperature phenotype in these animals under different gas exposures? These data should be included in the Tables.

      While surgical implantation of sensors would provide a more direct assessment of temperature, it requires components that were not available at the time of the study and addresses a question (temperature changes during a time course of gas exposure) that goes beyond the scope of the current work focused on respiratory response. As we have done for prior experiments (Martinez et al., 2019; Ray et al., 2011), the body temperature was measured immediately before and after measuring breathing only. Our flow-through system using inline gas sensors (AEI P-61B CO2 sensor and AEI N-22M O2 sensor) ensures that gas challenges were constant and consistent across all measurements. Any disruption in gas composition would have been noted by our software analysis system, Breathe Easy, and the data rejected. We did not observe any such perturbations.

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation?

      We agree that compensation is always a possibility at the synaptic, cellular, and circuit levels that may involve a variety of transcriptional, translational, cellular, and circuit mechanisms (i.e., synaptic strength). This could be interrogated by combining multiple conditional alleles and recombinase drivers for various transmitters and receptors, but would, in our experience, take multiple years for the requisite breeding to be completed.

      Continuing along the same line of inquiry is there a possibility that Vglut2 cKO from NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was previously found in VGLUT2 cKO from DA neurons in previous studies (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? Therefore, could the lack of effect on breathing be explained by the lack of noradrenaline and not glutamate?

      These are all excellent points, but prior studies suggest that reductions in NA signaling would themselves have an apparent effect (Zanella et al., 2006; Kuo et al., 2016). Although several studies showed that LC and C1 NA neurons co-release noradrenaline and glutamate, no direct evidence yet makes clear that glutamate facilitates NA release or vice versa. However, it would be of great interest to test in the future whether reduced or absent NA signaling compensates for the loss of glutamate. We fully acknowledge in the manuscript that any number of compensatory events could be at play in these findings.

      Reviewer #3 (Public Review):

      Summary:

      The authors, Y Chang and colleagues, have performed elegant studies in transgenic mouse models that were designed to examine glutamatergic transmission in noradrenergic neurons, with a focus on respiratory regulation. They generated 3 different transgenic lines, in which a red fluorophore was expressed in dopamine-β-hydroxylase (DBH; noradrenergic and adrenergic neurons) neurons that did not express a vesicular glutamate transporter (Vglut) and a green fluorophore in DBH neurons that did express one of either Vglut1, Vglut2 or Vglut3.

      Further experiments generated a transgenic mouse with knockout of Vglut2 in DBH neurons. The authors used plethysmography to measure respiratory parameters in conscious, unrestrained mice in response to various challenges.

      Strengths:

      The distribution of the Vglut expression is broadly in agreement with other studies, but with the addition of some novel Vglut3 expression. Validation of the transgenic results, using in situ hybridization histochemistry to examine mRNA expression, revealed potential modulation of Vglut2 expression during phases of development. This dataset is comprehensive, well-presented and very useful.

      In the physiological studies, the authors observed that neither baseline respiratory parameters nor respiratory responses to hypercapnia (5, 7, 10% CO2) or hypoxia (10% O2) were different between knockout mice and littermate controls. The studies are well-designed and comprehensive. They provide observations that are supportive of previous reports using similar methodology.

      Weaknesses:

      In relation to the expression of Vglut2, the authors conclude that modulation of expression occurs, such that in adulthood there are differences in expression patterns in some (nor)adrenergic cell groups. Altered sensitivity is provided as an explanation for different results between studies examining mRNA expression. These are likely explanations; however, the conclusion would really be definitive with inclusion of a conditional Cre-expressing mouse. Given the effort taken to generate this dataset, it seems to me that taking that extra step would be of value for the overall understanding of glutamatergic expression in these catecholaminergic neurons.

      The seemingly dynamic Vglut2 expression pattern across the NA system is intriguing. As noted in our comments to reviewer 2, a robust age-dependent interrogation would require a large-magnitude study. The reviewer correctly points out that a temporally controlled recombinase fate mapping experiment would offer greater insight into the dynamic expression of Vglut2. We strongly agree with that idea and did work to develop a Vglut2-CreER targeted allele that, despite our many other successes in mouse genetic engineering (Lusk et al., 2022; Sun and Ray, 2016), did not succeed on the first attempt. We aim to complete the line in the near future so that we may better understand the Vglut2 expression pattern in central noradrenergic neurons in a time- and sex-specific manner.

      The respiratory physiology is very convincing and provides clear support for the view that Vglut2 is not required for modulation of the respiratory parameters measured and the reflex responses tested. It is stated that this is surprising. However, comparison with the data from Abbott et al., Eur J Neurosci (2014) in which the same transgenic approach was used, shows that they also observed no change in baseline breathing frequency. Differences were observed with strong, coordinated optogenetic stimulation, but, as discussed in this manuscript, it is not clear what physiological function this is relevant to. It just shows that some C1 neurons can use glutamate as a signaling molecule. Further, Holloway et al., Eur J Neurosci (2015), using the same transgenic mouse approach, showed that the respiratory response to optogenetic activation of Phox2 expressing neurons is not altered in DBH-Vglut2 KO mice. The conclusion seems to be that some C1 neuron effects are reliant upon glutamatergic transmission (C1DMV for example), and some not.

      We agree that activation of C1 neurons may be sufficient to modulate breathing when artificially stimulated and that such stimulation relies on glutamatergic transmission for its effect. This is why we find our results surprising and important in clarifying for the field that glutamatergic signaling in noradrenergic cells is dispensable for breathing and hypoxic and hypercapnic responses under physiological conditions.

      Further contrast is made in this manuscript to the work of Malheiros-Lima and colleagues (eLife 2020) who showed that the activation of abdominal expiratory nerve activity in response to peripheral chemoreceptor activation with cyanide was dependent upon C1 neurons and could be attenuated by blockade of glutamate receptors in the pFRG - i.e. the supposition that glutamate release from C1 neurons was responsible for the function. However, it is interesting to observe that diaphragm EMG responses to hypercapnia (10% CO2) or cyanide, and the expiratory activation to hypercapnia, were not affected by the glutamate receptor blockade. Thus, a very specific response is affected and one that was not measured in the current study.

      As we mention above, we do not dispute that glutamate signaling can be manipulated to create a response in non-physiological conditions – we suggest that framing the interpretation around the glutamatergic role in a model that better matches physiological conditions should inform our interpretation. Furthermore, we do include an examination of expiratory flow – which was not impacted by loss of glutamatergic activity in NA neurons – which would be likely to have been impacted if abdominal expiratory nerve activity was modified.

      These previously published observations are consistent with the current study, which provides a more comprehensive analysis of the role of glutamatergic contributions to respiratory physiology. A more nuanced discussion of the data and acknowledgement of the differences, which are not actually at odds, would improve the paper and place the information within a more comprehensive model.

      Thank you for the comments. As noted in the original and extended discussion, we respectfully disagree with the perspective that our results align with prior results.

      Recommendations for the authors:

      The three reviewers believe this is an important study. They have numerous suggestions for improvement of the manuscript (outlined below), but no new experiments are required. The Editor requests some nomenclature changes as indicated in attachment 1.

      Reviewer #1 (Recommendations For The Authors):

      Abstract/Introduction: Although the need for this study is obvious, it is important that the authors explicitly communicate their working hypothesis < before the start of the work> to the reader. In the current form, it is unclear whether the authors aimed to test the hypothesis that glutamatergic signaling from noradrenergic neurons is important to breathing or whether to test the hypothesis that glutamatergic signaling from noradrenergic neurons is not important to breathing. If it is the latter (it is not important), then the study (related to the breathing measurements) is poorly justified and designed, as additional orthogonal approaches (e.g., actual measurements of glutamatergic signaling at the cellular level) are almost requisite. If the authors' hypothesis was originally based on existing literature suggesting that glutamatergic signaling from noradrenergic neurons is important to breathing, then the experimental design is appropriate.

      Thank you for the suggestion. The working hypothesis has been added in the abstract (line 24-25) and the introduction (line 92-94), making clear that we initially hypothesized that glutamatergic signaling from noradrenergic neurons is important in breathing.

      Results: While the steady-state measurements for breathing metrics are clearly important in defining how glutamatergic signaling may contribute to pulmonary function, glutamatergic signaling may have a greater role in the dynamics of breathing patterns (i.e., regularity of the breathing rhythms). Such traits can be described using SD1 and SD2 from Poincaré maps, and/or entropy measurements. Such an analysis should be performed.

      Thank you for the suggestion. The dynamic patterns of respiratory rate (Vf), tidal volume (VT), minute ventilation (VE), inspiratory duration (TI), expiratory duration (TE), breath cycle duration (TTOT), inspiratory flow rate (VT/TI), and expiratory flow rate (VT/TE) are now shown as Poincaré plots and are quantified and tested using the SD1 and SD2 statistics in the supplemental figures of Figures 4-7.
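
      For readers unfamiliar with the SD1 and SD2 statistics, the following is a minimal sketch of one common way to compute them from a breath-by-breath series. It assumes the standard definition (SD1 as the dispersion perpendicular to the line of identity of the Poincaré plot, SD2 as the dispersion along it); the function name `poincare_sd` and the example values are hypothetical, and this is not necessarily the exact computation used for the figures.

```python
import math
import statistics


def poincare_sd(values):
    """Return (SD1, SD2) for a sequence of breath-by-breath values.

    Each point of the Poincare plot is the pair (x[n], x[n+1]).
    SD1 measures spread across the line of identity (short-term
    variability); SD2 measures spread along it (longer-term variability).
    """
    if len(values) < 3:
        raise ValueError("need at least three values to form two plot points")
    pairs = list(zip(values[:-1], values[1:]))
    # Rotate each point by 45 degrees: one axis is perpendicular to the
    # line of identity, the other runs along it.
    across = [(nxt - cur) / math.sqrt(2) for cur, nxt in pairs]
    along = [(nxt + cur) / math.sqrt(2) for cur, nxt in pairs]
    return statistics.stdev(across), statistics.stdev(along)


# Hypothetical breath cycle durations (s), for illustration only.
ttot = [0.30, 0.32, 0.29, 0.31, 0.33, 0.30]
sd1, sd2 = poincare_sd(ttot)
print(f"SD1 = {sd1:.4f} s, SD2 = {sd2:.4f} s")
```

      A perfectly alternating series gives SD2 = 0, and a series with a constant increment gives SD1 = 0, which is a quick sanity check on any implementation.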

      Results: Analyses of Inspiratory time (Ti) and flow rate (i.e., Tidal Volume / Ti) should be assessed and included.

      Thank you for the suggestion. Inspiratory duration (Ti), expiratory duration (TE), breath cycle duration (TTOT), inspiratory flow rate (VT/Ti), and expiratory flow rate (VT/TE) have been included in the Figures 4-7.
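
      The derived metrics listed above follow from three per-breath measurements. The sketch below shows the usual relationships, assuming TI and TE are in seconds and VT is in an arbitrary volume unit; the function name `breath_metrics`, the units, and the example values are assumptions for illustration and are not taken from the manuscript.

```python
def breath_metrics(vt, ti, te):
    """Derive per-breath metrics from tidal volume and phase durations.

    vt: tidal volume (assumed volume unit, e.g. mL)
    ti: inspiratory duration in seconds (assumed unit)
    te: expiratory duration in seconds (assumed unit)
    """
    if ti <= 0 or te <= 0:
        raise ValueError("durations must be positive")
    ttot = ti + te          # breath cycle duration (s)
    vf = 60.0 / ttot        # respiratory rate (breaths per minute)
    return {
        "TTOT": ttot,
        "Vf": vf,
        "VE": vt * vf,      # minute ventilation (volume per minute)
        "VT/TI": vt / ti,   # inspiratory flow rate
        "VT/TE": vt / te,   # expiratory flow rate
    }


# Hypothetical single breath, for illustration only.
print(breath_metrics(vt=0.25, ti=0.125, te=0.25))
```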

      Results/Methods: If similar analytical approaches were used in the current study as to that in Lusk et al. 2022, it appears that data was discontinuously sampled, rejecting periods of movement and only including periods of quiescent breathing. Were the periods of quiescent breathing different? Information should be provided to describe the total sampling duration included.

      For room air, the entire gas condition was used for data analysis. For hypercapnia (5% CO2, 7% CO2, 10% CO2), only the last 5 minutes of the gas challenge period was used for data analysis. For hypoxia (10% O2), we analyzed the breathing trace of three 5-minute epochs following initiation of the gas exposure separately, i.e., epoch 1 = 5-10 min, epoch 2 = 10-15 min, and epoch 3 = 15-20 min. All breaths included as quiescent breathing were analyzed in the aggregate for each group and experimental condition; we did not compare individual periods of quiescent breathing within or across animals, groups, or experimental conditions. We have added the details in the Materials and Methods (line 637-642).
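
      The windowing rules above can be sketched as a small lookup. This is an illustrative outline only: the function names, the condition labels, and the representation of a breath as a `(time_min, value)` tuple are hypothetical and do not describe the Breathe Easy software.

```python
HYPERCAPNIA = ("5% CO2", "7% CO2", "10% CO2")


def analysis_windows(condition, period_min):
    """Return analysis windows as (start, end) minutes from gas onset.

    period_min is the total length of the gas period, supplied by the caller.
    """
    if condition == "room air":
        return [(0.0, period_min)]               # entire condition
    if condition in HYPERCAPNIA:
        return [(period_min - 5.0, period_min)]  # last 5 minutes only
    if condition == "10% O2":
        # Three separate 5-minute epochs after initiation of hypoxia.
        return [(5.0, 10.0), (10.0, 15.0), (15.0, 20.0)]
    raise ValueError(f"unknown condition: {condition!r}")


def breaths_in_window(breaths, window):
    """Keep breaths whose onset time (minutes) falls inside the window."""
    start, end = window
    return [b for b in breaths if start <= b[0] < end]


# Hypothetical (time_min, Vf) breaths, for illustration only.
breaths = [(4.0, 180.0), (7.5, 240.0), (12.0, 210.0), (18.0, 200.0)]
for window in analysis_windows("10% O2", period_min=20.0):
    print(window, breaths_in_window(breaths, window))
```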

      Results: As mice were conscious in this study, were sniff periods (transient periods of fast breathing, i.e.,>8Hz) included in the analysis?

      No, only regular quiescent breathing periods were included in the analysis.

      Discussion: The authors need to discuss the limitations of their findings.

      • How should the reader interpret the findings? Concluding that glutamatergic signaling is dispensable implies that it occurs in room air, hypoxia, and hypercapnia.

      We have edited our discussion for clarity to highlight our conclusion that Vglut2-based glutamatergic signaling from noradrenergic neurons is ultimately dispensable for baseline breathing and for the hypercapnic and hypoxic chemoreflexes in unanesthetized and unrestrained mice.

      • Assuming that glutamatergic signaling is active during the conditions tested, then the authors should discuss what may be the potential compensations.

      We have provided additional discussion surrounding potential compensatory events that may have taken place and could result in the unchanged phenotype in the experimental group.

      • The authors need to discuss how age and state of consciousness may play a role in their findings. The current discussion gives the impression that their findings are broadly applicable in all cases, but the lack of differences in this study may not hold true under different conditions.

      The study was done in adult (6–8-week-old) unanesthetized and unrestrained mice. In the discussion (line 472-474), we highlight that in our unpublished results, loss of NA-expressed Vglut2 does not change the survival curve in P7 neonate mice undergoing repeated bouts of autoresuscitation until death. Thus, we believe that Vglut2-based glutamatergic signaling in central NA neurons is dispensable for baseline breathing and the hypercapnic and hypoxic chemoreflexes in unanesthetized and unrestrained mice across different ages. Beyond this, we do not imply in our discussion that we have interrogated any other aspects of breathing.

      Methods: Further description of the analysis window for the respiratory metrics should be provided. Were breath values for each condition taken throughout the entire condition? This is particularly important for hypoxia, where the stereotypical respiratory response is biphasic.

      For room air, the entire gas condition was used for data analysis. For hypercapnia (5% CO2, 7% CO2, 10% CO2), only the last 5 min of the gas challenge period was used for data analysis. For hypoxia (10% O2), we analyzed the breathing trace of three 5 min time periods separately (5-10 min, 10-15 min, and 15-20 min) during the hypoxic challenge. As noted in our original manuscript, we graph and assess three 5 min epochs during hypoxic exposure to capture the dynamic nature of the hypoxic ventilatory response. We have added the details in the Materials and Methods (line 637-642).

      Methods: How was consciousness determined?

      The conscious mice mentioned in the manuscript refer to the mice without anesthesia. We have replaced “awake” and “conscious” with “unanesthetized” in the text.

      Reviewer #2 (Recommendations For The Authors):

      Since no EEG/EMG recording was performed it would be more appropriate to remove "awake" and "conscious" throughout the manuscript and include the term "unanesthetized".

      Thank you for the suggestion. “Awake” and “conscious” have been replaced by “unanesthetized” in the text.

      Line 545: Why 32C? Isn't this temperature too high for animals?

      30-32°C is the thermoneutral zone for mice. It is the range of ambient temperature where mice can maintain a stable core temperature with their minimal metabolic rate (Gordon, 1985). Whole-body plethysmography uses the barometric technique to detect pressure oscillations caused by changes in temperature and humidity with each breathing act when an animal sits in a sealed chamber (Mortola et al., 2013). Thus, maintaining the chamber temperature near the thermoneutral zone during the plethysmography assay is required to maintain constancy in respiratory and metabolic parameters from trial to trial as well as to maintain linearity of ventilatory pressure changes due to humidification, rarefaction, and thermal expansion and contraction during inspiration and expiration (Ray et al., 2011). The chamber temperature that has been used for adult plethysmography has been set across a range 30-34°C (Hodges et al., 2008; Ray et al., 2011; Hennessy et al., 2017). We use 32°C in this manuscript which is consistent with previously published literature from other groups and our own work (Sun et al., 2017; Lusk et al., 2022).

      I would include the units of the physiological variables in the tables.

      Thank you for the suggestion. The units of the physiological variables have been added in all the tables.

      Reviewer #3 (Recommendations For The Authors):

      Why is the C3 group not considered in this study?

      The C3 adrenergic group, best characterized in rat, is only seen in rodents but not in many other species including primates (including human) (Kitahama et al., 1994). Thus, the C3 group is not the focus of this study where we aim to discuss if glutamate derived from noradrenergic neurons could be the potential therapeutic target of human respiratory disorders. The C3 adrenergic group is typically described as a population containing only about 30 neurons. We have added the fate map data and the adult expression pattern for the three vesicular glutamate transporters for the C3 group in the figure 1 and 2 supplements for reference.

      Sub CD/CV does not appear to be defined in the manuscript.

      Thank you for the point. The definition of sub CD/CV has been added in the text (line 126).

      The data on line 131-133 is interesting but could be described more effectively and clearly.

      Thank you for the suggestion. The text has been modified accordingly.

      The end of the paragraph at lines 140 onwards is rather repeated in the paragraph that starts at line 146.

      The repeated text has been removed accordingly.

      Whilst anterior and posterior are correct anatomical terms, for a quadruped, rostral and caudal are more widely used - particularly in the brainstem field. Is there a particular reason for using anterior/posterior?

      We followed the anatomical terminology of Robertson et al. (2013), who used anterior/posterior to describe C2/A2 and C1/A1.

      On the protocol lines included in Figures 4-7, it would be worth adding the test day. This seems a little strange. Why wait up to one week after the habituation to perform the stimulation? How many mice were left for each day between habituation and experimentation, and does this timing affect responses? Do mice forget the habituation after a period?

      Thank you for the point. We have added the test day for plethysmography in Figures 4-7. After the 5 days of habituation, we began the plethysmography recordings on the sixth day. A maximum of 6 mice can be assayed for plethysmography per day due to the limited number of barometric flow-through plethysmography and metabolic measurement systems we have. Thus, all animals were finished with plethysmography “within” one week of the last day of habituation. This protocol is consistent with our previously published work (Martinez et al., 2019; Lusk et al., 2022; Lusk et al., 2023). For the experiments in this manuscript, mice were assayed within 3 days after habituation. As noted in our methods and figures, each mouse is given up to 40 min to acclimate to the chamber (determined by directly observed quiet breathing) before data acquisition. We have no reason or evidence to indicate that testing order, and thus timing, was a factor. The detailed explanation for the plethysmography protocol has been added in the Materials and Methods section (line 606-625).

      Please state clearly that each mouse is only exposed to one gas mixture (what I interpret is the case), or could one mouse be exposed to several different stimuli?

      Each mouse is only exposed to one gas challenge (5% CO2, 7% CO2, 10% CO2, or 10% O2) in a testing period. Testing periods for an individual mouse were separated by 24 h to allow for a full recovery. The protocol is to put the mouse under room air for 45 min, switch to one gas challenge for 20 min, and switch back to room air for 20 min.

      With apologies if I missed this, but did each of the respiratory stimuli produce a statistically significant response in the control mice? For example, the response to 10%O2?

      Yes, each respiratory stimulus, including 5/7/10% CO2 and 10% O2, produced a statistically significant response in both mutant and control mice. We have labeled the statistical significance in Figures 4-7. Thank you for pointing this out.

      Line 312: Optogenetic stimulation induced an increase from 130 to 180 breaths per min (Abbott et al., EJN 2014). It is surprising that this is called "modest". Baseline respiratory frequency was presented.

      Thank you for the point. The word “modest” has been removed and the discussion has been changed accordingly (line 355-360).

      Line 338: This discussion is not sufficiently nuanced. It is the increased Dia amplitude (to KCN only, not 10% CO2) and the stimulation of active expiration, to both stimuli, that is blocked by kyn in pFRG. There is no effect on breathing frequency. The current study would not detect such differences in active expiration.

      Thank you for the suggestion. The discussion has been modified accordingly (line 382-388).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments and the lovely, crisp presentation of the data. The separation of neurons into tonic, phasic and adapting classes is also interesting and informative. The ability to successfully isolate and dissociate peripheral ganglia from such older animals is also quite rare and commendable! There is much useful detail here.

      Thank you for recognizing the effort we put into presenting the data and analyzing the neuronal populations. I also believe the ability to isolate neurons from old animals is worth communicating to the scientific community.

      Weaknesses:

      Where the manuscript becomes less compelling is in the rapamycin section, which does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe a signaling mechanism to it that is supported by data. Thus, this latter part rather undermines the overall impact and central advance of the manuscript. The problem is exacerbated by the controversial and anecdotal nature of the entire mTor/aging field, some of whose findings have very unfortunately had to be recently retracted.

      I would strongly recommend to the authors that they end the manuscript with their analysis of the role of M current/KCNQ channels in the numerous age-related changes in sympathetic neuron function that they elegantly report, and save the rapamycin, and possible mTor action, for a separate line of inquiry that the authors could develop in a more thorough and scholarly way.

      Whereas the description of the data is very nice and useful, the manuscript does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe changes in signaling mechanisms, such as that of M1 mAChRs, to the phenomena in a way that is supported by data.

      I appreciate the new comment. We had agreed that our rapamycin experiments did not allow us to ascribe the mechanism to the mTOR signaling pathway. The new comment mentions M1 mAChR signaling as another potential signaling mechanism. Our work centered on determining whether aging altered the function of sympathetic motor neurons and on defining the mechanism. We presented evidence showing that the mechanism is a reduction of the M-current. We did not attempt to identify the signaling mechanism linking aging to a reduction in M-current. Therefore, we agree with the reviewer that we do not provide further details on the mechanism and that this remains an open question. However, I find it harsh to say that “the effect is more of an epiphenomenon of unclear insight”. How could we possibly test whether the effect of aging on the excitability of these neurons arises only as a secondary effect or is not causal? How could we test for sufficiency and necessity of aging? How could we modify the state of aging to test for causality? We would have to reverse aging and show that the effect on the excitability is gone. And that is exactly what we tried to do with the rapamycin experiment.

      Reviewer #1 (Recommendations For The Authors):

      (1) The significance values greater than p < 0.05 do not add anything and distract focus from the results that are meaningful. Fig. 5 is a good example. What does p = 0.7 mean? Or p = 0.6? Does this help the reader with useful information?

      I thank Reviewer 1 for raising this question. We have attempted different versions of how we report p values, as we want to make sure to address rigor and transparency in reporting data. As corresponding author, I favor reporting p values for all statistical comparisons. To help the reader identify what we considered statistically significant, we color-coded the p values, with red for p-value<0.05 and black for p-value>0.05. As a reader, seeing a p-value=0.7 allows me to know that the authors performed an analysis comparing these conditions and found the means not to be different. Not presenting the p-value makes me wonder whether the authors even analyzed those groups. In other words, I value the ability to analyze the data with all p-values visible more than I value not being distracted by non-significant p-values. This is just my preference.

      (2) Fig. 1 is not informative and should be removed.

      I thank Reviewer 1 for the suggestion. In previous drafts of the manuscript, this figure was included only as a panel. However, we decided it was better to guide the reader into the scope of our work. This is part of our scientific style and, therefore, we prefer to keep the figure.

      (3) The emphasis on a particular muscarinic agonist favored by many ion channel physiologists, oxotremorine, is not meaningful (lines 192, 198). The important point is stimulation of muscarinic AChRs, which physiologically are stimulated by acetylcholine. The particular muscarinic agonist used is unimportant. Unless mandated by eLife, "cholinergic type 1 muscarinic receptors" are usually referred to as M1 mAChRs, or even better is "Gq-coupled M1 mAChRs." I don't think that Kruse and Whitten, 2021 were the first to demonstrate the increase in excitability of sympathetic neurons from stimulation of M1 mAChRs. Please try and cite in a more scholarly fashion.

      A) I have modified lines 192 and 198 to remove mention of oxotremorine.

      B) I have modified the nomenclature used to refer to cholinergic type 1 muscarinic receptors.

      C) I cited references on the role of M current in sympathetic motor neuron excitability. I also removed the reference (Kruse and Whitten, 2021), which referred only to the temporal correlation between the decrease of KCNQ current and excitability.

      (4) The authors may want to use the term "M current" (after defining it) as the current produced by KCNQ2&3-containing channels in sympathetic neurons, and reserve "KCNQ" or "Kv7" currents as those made by cloned KCNQ/Kv7 channels in heterologous systems. A reason for this is to exclude currents from KCNQ1-containing channels, which most definitely do not contribute to the "KCNQ" current in these cells. I am not mandating this, but rather suggesting it to conform with the literature.

      Thank you for the suggestion. I have modified the text to use the term M current. I maintain the use of KCNQ only when referring to KCNQ channel, such as in the section describing the abundance of KCNQ2.

      (5) The section in the text on "Aging reduces KCNQ current" is confusing. Can the authors describe their results and their interpretation more directly?

      I am not sure I understand the request. I assumed points 5 and 6 are related and decided to answer both under point 6.

      (6) Please explain the meaning of the increase in KCNQ2 abundance with age in Fig. 6G. How is this increase in KCNQ2 expression consistent with an increase in excitability? The explanation of "The decrease in KCNQ current and the increase in the abundance of KCNQ2 protein suggest a potential compensatory mechanism that occurs during aging, which we are actively investigating in an independent study." is rather odd, considering that the entire thesis of this paper is that changes in excitability and firing properties are underlied by changes in KCNQ2/3 channel expression/density. Suddenly, is this not the case?? What about KCNQ3? It would be very enlightening if the authors would just quantify the ratio of KCNQ2:KCNQ3 subunits in M-type channels in young and old mice using simple TEA dose/response curves (see Shapiro et al., JNS, 2000; Selyanko et al., J. Physiol., Hadley et al., Br. J. Pharm., 2001 and a great many more). It is also surprising that the authors did not assess or probe for differences in mAChR-induced suppression of M current between SCG neurons of young and old mice. This would seem to be a fundamental experiment in this line of inquiry.

      A. Please explain the meaning of the increase in KCNQ2 abundance with age in Fig. 6G. How is this increase in KCNQ2 expression consistent with an increase in excitability? The explanation of "The decrease in KCNQ current and the increase in the abundance of KCNQ2 protein suggest a potential compensatory mechanism that occurs during aging, which we are actively investigating in an independent study." is rather odd, considering that the entire thesis of this paper is that changes in excitability and firing properties are underlied by changes in KCNQ2/3 channel expression/density. Suddenly, is this not the case?? Our interpretation is that the decrease in M current is not caused by a decrease in the abundance of KCNQ2 channels. We do not claim that changes in excitability are caused by a reduction in the expression or density of KCNQ2 channels. On the contrary, our working hypothesis is that the reduction in M current is caused by changes in trafficking, degradation, posttranslational modifications, or cofactors for KCNQ2 or KCNQ3 channels. We have modified the description in the results section to clarify this concept.

      B. What about KCNQ3? Unfortunately, we did not find an antibody to detect KCNQ3 channels. I have added a sentence to state this.

      C. KCNQ2:KCNQ3 subunits in M-type channels in young and old mice using simple TEA dose/response curves. This is a great idea. Thank you for the suggestion. Is this a necessary experiment for the acceptance of this manuscript?

      D. It is also surprising that the authors did not assess or probe for differences in mAChR-induced suppression of M current between SCG neurons of young and old mice. This would seem to be a fundamental experiment in this line of inquiry. Reviewer 1 is correct. We did not assess for differences in the suppression of M current by mAChR activation. We do not see the connection of this experiment with the scope of the current investigation.

      (7) Why do the authors use linopirdine instead of XE-991? Both are dirty drugs hardly specific to KCNQ channels at 25 uM concentrations, but linopirdine less so. The Methods section lists the source of XE991 used in the study, not linopirdine. Is there an error?

      A. Why do the authors use linopirdine instead of XE-991? After validation of KCNQ2/3 inhibition by Linopirdine, we found the effect on membrane potential recordings to be reproducible. Linopirdine has also been reported to be reversible. We wanted to assess reversibility on the excitability of young neurons. We did not find the effect to be reversible. We performed experiments applying XE-991 while recording the membrane potential. XE-991 did not show a clear effect. I was not surprised by this. It is very likely that the pharmacological inhibition of one channel leads to the activation of other channel types. This is highlighted in the work by Kimm, Khaliq, and Bean, 2015. “Further experiments revealed that inhibiting either BK or Kv2 alone leads to recruitment of additional current through the other channel type during the action potential as a consequence of changes in spike shape.” In fact, it was quite remarkable that the aged and young phenotypes were mimicked by targeting KCNQ pharmacologically.

      B. Both are dirty drugs hardly specific to KCNQ channels at 25 uM concentrations, but linopirdine less so. I have added a sentence to point out that linopirdine is less potent than XE-991. It reads: “We want to point out that linopirdine is less potent than XE-991 and that it has been reported to activate TRPV1 channels (Neacsu and Babes, 2010). Despite this limitation, the application of linopirdine to young sympathetic motor neurons led to depolarization and firing of action potentials.”

      C. The Methods section lists the source of XE991 used in the study, not linopirdine. Is there an error? Thank you for pointing this out. I have added information for both retigabine and linopirdine to the Methods section; both were missing.

      (8) Can the authors use a more scientific explanation of RTG action than "activating KCNQ channels?" For instance, RTG induces both a negative shift in the voltage-dependence of activation and a voltage-independent increase in the open probability, both of which differ in detail between KCNQ2 and KCNQ3 subunits. The authors are free to use these exact words. Thus, the degree of "activation" is very dependent upon voltage at any voltages negative to the saturating voltages for channel activation.

      I have modified the text to reflect your suggestion.
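
      The reviewer's point in (8), that the apparent degree of "activation" depends on the test voltage, can be illustrated with a minimal sketch. It assumes a single Boltzmann activation curve and models the drug as a negative shift of the half-activation voltage plus a voltage-independent gain in open probability. Every number below (half-activation voltage, shift, slope, gain) and both function names are hypothetical placeholders, not values from the manuscript or from the retigabine literature.

```python
import math

def open_fraction(v_mv, v_half_mv, slope_mv):
    """Boltzmann activation curve: fraction of maximal conductance at v_mv."""
    return 1.0 / (1.0 + math.exp(-(v_mv - v_half_mv) / slope_mv))

def fold_increase(v_mv, v_half_ctrl_mv=-30.0, shift_mv=-20.0,
                  slope_mv=10.0, po_gain=1.2):
    """Ratio of drug to control conductance at v_mv.

    All default parameter values are hypothetical and chosen only
    to make the voltage dependence visible.
    """
    control = open_fraction(v_mv, v_half_ctrl_mv, slope_mv)
    # Drug effect: shifted half-activation voltage and scaled open probability.
    drug = po_gain * open_fraction(v_mv, v_half_ctrl_mv + shift_mv, slope_mv)
    return drug / control

if __name__ == "__main__":
    for v in (-60.0, -40.0, -20.0, 0.0, 40.0):
        print(f"{v:6.1f} mV  fold increase = {fold_increase(v):.2f}")
```

      Under these assumptions the fold increase is largest at negative potentials and converges to the voltage-independent gain once activation saturates, which is why a single "activation" figure is incomplete without the voltage at which it was measured.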

      (9) Methods: did the authors really use "poly-l-lysine-coated coverslips?" Almost all investigators use poly-D-lysine as a coating for mammalian tissue-culture cells and more substantial coatings such as poly-D-lysine + laminin or rat-tail collagen for peripheral neurons, to allow firm attachment to the coverslip.

      That is correct. We used poly-L-lysine-coated coverslips. Sympathetic motor neurons do not adhere to poly-D-lysine.

      (10) As a suggestion, sampling M-type/KCNQ/Kv7 current at 2 kHz is not advised, as this is far faster than the gating kinetics of the channels. Were the signals filtered?

      That is correct. Currents were sampled at 2 kHz and low-pass filtered at 3 kHz. Our conditions are not far from what others report. Some groups sample at 10 kHz or even 50 kHz. Others do not report the sampling frequency.
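
      For readers following the exchange in (10), the relationship between sampling rate, filter cutoff, and channel kinetics can be sketched as follows. The 2 kHz and 3 kHz values come from the response above; the 100 ms time constant is a hypothetical stand-in for slow M-type gating, and the function names are illustrative only.

```python
def nyquist_hz(sampling_rate_hz):
    """Highest frequency that can be represented at a given sampling rate."""
    return sampling_rate_hz / 2.0

def cutoff_at_or_below_nyquist(cutoff_hz, sampling_rate_hz):
    """True when the low-pass corner does not exceed the Nyquist frequency."""
    return cutoff_hz <= nyquist_hz(sampling_rate_hz)

def samples_per_time_constant(sampling_rate_hz, tau_ms):
    """Number of samples acquired during one gating time constant."""
    return sampling_rate_hz * tau_ms / 1000.0

if __name__ == "__main__":
    rate_hz, cutoff_hz = 2000.0, 3000.0   # values stated in the response
    tau_ms = 100.0                        # hypothetical gating time constant
    print("Nyquist frequency (Hz):", nyquist_hz(rate_hz))
    print("Cutoff at or below Nyquist:", cutoff_at_or_below_nyquist(cutoff_hz, rate_hz))
    print("Samples per time constant:", samples_per_time_constant(rate_hz, tau_ms))
```

      With a time constant of this assumed order, 2 kHz yields hundreds of samples per time constant, which is the sense in which the reviewer calls the rate faster than needed. Separately, a 3 kHz corner lies above the 1 kHz Nyquist frequency for 2 kHz sampling, so the filter alone does not band-limit the signal to what that sampling rate can represent.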

      Reviewer #2:

      Weaknesses:

      None, the revised version of the manuscript has addressed all my concerns.

      I am glad we were able to satisfy previous concerns.

      Reviewer #3:

      The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality.

      Allow me to clarify our previous responses and determine how this aligns with your concerns. In the previous revision, Reviewer 3 wrote: “It is difficult to know from the data presented whether the changes in KCNQ channels are in fact directly responsible for the observed changes in membrane excitability.” The reviewer also suggested the “use of blockers and activators to provide greater relevance.” I assumed these comments were the main concern and that performing such experiments would be enough to address the criticism. It is discouraging to see that our experiments did not resolve the reviewer’s concern that the findings are correlative.

      If Reviewer 3 is referring to establishing causality between aging and a reduction in M current, I would like to emphasize that such an endeavor is complicated, as no single experiment clearly resolves that issue. Our best attempt was to reverse aging with rapamycin, but the recommendation was to remove those experiments.

      … but the specifics of the effects and relevance to intact preparations are unclear. Additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations.

      I apologize for missing this point in the previous revision. The proposed experiments would require an upright microscope coupled to an electrophysiology rig. Unfortunately, I do not have the equipment to do these experiments.

      Summary of recommendations from the three reviewers:

      Please make corrections as suggested by reviewer 1 to improve the manuscript. Specifically, reviewer 1 suggests making changes to p values in Figure 5,

      It is not clear what the suggested changes are. The comment from Reviewer 1 says: The significance values greater than p < 0.05 do not add anything and distract focus from the results that are meaningful. If the suggested change is to remove p values > 0.05, I have explained my rationale for keeping those values. If the Journal has a specific format for reporting p-values, I will be happy to make the appropriate changes.

      and the importance of citing original scholarly works related to effects of increase in excitability of sympathetic neurons by M1 receptors, and the terminology for M currents and KCNQ currents. These changes will improve the manuscript and are strongly recommended.

      I cited original papers in that area and changed the terminology for M current. I kept KCNQ when referring to the channel protein or its abundance.

      The section dealing with Aging Reduces KCNQ currents seems to contain a lot of extraneous information especially in the last part of the long paragraph and this section should be rewritten for improved clarity… and - the implications or lack thereof - of the correlation of KCNQ with AP firing rates.

      A. I removed extraneous information in that section. It now reads: Previous work by our group and others demonstrated that cholinergic stimulation leads to a decrease in M current and increases the excitability of sympathetic motor neurons at young ages \cite{RN67,RN68,RN69,RN71, RN72, RN73, RN74, RN75}. The molecular determinants of the M current are channels formed by KCNQ2 and KCNQ3 in these neurons \cite{RN76, RN77, RN70}. Thus, Figure 6A shows a voltage response (measured in current-clamp mode) and a consecutive M current recording (measured in voltage-clamp mode) in the same neuron upon stimulation of cholinergic type 1 muscarinic receptors. It illustrates the temporal correlation between the decrease of M current with the increase in excitability and firing of APs upon activation with oxotremorine. This strong dependence led us to hypothesize that aging decreases M current, leading to a depolarized RMP and hyperexcitability (Figure 6B). For these experiments, we measured the RMP and evoked activity using perforated patch, followed by the amplitude of M current using a whole-cell voltage clamp in the same cell. We also measured the membrane capacitance as a proxy for cell size. Interestingly, M current density was smaller by 29% in middle age (7.5 ± 0.7 pA/pF) and by 55% in old (4.8 ± 0.7 pA/pF) compared to young (10.6 ± 1.5 pA/pF) neurons (Figure 6C-D). The average capacitance was similar in young (30.8 ± 2.2 pF), middle-aged (27.4 ± 1.2 pF), and old (28.8 ± 2.3 pF) neurons (Figure 6E), suggesting that aging is not associated with changes in cell size of sympathetic motor neurons, and supporting the hypothesis that aging alters the levels of M current. Next, we tested the effect on the abundance of the channels mediating M current. Contrary to our expectation, we observed that KCNQ2 protein levels were 1.5 ± 0.1-fold higher in old compared to young neurons (Figure 6F-G). Unfortunately, we did not find an antibody to consistently detect KCNQ3 channels. We concluded that the decrease in M current is not caused by a decrease in the abundance of KCNQ2 protein.

      B. and - the implications or lack thereof - of the correlation of KCNQ with AP firing rates. I am not sure I understand the request regarding the correlation of KCNQ with AP firing rate. I divided the long paragraph.
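
      The percentage reductions quoted in response A follow directly from the reported mean M current densities. The short check below uses only the mean values stated in that paragraph (10.6, 7.5, and 4.8 pA/pF); the function name is illustrative, and the calculation ignores the reported standard errors.

```python
def percent_reduction(reference, value):
    """Percentage by which value is smaller than reference."""
    return (reference - value) / reference * 100.0

# Mean M current densities (pA/pF) as stated in the revised text.
young, middle_aged, old = 10.6, 7.5, 4.8

if __name__ == "__main__":
    print(f"middle-aged vs young: {percent_reduction(young, middle_aged):.1f}% smaller")
    print(f"old vs young:         {percent_reduction(young, old):.1f}% smaller")
```

      Rounded to whole percentages, these reproduce the 29% and 55% figures given in the text.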

      The apparent lack of correlation between KCNQ current and KCNQ2 protein needs to be better explained. This is a central part of the study and this result undercuts the premise of the paper.

      Indeed, total KCNQ2 protein abundance increases while M current decreases. We do not claim in our work that changes in excitability are caused by a reduction in the expression or density of KCNQ2 channels. On the contrary, our current working hypothesis is that the reduction in M current is caused by changes in traffic, degradation, posttranslational modifications, or cofactors for KCNQ2 or KCNQ3 channels. I have modified the description in the results section and discussion to clarify this concept.

      Additionally, the poor specificity of linopirdine for KCNQ should be pointed out in the limitations.

      I have pointed out this limitation. It reads: We want to point out that linopirdine is less potent than XE-991 and that it has been reported to activate TRPV1 channels (Neacsu and Babes, 2010). Despite this limitation, the application of linopirdine to young sympathetic motor neurons led to depolarization and firing of action potentials.

      Finally, the editor notes that the author response should not contain ambiguities in what was addressed in the revision. In the original summary of consolidated revisions that were requested, one clearly and separately stated point (point 4) was that experiments in slice cultures should be strongly considered to extend the significance of the work to an intact brain preparation. The author response letter seems to imply that this was done, but this is not the case. The author response seems to have combined this point with another separate point (point 3) about using KCNQ drugs, and imply that all concerns were addressed. Authors should be clear about what revisions were in fact addressed.

      As corresponding author, and the person directly responsible for the document submitted in reply to the reviewers, I apologize for my mistake. After reviewing this comment, I realized that I did not respond to the Major points in the Recommendations for the authors section from Reviewer 3. I missed that entire section. My previous responses addressed the Public review of reviewer 3. When doing so, I did not separate the sentences and thereby omitted the request to perform the experiment in slices.


      The following is the authors’ response to the original reviews.

      Reviewer #1

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years. The authors suggest the ingestion of rapamycin may partially reverse the age-related decrease in M-channel expression. With the rapamycin part included, it is unclear how this work will impact the field of age-related neuronal dysfunction, as the mechanistic information is not strong.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments, the lovely, crisp presentation of the data, and the expert statistics. The separation of neurons into tonic, phasic, and adapting classes is also interesting, and informative. The writing is also elegant, and crisp. The above is especially true of the manuscript up until the part dealing with the effects of rapamycin, which becomes less compelling.

      We appreciate the thoughtful comments and constructive feedback to improve the impact of the manuscript.

      Weaknesses:

      Where the manuscript becomes less compelling is in the rapamycin section, which does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe a signaling mechanism to it that is supported by data. Thus, this latter part rather undermines the overall impact and central advance of the manuscript. The problem is exacerbated by the controversial and anecdotal nature of the entire mTor/aging field, some of whose findings have very unfortunately had to be recently retracted.

      I would strongly recommend to the authors that they end the manuscript with their analysis of the role of M current/KCNQ channels in the numerous age-related changes in sympathetic neuron function that they elegantly report, and save the rapamycin, and possible mTor action, for a separate line of inquiry that the authors could develop in a more thorough and scholarly way.

      We agree with the reviewer that we cannot ascribe a signaling mechanism to the reversibility observed with rapamycin. Therefore, we are following the recommendation of the reviewer and have removed the rapamycin section.

      We want to emphasize that, in the aging field, any advancement in the knowledge of how drugs such as rapamycin reverse age-associated phenotypes is of crucial importance. These drugs, commonly referred to as aging interventions, include rapamycin, calorie restriction, elamipretide, and metformin. We could have used any of these interventions. And yet, the cellular and molecular mechanisms for each one of these anti-aging drugs are unknown.

      We want to note that, although the nature of the mTOR field is controversial, the effect of rapamycin in extending lifespan and improving health is not. At least these authors have not been able to find retracted papers on that subject or notices from the NIA alerting on this issue. We kindly request the reviewer to provide the references related to rapamycin that were retracted so we can evaluate how that affects the rigor of the premise for our future work.

      As authors, we also find it important to note that we are confident of our observations regarding the effect of rapamycin, and that we are not removing this section because we are retracting our claims. We will use these data to continue our research of the mechanism behind the effect of aging on sympathetic motor neurons.

      Reviewer #2:

      Summary:

      This research shows compelling and detailed evidence showing that aging influences intrinsic membrane properties of peripheral sympathetic motor neurons such that they become more excitable. Furthermore, the authors present convincing evidence that the oral administration of the anti-aging drug Rapamycin partially reversed hyperexcitability in aged neurons. This study also investigates the molecular mechanisms underlying age-associated hyperexcitability in mouse sympathetic motor neurons. In that regard, the authors found an age-associated reduction of an outward current having properties similar to KCNQ2/Q3 potassium current. They suggested a reduction of KCNQ2/Q3 current density in aged neurons as a potential mechanism behind their overactivity.

      Strengths:

      Detailed and rigorous analysis of electrical responses of peripheral sympathetic motor neurons using electrophysiology (perforated patch and whole-cell recordings). Most of the conclusions of this paper are well supported by the data.

      We thank the reviewer for valuing our effort to present a detailed and rigorous analysis.

      Weaknesses:

      (1) The identity of the age-associated reduced current as KCNQ2/Q3 is not corroborated by pharmacology (blocking the current with the specific blocker XE-991).

      We have performed experiments using blockers of KCNQ channels. See responses below.

      (2) The manuscript does not include a direct test of the reduction of KCNQ current as the mechanism behind age-induced hyperexcitability.

      Thank you for raising this point. We have performed experiments blocking KCNQ channels with linopirdine in young neurons and found that the pharmacological reduction of KCNQ current was enough to depolarize the cell and, in some cases, elicit the firing of action potentials. We present the results in a new figure. We also added the description in the Results section.

      Reviewer #3:

      This is a descriptive study of membrane excitability and Na+ and K+ current amplitudes of sympathetic motor neurons in culture. The main findings of the study are that neurons isolated from aged animals show increased membrane excitability manifested as increased firing rates in response to electrical stimulation and changes in related membrane properties including depolarized resting membrane potential, increased rheobase, and spontaneous firing. By contrast, neuron cultures from young mice show little to no spontaneous firing and relatively low firing rates in response to current injection. These changes in excitability correlate with significant reductions in the magnitude of KCNQ currents in aged neurons compared to young neurons. Treating cultures with the immunosuppressive drug, rapamycin, which has known antiaging effects in model animals appears to reverse the firing rates in aged neurons and enhance KCNQ current. The authors conclude that aging promotes hyperexcitability of sympathetic motor neurons.

      The electrophysiological cataloging of the neuronal properties is generally well done, and the experiments are performed using perforated patch recordings which preserve the internal constituents of neurons, providing confidence that the effects seen are not due to washout of regulators from the cells.

      The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality. It is difficult to know from the data presented whether the changes in KCNQ channels are in fact directly responsible for the observed changes in membrane excitability.

      We appreciate the constructive criticism. In an attempt to assess whether changes in KCNQ are in fact directly responsible for the changes in membrane excitability, we have performed experiments blocking KCNQ channels with linopirdine in young neurons and found that the pharmacological reduction of KCNQ current was enough to depolarize the cell and, in some cases, elicit the firing of action potentials. Conversely, we activated KCNQ channels in old neurons with retigabine and found that the pharmacological activation was enough to hyperpolarize the membrane potential and stop the firing of action potentials. This effect was reversible. These two experiments provide solid evidence for our statement that age-associated reduction of KCNQ activity is responsible for the hyperexcited state in sympathetic motor neurons. We present the results in a new figure (Figure 8). We also added the description in the Results section.

      Furthermore, a notable omission seems to be the analysis of Ca2+ currents which have been widely linked to alterations in membrane properties in aging.

      We thank the reviewer for the comment. We did omit the data from our studies of calcium currents. We agree that the study of the effect of calcium currents is relevant as it can influence the afterhyperpolarization. Furthermore, we believe that potential effects on calcium currents need to be studied in relation to other physiological processes that depend on calcium, including excitation-transcription coupling, calcium handling, and neurotransmitter release. Adding this information to this manuscript would only contribute to the tabulation of effects that we observe in sympathetic motor neurons with aging. As our main goal was to determine the ion channels responsible for the hyperexcited state, voltage-gated calcium channels or other calcium sources could have reflected a more indirect mechanism as compared to changes in sodium or potassium currents. We will continue our investigation on calcium currents and report our observations in the future, but for now, we have decided to leave it out of this work.

      As well, additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations. Finally, experiments using KCNQ blockers and activators could provide greater relevance that the observed changes in KCNQ are indeed connected to changes in membrane excitability.

      We are happy to report that we have performed these experiments and that the results strengthen the conclusion that changes in KCNQ are connected to changes in membrane excitability.

      Recommendations for the authors:

      We recommend the following essential revisions summarized from the reviews:

      (1) Is the change in KCNQ current responsible for the altered membrane excitability? What happens to membrane excitability when KCNQ is partially blocked (see reviewer 2 comment below)? Conversely, what happens to the excitability of aged neurons if KCNQ is activated (e.g., with retigabine)? (see reviewer 3 comment below). Results of these important experiments are needed to support the argument that KCNQ underlies the alterations in firing and membrane excitability.

      We have responded to this point. Thank you for the suggested experiments. In summary, the new experiments show that blocking KCNQ channels in young neurons leads to depolarization and, in some cases, the firing of action potentials. Conversely, the activation of KCNQ channels in aged neurons leads to hyperpolarization and a cessation of firing. We have added a new figure and reported the results in the Results section.

      (2) Rapamycin experiments are underdeveloped and weak. These should be further developed by examining the effects of KCNQ blockers to see if their effects on membrane excitability are reversed. Also, see comment 2 from reviewer 1.

      We have followed the recommendation by reviewer 1 and removed the section on rapamycin.

      (3) The study should examine voltage-gated calcium currents to determine potential changes in these currents with aging. See reviewer 3 comments.

      We thank the reviewer for the comment. We performed preliminary experiments and found that aging impacts calcium currents. However, we did not include these data. In our opinion, the changes in calcium currents are outside the scope of this work, as the changes could be related to physiological processes that go beyond the control of firing. Effects on calcium currents need to be studied in relation to other physiological processes that depend on calcium, including excitation-transcription coupling, calcium handling, and neurotransmitter release. The study of the relationship between changes in calcium currents and those physiological processes would require multiple experiments and detailed analysis. We will continue our investigation on calcium currents and report our observations in the future, but for now, we have decided to leave it out of this work.

      We have also made the suggested edits to the Figures and Legends.

      (2) In Fig.4 panel H, Y-axis must be # AP at 100 pA.

      We corrected the axis in Figure 4H.

      (3) In Legend Fig. 5, the number of cells for each subpopulation (n) needs to be corrected. In plots F-I, n= 9, 7, and 3 seem to be the number of adapting cells for 12-, 64- and 115w-old, respectively, instead of the number of single, phasic, and old cells for 12-week-old mice. A similar correction seems to be needed for 64-week-old and 115-week-old.

      We corrected the n number in Figure 5.

      (4) In Figure 6 panel C, it would be helpful for a reader to align the voltage protocol depicted with the current shown.

      We have aligned the voltage protocol to the current traces.

      (5) In the legend of Figure 7, the description of panel A ends with "Magnitude of voltage step to elicit each trace is shown in black", however in panel A there is no voltage depiction. In the description of panel D, "N = X animals, n=x cells" must be corrected.

      We have modified the legend to clarify. It now reads: “Text at the right of each current trace corresponds to the voltage used to elicit that current.”

      New Figure 8

      Author response image 1.

      Pharmacological inhibition and activation of KCNQ channels mimic the age-dependent phenotype. A. Membrane potential recordings from two young neurons treated with 25 μM linopirdine during the time illustrated by the light gray box. No holding current was applied. B. Left: Summary of the resting membrane potential measured before (light orange) and after (dark orange) the application of linopirdine. Right: Summary of the depolarization produced by linopirdine calculated by subtracting the post-drug voltage from the pre-drug voltage (V). Data points are from N = 2 animals, n = 8 cells, 14-week-old mice. C. Membrane potential recordings from two aged neurons treated with 10 μM retigabine during the time illustrated by the light gray box. No holding current was applied. D. Left: Summary of the resting membrane potential measured before (light purple) and after (dark purple) the application of retigabine. Right: Summary of the hyperpolarization produced by retigabine calculated by subtracting the post-drug voltage from the pre-drug voltage (V). Data points are from N = 2 animals, n = 7 cells, 120-week-old mice. P-values are shown at the top of the graphs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important paper, Blin and colleagues develop a high-throughput behavioral assay to test spontaneous swimming and olfactory preference in individual Mexican cavefish larvae. The authors present compelling evidence that the surface and cave morphs of the fish show different olfactory preferences and odor sensitivities and that individual fish show substantial variability in their spontaneous activity that is relevant for olfactory behaviour. The paper will be of interest to neurobiologists working on the evolution of behaviour, olfaction, and the individuality of behaviour.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors posed a research question about how an animal integrates sensory information to optimize its behavioral outputs and how this process evolved. Their data (behavioral output analysis with detailed categories in response to the different odors in different concentrations by comparing surface and cave populations and their hybrid) partially answer this tough question. They built a new low-disturbance system to answer the question. They also found that the personality of individual fish is a good predictor of behavioral outputs against odor response. They concluded that cavefish evolved to specialize their response to alanine and histidine while surface fish are more general responders, which was supported by their data.

      Strengths:

      With their new system, the authors could generate clearer results without mechanical disturbances. The authors characterize multiple measurements to score the odor response behaviors, and also brought a new personality analysis. Their conclusion that cavefish evolved as a specialist to sense alanine and histidine among 6 tested amino acids was well supported by their data.

      Weaknesses:

      The authors posed a big research question: How do animals evolve the processes of sensory integration to optimize their behavioral outputs? I personally feel that, to answer the questions about how sensory integration generates proper (evolved) behavior, the authors at least need to show the ecological relevance of their response. For the alanine/histidine preference in cavefish, they need data for the alanine and other amino acid concentrations in the local cave water and compare them with those of surface water.

      We agree with the reviewer. This is why, in the Discussion section, we had written: “…Such significant variations in odor preferences or value may be adaptive and relate to the differences in the environmental and ecological conditions in which these different animals live. However, the reason why Pachón cavefish have become “alanine specialists” remains a mystery and prompts analysis of the chemical ecology of their natural habitat. Of note, we have not found an odor that would be repulsive for Astyanax so far, and this may relate to their opportunist, omnivorous and detritivore regime (Espinasa et al., 2017; Marandel et al., 2020).” This is also why we are currently developing fieldwork projects aimed at clarifying this question. However, such experiments and analyses are challenging, practically and technically. We hope we can reach some conclusions in the future.

      To complete the discussion we have also added an important hypothesis: “Alternatively, specialization for alanine may not need to be specific for an olfactory cue present only, or frequently, or in high amounts in caves. Bat guano for example, which is probably the main source of food in the Pachón cave, must contain many amino acids. Enhanced recognition of one of them - in the present case alanine but evolution may have randomly acted for enhanced recognition of another amino acid – should suffice to confer cavefish with augmented sensitivity to their main source of nutriment.”

      Also, as for "personality matters", I read that personality explains a large variation in surface fish. Also, thigmotaxis or wall-following cavefish individuals are exceeded to respond well to odorants compared with circling and random swimming cavefish individuals. However, I failed to understand the authors' point about how much percentages of the odorant-response variations are explained (PVE) by personality. Association (= correlation) was good to show as the authors presented, but showing proper PVE or the effect size of personality to predict the behavioral outputs is important to conclude "personality is matter"; otherwise, the conclusion is not so supported.

      From the above, I recommend the authors reconsider the title and also their research questions. At this moment, I feel that the authors' conclusions and their research questions are a little too exaggerated, with less supportive evidence.

      Thank you for this interesting suggestion, which we have fully taken into consideration. We have therefore now calculated and plotted PVE (the percentage of variation explained on the olfactory score) as a function of swimming speed or as a function of swimming pattern. The results are shown in modified Figure 8 of our revised ms and they suggest that the personality (here, swimming patterns or swimming speed) indeed predicts the olfactory response skills. Therefore, we would like to keep our title as we provide support for the fact that “personality matters”.
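
      The response does not state how PVE was computed. As a hedged illustration only, two common definitions are sketched below: the coefficient of determination from a simple linear regression for a continuous predictor such as swimming speed, and eta-squared (between-group over total sum of squares) for a categorical predictor such as swimming pattern. The function names and the toy numbers in the usage lines are hypothetical and are not data from the study.

```python
def pve_continuous(x, y):
    """Percent of variance in y explained by a linear fit on x (R^2 * 100)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    return 100.0 * (sxy * sxy) / (sxx * syy)

def pve_categorical(groups):
    """Percent of variance explained by group membership (eta-squared * 100).

    groups maps a category label to the list of scores in that category.
    """
    scores = [s for values in groups.values() for s in values]
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    if ss_total == 0:
        raise ValueError("scores must vary")
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return 100.0 * ss_between / ss_total

if __name__ == "__main__":
    # Toy values for illustration only.
    print(pve_continuous([0, 1, 2, 3], [0, 1, 1, 2]))
    print(pve_categorical({"wall-following": [4, 5, 6], "circling": [1, 2, 3]}))
```

      Either quantity gives the effect size the reviewer asked for, as opposed to a test of association alone.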

      Also, for the statistical method, Fisher's exact test is not appropriate for the compositional data (such as Figure 2B). The authors may quickly check it at https://en.wikipedia.org/wiki/Compositional_data or https://www.annualreviews.org/doi/pdf/10.1146/annurev-statistics-042720-124436.

      The authors may want to use centered log transformation or other appropriate transformations (R package could be: https://doi.org/10.1016/j.cageo.2006.11.017). After changing the statistical tests, the authors' conclusion may not be supported.

      Actually, in most cases, the distributions are so different (as seen by the completely different colors in the distribution graphs) that there is little doubt that swimming behaviors are indeed different between surface and cavefish, or between ‘before’ and ‘after’ odor stimulation. However, it is true that Fisher’s exact test is not fully appropriate because the data can be considered compositional. For this kind of data, centered log transformation has been suggested. However, our dataset contains many zeros, a case that log transformations handle poorly.

      To help us deal with our data, the reviewer proposed that we consider the paper by Greenacre (2021) (https://www.annualreviews.org/doi/pdf/10.1146/annurev-statistics-042720-124436). In his paper, Greenacre clearly wrote: "Zeros in compositional data are the Achilles heel of the logratio approach (LRA)."
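
      The difficulty with zeros can be made concrete with a minimal sketch of the centered log-ratio (CLR) transform, which subtracts the mean of the log parts from each log part. The function name and the example compositions are hypothetical and serve only to show where the transform breaks down.

```python
import math

def clr(parts):
    """Centered log-ratio transform of a composition with strictly positive parts."""
    if any(p <= 0 for p in parts):
        # log(0) is undefined, so a single empty category blocks the transform.
        raise ValueError("CLR is undefined when any part is zero or negative")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

if __name__ == "__main__":
    print(clr([0.2, 0.3, 0.5]))        # works: all parts positive
    try:
        clr([0.6, 0.4, 0.0])           # one behavior never observed
    except ValueError as err:
        print("failed:", err)
```

      Replacing zeros with a small pseudocount makes the transform computable, but the result then depends on the arbitrary value chosen, which is one reason a zero-tolerant method is attractive for sparse behavioral counts.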

      Therefore, we have now tested our data using CA (Correspondence Analysis), which can handle tables containing many zeros and is a reliable alternative to LRA (Cook-Thibeau, 2021; Greenacre, 2011).

      The results of CA analysis are shown in Supplemental figure 8 and they fully confirm the difference in baseline swimming patterns between morphs as well as changes (or absence of changes) in behavioral patterns after odor stimulation suggested by the colored bar plots in main figures, with confidence ellipses overlapping or not overlapping, depending on cases. Therefore, the CA method fully confirms and even strengthens our initial interpretations.

      Finally, we have kept our initial graphical representation in the ms (color-coded bar plots; the complete color code is now given in Suppl. Fig7), and CA results are shown in Suppl. Figure 8 and added in text.

      Reviewer #2 (Public Review):

      In their submitted manuscript, Blin et al. describe differences in the olfactory-driven behaviors of river-dwelling surface forms and cave-dwelling blind forms of the Mexican tetra, Astyanax mexicanus. They provide a dataset of unprecedented detail, that compares not only the behaviors of the two morphs but also that of a significant number of F2 hybrids, therefore also demonstrating that many of the differences observed between the two populations have a clear (and probably relatively simple) genetic underpinning.

      To complete the monumental task of behaviorally testing 425 six-week-old Astyanax larvae, the authors created a setup that allows for the simultaneous behavioral monitoring of multiple larvae and the infusion of different odorants without introducing physical perturbations into the system, which would bias the responses of cavefish that are particularly fine-tuned for this sensory modality. During the optimization of their protocol, the authors also found that for cave-dwelling forms one hour of habituation was insufficient and a full 24 hours were necessary to allow them to revert to their natural behavior. It is also noteworthy that this extremely large dataset can help us see that population averages of different morphs can mask quite significant variations in individual behaviors.

      Testing with different amino acids (applied as relevant food-related odorant cues) shows that cavefish are alanine- and histidine-specialists, while surface fish show the strongest behavioral responses to cysteine. It is interesting that the two forms also react differently after odor detection: while cave-dwelling fish decrease their locomotory activity, surface fish increase it. These differences are probably related to different foraging strategies used by the two populations, although, as the observations were made in the dark, it would also be interesting to see if surface fish show the same changes in light as well.

      Thank you for these nice comments.

      Further work will be needed to pinpoint the exact nature of the genetic changes that underlie the differences between the two forms. Such experimental work will also reveal how natural selection acted on existing behavioral variations already present in the SF population.

      Yes. Searching for genetic underpinnings of the sensory-driven behavioral differences is our current endeavor through a QTL study and we should be able to report it in the near future.

      It will be equally interesting, however, to understand what lies behind the large individual variation of behaviors observed in both the surface and cave populations. Are these differences purely genetic, or do environmental cues perhaps also contribute to their development? Does stochasticity provided by the developmental process also have a role in this? Answering these questions will reveal if the evolvability of Astyanax behavior was an important factor in the repeated successful colonization of underground caves.

      Yes. We will also address (at least partially) most of these questions in our current QTL study.

      Reviewer #3 (Public Review):

      Summary:

      The paper explores chemosensory behaviour in surface and cave morphs and F2 hybrids in the Mexican cavefish Astyanax mexicanus. The authors develop a new behavioural assay for the long-term imaging of individual fish in a parallel high-throughput setup. The authors first demonstrate that the different morphs show different basal exploratory swimming patterns and that these patterns are stable for individual fish. Next, the authors test the attraction of fish to various concentrations of alanine and other amino acids. They find that the cave morph is a lot more sensitive to chemicals and shows directional chemotaxis along a diffusion gradient of amino acids. For surface fish, although they can detect the chemicals, they do not show marked chemotaxis behaviour and have an overall lower sensitivity. These differences have been reported previously but the authors report longer-term observations on many individual fish of both morphs and their F2 hybrids. The data also indicate that the observed behavior is a quantitative genetic trait. The approach presented will allow the mapping of genes' contribution to these traits. The work will be of general interest to behavioural neuroscientists and those interested in olfactory behaviours and the individual variability in behavioural patterns.

      Strengths:

      A particular strength of this paper is the development of a new and improved setup for the behavioural imaging of individual fish for extended periods and under chemosensory stimulation. The authors show that cavefish need up to 24 h of habituation to display a behavioural pattern that is consistent and unlikely to be due to the stressed state of the animals. The setup also uses relatively large tanks that allow the build-up of chemical gradients that are apparently present for at least 30 min.

      The paper is well written, and the presentation of the data and the analyses are clear and to a high standard.

      Thank you for these nice comments.

      Weaknesses:

      One point that would benefit from some clarification or additional experiments is the diffusion of chemicals within the behavioural chamber. The behavioural data suggest that the chemical gradient is stable for up to 30 min, which is quite surprising. It would be great if the authors could quantify e.g. by the use of a dye the diffusion and stability of chemical gradients.

      OK. We had tested the diffusion of dyes in our previous setup and we also did in the present one (not shown). We think that, due to differences of molecular weight and hydrophobicity between the tested dyes and the amino acid molecules we are using, their diffusion does not constitute a proper read-out of actual amino acid diffusion. We anticipate that amino acid diffusion is extremely complex in the test box, possibly with odor plumes diffusing and evolving in non-gradient patterns, in the 3 dimensions of the box, and potentially further modified by the fish swimming through it, the flow coming from the opposite water injection side and the borders of the box. This is the reason why we have designed the assay with contrasting “odor side” and “water control side”. Moreover, our question here is not to determine the exact concentration of amino acid to which the fish respond, but to compare the responses in cavefish, surface fish and F2 hybrids. Finally and importantly, we have performed dose/response experiments whereby varying concentrations have been presented for 3 of the 6 amino acids tested, and these experiments clearly show a difference in the threshold of response of the different morphs.

      The paper starts with a statement that reflects a simplified input-output (sensory-motor) view of the organisation of nervous systems. "Their brains perceive the external world via their sensory systems, compute information and generate appropriate behavioral outputs." The authors' data also clearly show that this is a biased perspective. There is a lot of spontaneous organised activity even in fish that are not exposed to sensory stimulation. This sentence should be reworded, e.g. "The nervous system generates autonomous activity that is modified by sensory systems to adapt the behavioural pattern to the external world." or something along these lines.

      Done

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In addition to my comments in the "weakness" section above, here are my other comments.

      How many times fish were repeatedly assayed, and in what order (alanine followed by cysteine, etc.), is not clear (Pg 24, Materials and Methods). I am concerned that fish may memorize prior experience, which could improve or worsen their response to higher concentrations of alanine, etc. Please clarify this point.

      Many fish were indeed tested in different conditions on consecutive days. Most often, control experiments (e.g., water/nothing; water/water; nothing/nothing) were followed by odor testing. In such cases, there is no risk that fish memorize prior experience and that such previous experience interferes with the response to odor. In other instances, fish were tested with a low concentration of one amino acid, followed by a high concentration of another amino acid, which is also on the safe side. Of note, on consecutive days, the odors were always perfused on alternate sides of the test box, to avoid the possibility of spatial memory. Finally, in the few cases where increasing concentrations of the same amino acid were perfused consecutively, 1) they were perfused on alternate sides, 2) if the fish does not detect a low concentration below threshold / does not respond, then prior experience should not interfere with the response to higher concentrations, and 3) we have evidence (unpublished, current studies) that when a fish is given increasing concentrations of the same amino acid above detection threshold, then the behavioral response is stable and reproducible (i.e., it does not decrease or increase).

      Minor points:

      Thigmotaxis and wall following.

      Classically, thigmotaxis and wall following are treated as the same (Sharma et al., 2009; https://pubmed.ncbi.nlm.nih.gov/19093125/) but the authors distinguish them, defining thigmotaxis along the X-axis and Y-axis because fish repeatedly swam back and forth along the x-axis wall or y-axis wall. I understand the authors' reason for discriminating WF and T, but please present them with more explanation (what the differences between them are) in the introduction and result sections.

      Done

      Pg5 "genetic architecture" in the introduction.

      "Genetic architecture" analysis needs a more genomic survey, such as GWAS, QTL mapping, and Hi-C. Phenotype differences in F2 generation can be stated as "genetic factor(s)" "genetic component(s)", etc. please revise.

      Done

      Pg10 At the serine treatment, the authors concluded that "...suggesting that their detection threshold for serine is lower than for alanine." I believe that the 'threshold for serine is higher' according to the authors' data. Their threshold-related statement is correct in Pg21 "as SF olfactory concentration detection threshold are higher than CF,..." So the statement on page 10 is just a mistake, I think. Please revise.

      Done (mistake indeed)

      Pg11 After explaining Fig5, the statement "In sum, the responses of the different fish types to different concentrations of different amino acids were diverse and may reflect complex, case-by-case, behavioral outputs" does not convey any information. Please revise.

      OK. Done : “In sum, the different fish types show diverse responses to different concentrations of different amino acids.”

      For the personality analysis (Fig 7)

      The index value needs more explanation. I read the materials and methods three times but am still confused. From the equation, the index does not seem to exceed 1.0, unless the "before score" was a negative value, and the "after score" value was positive. I could not get why the authors set a score of 1.5 as the threshold for the cumulative score of these different behavior index values (= individual score). Please provide more description. Currently, I am skeptical about this index value in Fig 7.

      Done, in results and methods.

      Pg15 the discussion section

      Please discuss well the difference between the authors' finding (cavefish respond 10^-4M for position and surface fish responded 10^-4 for thig-Y; Fig 4AB), and those in Hinaux et al. 2016 (cavefish responded 10^-10M alanine but surface fish responded 10^-5M or higher). It seems that surface fish could respond to the low conc of alanine as cavefish do, which is opposed to the finding in Hinaux 2016.

      The increase in NbrtY at population level for surface fish with 10^-4M alanine (~10^-6M in box) was most probably due to only a few individuals. Unlike in cavefish, all other parameters were unchanged in surface fish for this concentration. Moreover, at individual level, only 3.2% of surface fish had significant olfactory scores (to be compared to 81.3% for cavefish). Thus, we think that globally this result does not contradict our previous findings in Hinaux et al (2016), and solely represents the natural, unexplained variations inherent to the analysis of complex animal behaviors, even when we attempt to use the highest standards of controlled conditions.

      Of note, in the revised version, we have now included a full dose/response analysis for alanine concentrations ranging from 10^-2M to 10^-10M, on cavefish. Alanine 10^-5M has significant effects (now shown in Suppl Fig2 and indicated in text; a column has been added for 10^-5M in Summary Table 1). Lower concentrations have milder effects (described in text) but confirm the very low detection threshold of cavefish for this amino acid.

      Pg19, "In sum, CF foraging strategy has evolved in response to the serious challenge of finding food in the dark"

      My point is the same as explained in the 'weakness' section above: how this behavior is effective in the cave life, if they conclude so? Please explain or revise this statement.

      The present manuscript reports on experiments performed in “artificial” and controlled laboratory conditions. We are fully aware that these conditions are probably distantly related to conditions encountered in the wild. Note that we had written in the original version (page 20) “…for 6-week old juveniles in a rectangular box - but the link may be more elusive when considering a fish swimming in a natural, complex environment.” As the reviewer may know, we also perform field studies with a more ethological approach to animal behaviors, thus we may be able to discuss this point more accurately in the future.

      Pg20 "To our knowledge, this is the first time individual variations are taken into consideration in Astyanax behavioral studies."

      This is wrong. Please see Fernandes et al., 2022. (https://pubmed.ncbi.nlm.nih.gov/36575431/).

      OK. The sentence is wrong if taken in its absolute sense, i.e., considering inter-individual variations of a given parameter (e.g., number of neuromasts per individual or number of approaches to vibrating rod in Fernandes et al., 2022). In this same sense, Astyanax QTL studies on behaviors in the past also took into account variations among F2 individuals. Here, we wanted to stress that personality was taken into consideration. The sentence has been changed: “To our knowledge, this is the first time individual temperament is taken into consideration in Astyanax behavioral studies.”

      Figure 2B and others.

      The order of categories (R, R-TX, etc) should match in all columns (SF, F2, and CF). Currently, the category orders seem random or the larger ratio categories at the bottom, which is quite difficult to compare between SF, F2, and CF. Also, the writings in Fig 2A (times, Y-axis labels, etc), and the bargraphs' writings are quite difficult to read in Fig 2B, Fig 3B 4H, 5GN, 6EFG. Also, no need to show fish ID in Fig 2C in the current way, but identify the fish data points of the fish in Fig 2D (SF#40, CF#65, and F2#26) in Fig 2C if the authors want to show fish ID numbers in the boxplots. Fish ID numbers in other boxplot figures are recommended to be removed too.

      We have thought a lot about how to best represent the distributions of swimming patterns in graphs such as Fig 2B and others. The difficulty is due to the existence of many combinations (33 possibilities in total, see new Suppl Fig7), which are never the same in different plots/conditions because individual tested fish are different. We decided that the best way was to represent, from bottom to top, the most used to the least used swimming patterns, and to use a color code that matches the different combinations as closely as possible. It was impossible to give the full color code on each figure, therefore it was simplified, and we believe that the results are well conveyed on the graphs. We would like to keep it as it is. To respond (partially) to the reviewer’s concern, we have now added a full color code description in a new Supplemental Figure 7 (associated to Methods).

      Size of lettering has been modified in all pattern graphs like Fig2A. Thanks for the suggestion, it reads better now.

      Finally, we would like to keep the fish ID numbers because this contributes to conveying the message of the paper, that individuality matters.

      Raw data files were not easy to read in Excel or LibreOffice. Please convert them into the csv format to support the rigor in the authors' conclusion.

      We do not understand this request. Our very large dataset must be analysed with R, not Excel, for statistics, plotting and pattern analysis. However, raw data files can be opened in Excel with format conversion.

      Reviewer #2 (Recommendations For The Authors):

      I think most of the experimental procedures (with few exceptions, see below) are well-defined and nicely described, so the majority of my suggestions will be related to the visualization of the data. I think the authors have done a great job in presenting this complex dataset, but there are still some smaller tweaks that could be used to increase the legibility of the presented data.

      First and perhaps foremost, a better definition of the swimming pattern subsets is needed. I have no problem understanding the main behavioral types, but whereas the color codes for these suggest that there is continuous variance within each pattern, it is not clear (at least to me), what particular aspect(s) of the behaviors vary. Also, whereas the sidebars/legends suggest a continuum within these behaviors, the bar charts themselves clearly present binned data. I did not find a detailed description of how the binning was done. As this has been - according to the Methods section - a manual process, more clarity about the details of the binning would be welcome. I would also suggest using binned color codes for the legends as well.

      Done, in Results and Methods. We hope it is now clear that there is no “continuum”, rather multiple combinations of discrete swimming patterns. The gradient aspect in color code in figures has been removed to avoid the idea of continuum. According to the chosen color code, WF is in red, R in blue, T in yellow and C in green. Then, combinations are represented by colors in between, for example, R+WF is purple. We have now added a full color code description for the swimming patterns and their combinations in a new Supplemental Figure 7 (associated to Methods).

      Also, to better explain the definition of the swimming patterns and the graphical representation, it now reads (in Methods):

      “The determination of baseline swimming patterns and swimming patterns after odor injection was performed manually based on graphical representations such as in Figure 2A or Figure 3A. Four distinctive baseline behaviors clearly emerged: random swim (R; defined as haphazard swimming with no clear pattern, covering entirely or partly the surface of the arena), wall following (WF; defined as the fish continuously following along the 4 sides of the box and turning around it, in a clockwise or counterclockwise fashion), large or small circles (C; self explanatory), and thigmotactism (T, along the X- or the Y-axis of the box; defined as the fish swimming back and forth along one of the 4 sides of the box). On graphical representations of swimming pattern distributions, we used the following color code: R in blue, WF in red, C in green, T in yellow. Of note, many fish swam according to combination(s) of these four elementary swimming patterns (see descriptions in the legends of Supplemental figures, showing many examples). To fully represent the diversity and the combinations of swimming patterns used by individual fish, we used an additional color code derived from the “basic” color code described above and where, for example R+WF is purple. The complete combinatorial color code is shown in Suppl. Fig7.”

      It would be also easier to comprehend the stacked bar charts, presenting the particular swimming patterns in each population, if the order of different swimming patterns was the same for all the plots (e.g. the frequency of WF always presented at the bottom, R on the top, and C and T in the middle). This would bring consistency and would highlight existing differences between SF, CF, and F2s. Furthermore, such a change would also make it much easier to see (and compare) shifts in behaviors.

      We have thought a lot on how to best represent the distributions of swimming patterns in graphs such as Fig 2B and others. The difficulty is due to the existence of many combinations, which are never the same in different plots/conditions because the individual fish tested are different. We decided to keep it as it currently stands, because we think re-doing all the graphs and figures would not significantly improve the representation. In fact, we think that the differences between morphs (dominant blue in SF, dominant red in CF) and between conditions (bar charts next to each other) are easy to interpret at first glance in the vast majority of cases. Moreover, they are now completed by CA analyses (Suppl Figure 8).

      While the color coding of the timeline in the "3D" plots presented for individual animals is a nice feature, at the moment it is slightly confusing, as the authors use the same color palette as for the stacked bar charts, representing the proportionality of the particular swimming patterns. As the y-axis is already representing "time" here, the color coding is not even really necessary. If the authors would like to use a color scheme for aesthetic reasons, I would suggest using another palette, such as "grey" or "viridis".

      We would like to keep the graphical aspect of our figures as they are, for aesthetic reasons. To avoid confusion with stacked bar chart color code, we have added a sentence in Methods and in the legend of Figure 2, where the colors first appear:

      “The complete combinatorial color code is shown in Suppl. Figure 7. Of note, in all figures, the swimming pattern color code does not relate whatsoever with the time color code used in the 2D plus time representation of swimming tracks such as in Figure 2A”.

      I would also suggest changing the boxplots to violin-plots. Figure 7 clearly shows bimodality for F2 scores (something, as the authors themselves note, not entirely surprising given the probably polygenic nature of the trait), but looking at SF and CF scores I think there are also clear hints for non-normal distributions. If non-normal distribution of traits is the norm, violin-plots would capture the variance in the data in a more digestible way. (The existence of differently behaving cohorts within the population of both SF and CF forms would also help to highlight the large pre-existing variance, something that was probably exploited by natural selection as well, as mentioned briefly in the Discussion by the authors, too.)

      The bimodal distribution of scores shown by F2s in Figure 7B is indeed probably due to the polygenic nature of the trait. However, such a distribution is the exception rather than the norm. Moreover, the boxplot representations we have used throughout figures include all the individual points, and outliers can be identified as they have the fish ID number next to them. This allows the reader to grasp the variance of the data. Again, redoing all graphs and figures would constitute a lot of work, for little gain in terms of conveying the results. Therefore, we choose not to replace the boxplots with violin plots.

      The summary data of individual scores in Table 1B shows some intriguing patterns, that warrant a bit further discussion, in my opinion. For example, we can see opposite trends in scores of SF and CF forms with increasing alanine concentration. Is there an easy explanation for this? Also, in the case of serine, the CF scores do not seem to respond in a dose-dependent manner and puzzlingly at 10^(-3)M serine concentration F2 scores are above those of both grandparental populations.

      That is true. However, we have no simple explanation for this. To begin responding to this question, we have now performed full dose/response experiments for alanine (concentrations tested from 10^-2M to 10^-10M on cavefish; confirm that CF are bona fide “alanine specialists”) and for serine (10^-2M to 10^-4M tested on both morphs; confirm that both morphs respond well to this amino acid). These complementary results are now included in text and figures (partially) and in the summary table 1.

      If anything is known about this, I would also welcome some discussion on how thigmotactic behavior, a marker of stress in SF, could have evolved to become the normal behavior of CF forms, with lower cortisol levels and, therefore lower anxiety.

      We actually think thigmotactism is a marker of stress in both morphs. See Pierre et al, JEB 2020, Figure S3A: in both SF and CF thigmotaxis behavior decreases after long habituation times. In our hands, the only difference between the two morphs is that surface fish (at 5 months of age) express stress by thigmotactism but also freezing and rapid erratic movements, while cavefish have a more restricted stress repertoire.

      This is why in the present paper we have carefully made the distinction between thigmotactism (= possible stress readout) and wall following (= exploratory behavior). Our finding that WF and large circles confer better olfactory response scores to cavefish is in strong support of the different nature of these two swimming patterns. Then, why is swimming along the 4 walls of a tank fundamentally different from swimming along one wall? The question is open, although the number of changes of direction is probably an important parameter: in WF the fish always swims forward in the same direction, while in T the fish constantly changes direction when reaching the corner of the tank, which is similar to erratic swim in stressed surface fish.

      Finally two smaller suggestions:

      • When referring to multiple panels on the same figure it would be better to format the reference as "Figure 4D-G" instead of "Figure 4DEFG";

      Done

      • On page 4, where the introduction reads as "although adults have a similar olfactory rosette with 20-25 lamellae", in my opinion, it would be better to state that "while adults of the two forms have a similar olfactory rosette with 20-25 lamellae".

      Done

      Reviewer #3 (Recommendations For The Authors):

      Consider moving Figure 3 to be a supplement of Figure 4. This figure shows a water control and therefore best supplements the alanine experiment.

      We would like to keep this figure as a main figure: we consider it very important to establish the validity of our behavioral setup at the beginning of the ms, and to establish that in all the following figures we are recording bona fide olfactory responses.

      "sensory changes in mecano-sensory and gustatory systems " - mechano-sensory.

      Done

      Figure 2 legend: "(3) the right track is the 3D plus time (color-coded)" - shouldn't it be 2D plus time or 3D (x,y, time).

      True! Thanks for noting this, corrected.

      Figure 4 legend "E, Change in swimming patterns" should be H.

      Done

      "suggesting that their detection threshold for serine is lower than for alanine" - higher?

      Done

      In the behavioural plots, I assume that the "mean position" value represents the mean position along the X-axis of the chamber - this should be clarified and the axis label updated accordingly.

      That is correct and has been updated in Methods and Figures and legends.

      "speed, back and forth trips in X and Y, position and pattern changes (see Methods; Figure 7A)." - here it would be helpful to add an explanation like "to define an olfactory score for individual fish."

      This has been changed in Results and more detailed explanations on score calculations are now given in Methods.

      "possess enhanced mecanosensory lateral line" - mechanosensory.

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "comparative transcriptomics reveal a novel tardigrade specific DNA binding protein induced in response to ionizing radiation" aims to provide insights into the mediators and mechanisms underlying tardigrade radiation tolerance. The authors start by assessing the effect of ionizing radiation (IR) on the tardigrade lab species, H. exemplaris, as well as the ability of this organism to recover from this stress - specifically, they look at DNA double and single-strand breaks. They go on to characterize the response of H. exemplaris and two other tardigrade species to IR at the transcriptomic level. Excitingly, the authors identify a novel gene/protein called TDR1 (tardigrade DNA damage response protein 1). They carefully assess the induction of expression/enrichment of this gene/protein using a combination of transcriptomics and biochemistry - even going so far as to use a translational inhibitor to confirm the de novo production of this protein. TDR1 binds DNA in vitro and co-localizes with DNA in tardigrades.

      Reverse genetics in tardigrades is difficult, thus the authors use a heterologous system (human cells) to express TDR1 in. They find that when transiently expressed TDR1 helps improve human cell resistance to IR.

      This work is a masterclass in integrative biology incorporating a holistic set of approaches spanning next-gen sequencing, organismal biology, biochemistry, and cell biology. I find very little to critique in their experimental approaches.

      Strengths:

      (1) Use of trans/interdisciplinary approaches ('omics, molecular biology, biochemistry, organismal biology)

      (2) Careful probing of TDR1 expression/enrichment

      (3) Identification of a completely novel protein seemingly involved in tardigrade radio-tolerance.

      (4) Use of multiple, diverse tardigrade species for 'omics comparison.

      Weaknesses:

      (1) No reverse genetics in tardigrades - all insights into TDR1 function from heterologous cell culture system.

      (2) Weak discussion of Dsup's role in preventing DNA damage in light of DNA damage levels measured in this manuscript.

      (3) Missing sequence data which is essential for making a complete review of the work.

      Overall, I find this to be one of the more compelling papers on tardigrade stress-tolerance I have read. I believe there are points still that the authors should address, but I think the editor would do well to give the authors a chance to address these points as I find this manuscript highly insightful and novel.

      We thank the reviewer for his comments.

      We agree that it will be important to further investigate the role of Dsup in radio-tolerance. We briefly mentioned this point in the discussion (p14). Our findings show that tardigrades undergo DNA damage at levels roughly similar to radio-sensitive organisms and therefore support a major role for DNA repair in the maintenance of genome integrity after exposure to IR. Nevertheless, we believe that more precise quantification of DNA damage may still reveal a contribution of genome protection to radio-tolerance of tardigrades compared to radio-sensitive organisms. Dsup loss of function experiments in tardigrades would clearly be the best way to assess this possibility. In the absence of experiments directly addressing the function of Dsup, we prefer to refrain from drawing any firm conclusion on prevention of DNA damage by Dsup and thus to keep a more open position. In any case, as discussed in the text, we note that Dsup has only been reported in Hypsibioidea and other molecular players, such as TDR1, are likely involved in radio-tolerance in other tardigrade species.

      The sequence data can be accessed at the NCBI SRA database with Bioproject ID PRJNA997229.

      Reviewer #3 (Public Review):

      Summary:

      This paper describes transcriptomes from three tardigrade species with or without treatment with ionizing radiation (IR). The authors show that IR produces numerous single-strand and double-strand breaks as expected and that these are substantially repaired within 4-8 hours. Treatment with IR induces strong upregulation of transcripts from numerous DNA repair proteins including Dsup specific to the Hypsibioidea superfamily. Transcripts from the newly described protein TDR1 with homologs in both Hypsibioidea and Macrobiotoidea superfamilies are also strongly upregulated. They show that TDR1 transcription produces newly translated TDR1 protein, which can bind DNA and co-localizes with DNA in the nucleus. At higher concentrations, TDR1 appears to form aggregates with DNA, which might be relevant to a possible function in DNA damage repair. When introduced into human U2OS cells treated with bleomycin, TDR1 reduces the number of double-strand breaks as detected by gamma-H2A.X spots. This paper will be of interest to the DNA repair field and to radiobiologists.

      Strengths:

      The paper is well-written and provides solid evidence of the upregulation of DNA repair enzymes after irradiation of tardigrades, as well as upregulation of the TDR1 protein. The reduction of gamma-H2A.X spots in U2OS cells after expression of TDR1 supports a role in DNA damage.

      Weaknesses:

      Genetic tools are still being developed in tardigrades, so there is no mutant phenotype to support a DNA repair function for TDR1, but this may be available soon.

      We thank the reviewer for his comments.

      Reviewer #4 (Public Review):

      The manuscript brings convincing results regarding genes involved in the radio-resistance of tardigrades. It is nicely written and the authors used different techniques to study these genes. There are sometimes problems with the structure of the manuscript but these could be easily solved. In my view, there are also some points which should be clarified in the results section. The discussion section is clear but could be more detailed, although some results were actually discussed in the results section. I wish that the authors would go deeper into the comparison with other IR-resistant eucaryotes. Overall, this is a very nice study and of interest to researchers studying molecular mechanisms of ionizing radiation resistance.

      I have two small suggestions regarding the content of the study itself.

      (1) I think the study would benefit from the analyses of a gene tree (if feasible) in order to verify if TDR1 is indeed tardigrade-specific.

      (2) It would be appreciated to indicate the expression level of the different genes discussed in the study, using, for example, transcripts per million (TPM).

      We thank the reviewer for his comments.

      (1) To identify TDR1 homologous sequences in non-tardigrade species, we conducted extensive homology searches using multiple homology-based approaches (Blastp and Diamond against the NCBI non-redundant protein sequences (nr) database and hmmsearch against the EBI reference proteomes), which failed to identify TDR1 homologs in non-tardigrade ecdysozoans, thus strongly supporting that TDR1 is indeed tardigrade-specific.

      To be clearer in the manuscript, we now state the absence of hits for TDR1 in non-tardigrade ecdysozoans. Given the absence of homologs in non-tardigrade species, it is not possible to make a gene tree with non-tardigrade species.

      (2) To further document expression levels (which were already available from the Tables in the initial submission), we added MAplots (representing log2foldchange and logNormalized read counts) in the supplementary materials (Supp Figure 3 and Supp Figure 8). These additional figures clearly document that the DNA repair genes discussed in the main text and TDR1 are highly expressed genes after IR and after Bleomycin treatment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      (1) It has always seemed strange to me that tardigrades accumulate just as much DNA damage as any other organism when irradiated and yet their Dsup protein is supposed to shield and protect their DNA from damage. Perhaps this is an appropriate time for this idea to be reconsidered given the Dsup was NOT induced by IR in this study and the authors found that their animals incurred just as much damage as other biological systems. While Dsup is clearly not the focus of this manuscript, it is the protein most associated with tardigrade radio-tolerance and I would argue this new paper would call into question previous conclusions made about Dsup.

      We agree that it will be important to further investigate the role of Dsup in radio-tolerance. We briefly mentioned this point in the discussion (p14). Our findings show that tardigrades undergo DNA damage at levels roughly similar to radio-sensitive organisms and therefore support a major role for DNA repair in the maintenance of genome integrity after exposure to IR. Nevertheless, we believe that more precise quantification of DNA damage may still reveal a contribution of genome protection to radio-tolerance of tardigrades compared to radio-sensitive organisms. Dsup loss of function experiments in tardigrades would clearly be the best way to assess this possibility. In the absence of experiments directly addressing the function of Dsup, we prefer to refrain from drawing any firm conclusion on prevention of DNA damage by Dsup and thus to keep a more open position. In any case, as discussed in the text, we note that Dsup has only been reported in Hypsibioidea and other molecular players, such as TDR1, are likely involved in radio-tolerance in other tardigrade species.

      (2) While reverse genetics are difficult in tardigrades, they are not impossible, and RNAi can be used to good effect in these animals. In fact several authors on this manuscript have used RNAi to examine the necessity of genes in tardigrade stress tolerance in the past. Was an attempt made to RNAi TDR1? If not, why? With the large amount of work that the authors put into showing the sufficiency of TDR1 for increasing radiotolerance in cell culture, one would think looking at necessity in tardigrades would be of great interest. If RNAi was performed, what were the results? Even a negative result here is informative since a protein can be sufficient but not necessary for a function - if this were the case it would mean tardigrades have some redundant mechanism(s) for surviving radiation exposure beyond TDR1.

      We have attempted RNAi experiments targeting TDR1 or a mix of DNA repair genes (including XRCC5) and examined the response to a bleomycin treatment of 2 weeks. Unfortunately, we could not distinguish any difference between uninjected animals and animals injected with TDR1 dsRNAs or with the mix of DNA repair gene dsRNAs. We concluded that bleomycin treatment, which we used because it is much easier to perform than irradiation, was perhaps not the best way to assay a potential impact of RNAi on survival, since it required long-term treatment for several days during which the effect of RNAi may have waned. Another attempt was therefore made by injecting TDR1 or control GFP dsRNAs and exposing animals to a 2000 Gy IR treatment. We noticed that viability was lower after injection with GFP dsRNAs than with TDR1 dsRNAs (likely due to problems we had with the injection needle during injections). The next day, animals were irradiated and we observed after 24h that animals injected with GFP dsRNAs exhibited higher lethality rates than animals injected with TDR1 dsRNAs or uninjected animals. We found that this set of experiments was not conclusive. Our current experimental setup makes it difficult to distinguish lethality due to injections from lethality due to potentially decreased resistance to IR. Many key controls are also difficult to perform (in particular, we could not confirm the efficiency of target gene knockdown, as it is very challenging given the low amount of biological material available and the poor expression of these genes without irradiation). From a practical point of view, performing these experiments is thus very challenging. We nevertheless agree that, in future work, further experimentation is needed to examine the impact of knock-down by RNAi of TDR1 or of other genes, such as DNA repair genes or Dsup, on tardigrade DNA repair and survival after IR. Gene knock-out with CRISPR-Cas9 is a very promising alternative to RNAi, given that studies in mutant lines will eliminate the confounding effect of lethality due to injections.

      (3) Regarding the U2OS experiments. I have several questions/points of clarification:

      a. Were survival/proliferation levels tested or only H2AX foci? I think that showing decreased H2AX foci (fewer double-stranded breaks) correlates with higher survival rates would be important.

      In the experiments reported in Figure 6, cells were transiently transfected with expression vectors and we did not examine the impact on survival rates. U2OS cells are resistant to high doses of Bleomycin and testing survival would require longer exposure at much higher concentrations (Buscemi et al, 2014, PMID: 25486478). In order to try and better address an impact on cell survival, we therefore generated populations of cells stably expressing the candidate tardigrade proteins fused to GFP. Despite trying different experimental conditions for treatment with Bleomycin, we could not detect a reproducibly significant benefit on cell survival for any of the tardigrade proteins tested, including RvDsup which was used as a positive control (since it was previously reported to improve cell survival in response to X-rays). One possibility is that the analysis should be performed in clones and not in populations of cells with heterogeneous expression levels of the tardigrade protein tested. For example, expression levels of the tardigrade protein needed to reduce the number of phospho-H2AX foci in response to DNA damage may interfere with cell division. We note that in the original Dsup paper, the benefit of RvDsup on cell survival was reported in specific transgenic clones. Experiments in different biological systems have also started to document toxic effects of RvDsup expression, illustrating the challenge, when performing experiments in heterologous systems, of achieving suitable expression levels of the tested protein. Performing such a finer analysis would, in our opinion, go beyond the scope of our manuscript and will be best addressed in future studies. We are therefore careful in the text not to make any claim on the benefit of TDR1 expression on cell survival in response to Bleomycin in human cultured cells.

      (b) From the methods I am a bit confused as to how the images were treated/foci quantified. With the automatic segmentation and foci identification, is this done through the entire Z-series or a single layer? If the latter then I am not sure the results are meaningful, since we do not know how many foci might be present in other layers of the nuclei analyzed. If the former, please clarify this in the method since it is a very important consideration.

      We have acquired images throughout the entire Z-series and edited the text to make this clearer. We now write: “Z-stacks were maximum projected and analyzed with Zen Blue software (v2.3)...”. To limit the time needed for image analysis, we have generated an artificial image by projecting the entire Z-series into a single image and counted foci in that single maximum projection image. Although there are potential drawbacks, such as counting only one focus when two foci are superposed along the Z axis, this approach overcomes the limitations of quantification from a single layer. We further ensured statistical robustness of the analysis by performing quantification from several independent fields of the labelled cells and several independent biological replicates (n>=3 as now specified in the legend of figure 6a).
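      As an illustration of the projection approach described above, the following is a minimal sketch in plain Python. It is not the Zen Blue workflow used for the figures: the function names, the intensity threshold, the 4-connectivity rule, and the toy pixel values are all assumptions made for the example. It shows both the benefit of projecting (foci from every plane are retained) and the drawback noted above (foci superposed along Z merge into one).

```python
# Toy sketch of maximum-intensity projection followed by foci counting.
# NOT the Zen Blue pipeline: threshold, connectivity and data are hypothetical.

def max_project(z_stack):
    """Collapse a Z-stack (list of equally sized 2D lists) into one 2D image
    by taking the per-pixel maximum along the Z axis."""
    rows, cols = len(z_stack[0]), len(z_stack[0][0])
    return [[max(plane[r][c] for plane in z_stack) for c in range(cols)]
            for r in range(rows)]

def count_foci(image, threshold):
    """Count 4-connected groups of pixels whose intensity is >= threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    n_foci = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            n_foci += 1            # new, previously unvisited bright region
            seen[r][c] = True
            pending = [(r, c)]     # flood-fill the rest of this region
            while pending:
                y, x = pending.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        pending.append((ny, nx))
    return n_foci

# Two focal planes of one hypothetical nucleus.
z0 = [[0, 0, 0, 0, 0],
      [0, 9, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 8, 0]]
z1 = [[0, 0, 0, 0, 0],
      [0, 7, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0]]

projection = max_project([z0, z1])
foci_in_projection = count_foci(projection, threshold=5)
foci_in_single_plane = count_foci(z1, threshold=5)
```

      In this toy case the single plane `z1` contains only one of the bright spots, while the projection retains both positions; the two spots that share the same x-y position in `z0` and `z1` are counted once, which is the superposition caveat mentioned in the response.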

      (c) RvDsup reduced levels of H2AX foci in these experiments, however, HeDsup was not found to be enriched in the transcriptomic analysis performed here. Was there a reason HeDsup was not used in the cell-based experiments? One could argue that RvDsup is from a different species of tardigrade, but it is a bit concerning that an ortholog of a protein found NOT to be induced by radiation exposure seems to perform as well (if not better) than some versions of TDR1.

      RvDsup is the protein initially shown to increase survival of human HEK293 cells treated with X-rays and reduce the number of phospho-H2AX foci induced: it was therefore used as a positive control in our experiments. The sequence of HeDsup is only poorly similar to RvDsup (with 26% identity) and activity of HeDsup in cultured cells has not been reported before. We therefore believe that HeDsup is not well suited to provide a positive control for the experiments performed in our manuscript.

      (d) From the methods, it seems that cells were treated with Bleomycin and then immediately fixed without any sort of recovery time. In this short timeframe, the presence of TDR1 appears to be enough to deal with a substantial amount of double-stranded breaks (as evidenced by the reduced number of H2AX foci). Does this make sense? How quickly could one expect DNA repair machinery to make significant progress in resolving damaged DNA? This response seems much faster than what was observed in tardigrades. Perhaps the authors could comment on this.

      Kinetic studies in human cells show extremely rapid repair of DNA double-strand breaks. Sensing of DNA double-strand breaks by PARP proteins takes place within seconds after irradiation with IR (Pandey and Black, 2021, PMID: 33674152). NHEJ is then observed to take place by formation of 53BP1 foci within 15 minutes (Schultz et al, 2000, PMID: 11134068). The number of phospho-H2AX and 53BP1 foci peaks at 30 minutes and starts declining thereafter, showing that at a significant number of sites, DNA repair is proceeding very rapidly (by NHEJ). Although we are not aware of any studies of DNA repair kinetics in U2OS cells after addition of Bleomycin, DNA damage must be instantaneous and further take place during exposure to the drug in parallel to DNA repair, which would be expected to have kinetics similar to those observed after irradiation with IR.

      In our experiments, several mechanisms may be involved in reducing the number of phospho-H2AX foci induced by Bleomycin, such as DNA protection (for Dsup expression) or stimulation of DNA repair (for RNF146 expression). For TDR1, the molecular mechanism involved remains to be determined. Given our finding that TDR1 can form aggregates with DNA, an additional possibility is that clustering of phospho-H2AX foci is induced.

      (4) I could not find the sequences of the TDR1 proteins studied here. I did find the cDNA sequence of HeTDR1 in the final supplementary file, but not the other TDR1 orthologs. Where the TDR1 sequences from other tardigrades should have appeared, there were only very short segments of the HeTDR1 sequence. All sequences of proteins used in this study should be easily accessible to the reader and reviewers as it is not possible to review this work without accessing the sequences.

      Our apologies for the inappropriate documentation of TDR1 sequences in the original manuscript. As requested, we have now included the TDR1 sequences in the Supplementary Table 4.

      (5) Likewise, the RNA sequence data is said to be deposited in NCBI under PRJNA997229, but I do not find this available on NCBI.

      The RNA sequence data was deposited in NCBI under the indicated reference before submission of the manuscript. The data has now been released and is fully available on NCBI.

      (6) A few typographical errors: e.g., Page 10 - sentence 4 has two periods ". ." or page 14 which has an open parenthesis that is not closed.

      These typos have been corrected in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      In Figure 4C, what fraction of the 50 genes upregulated in all species and treatments are DNA repair genes? Is there any other notable commonality between these 50 genes? The bulk of upregulated genes are specific to a species and to treatment with IR or bleomycin. What fraction of DNA repair genes are specific to a species or treatment?

      The results in Figure 4C on the 50 putative orthologous genes upregulated in all species and treatments are further detailed in supp Figure 10. The legend to supp Figure 10 now provides the requested information: 14/50 genes are DNA repair genes and the other notable commonality is that 21/50 are “stress response genes”. We did not further break down the analysis to evaluate the fraction of DNA repair genes specific to a species or treatment. It will be interesting to gather data in more species to shed light on the evolutionary history of DNA repair gene regulation in response to IR.

      How does the suite of upregulated tardigrade DNA repair proteins after IR or bleomycin compare with DNA repair proteins upregulated under similar treatments in human cells? Are they quantitatively or qualitatively different, or both?

      There is a great wealth of studies documenting genes differentially expressed in human cells in response to IR (e.g. Borras-Fresneda et al, 2016, PMID: 27245205; Rieger and Chu, 2004, PMID: 15356296; Budworth et al, 2012, PMID: 23144912; Rashi-Elkeles et al, 2011, PMID: 21795128; Jen and Cheung, 2003, PMID: 12915489...). Upregulation of DNA repair and cell cycle genes is commonly found. However, the number of DNA repair genes induced is always very limited and fold stimulation very modest compared to the massive upregulation observed in tardigrades.

      On page 14, please explain the acronym BER. Do the authors mean Base Excision Repair? Or something else?

      As assumed by the reviewer, the acronym BER stands for Base Excision Repair. The acronym has been removed from the main text and replaced by the full name.

      Reviewer #4 (Recommendations For The Authors):

      We thank the reviewer for his comments.

      Abstract:

      The abstract is fine. What was hard to grasp at the beginning is why the TDR1 gene was named that way. It should be clearer that this study decided to further focus on that gene, one of the most overexpressed genes after IR, with an unknown function. Then maybe introduce that it was found to be unique to tardigrades and to interact with DNA. Therefore, it was named TDR1.

      Introduction:

      The introduction has been modified according to the suggestions of Reviewer#4 below. One of the suggested references, Nicolas et al 2023 from the Van Doninck lab, was published while our manuscript was under review and cannot be considered as background information for our study.

      1st paragraph:

      The study is on tardigrades, I found it strange that the first paragraph is on D. radiodurans. I think it is fine to mention what is known in bacteria and eucaryotes but we should already know what will be the main topic in the first paragraph of the introduction. Some details about D. radiodurans seem less important and distracting from the main topic (3D conformation).

      2nd paragraph:

      When mentioning radio-resistant eucaryotes the authors do not mention the larvae of the anhydrobiotic insect Polypedilum vanderplanki. Stating that the mechanisms of resistance are poorly characterized should perhaps be nuanced. There are some recent studies on D. radiodurans (Ujaoney et al., 2017), the insect P. vanderplanki (Ryabova et al., 2017), tardigrades (Kamilari et al., 2019), and rotifers (Nicolas et al., 2023, Moris et al., 2023). Perhaps these papers are worth citing to indicate that, even if mechanisms are not elucidated yet, recent studies suggest some actors involved in their resistance. The sentence stating that DNA repair rather than DNA protection plays a predominant role in the radio-resistance of bdelloid rotifers should also be nuanced. Indeed, many chaperones and antioxidants were mentioned to play a role in the radio-resistance of bdelloid rotifers (Moris et al., 2023). The authors mentioned the reference Hespeels et al., 2023 which is not found in their list of references, I am not sure which paper they refer to. The last sentence of the second paragraph does not mean much. I am not sure what the authors want to state with this. Perhaps they should specify if they mean that the function of many other genes overexpressed after IR remains unknown.

      Still, in the second paragraph, the authors focus on rotifers. They also do not mention what is known in the insect P. vanderplanki, which should be added. They still do not mention tardigrades. I think it is nice to first start with eucaryotes and then focus on tardigrades but as I mentioned before it would help to understand the aim of the paper if the first paragraph mentioned briefly the tardigrades and then could go into detail in the third paragraph.

      3rd paragraph:

      In the sentence starting "with over 1400 species", it would be best to remove "but they can differ in their resistance" and start the next sentence with that.

      4th paragraph:

      Very clear, we finally understand what is the focus of the manuscript.

      5th paragraph:

      Very clear. The authors should mention the names of the three studied species. Here, A. antarcticus is missing. The sentence "Further analyses in H. exemplaris... showed that TDR1 protein is present and upregulated". The authors should mention in which conditions the protein is upregulated. In that paragraph the authors mention phospho-H2AX: it might be good to introduce its functions before in the introduction (it is mentioned in the second sentence of the results: best to move it to the introduction).

      Results:

      There are a few sentences in this section which rather discuss the results than describe them. I think the manuscript might gain in quality if these interpretations of the results are moved into the discussion section. That would make the result section more concise and the discussion enriched.

      For instance, I suggest to move these sentences into the discussion:

      • "the finding of persistent DSBs in gonads at 72h.... likely explains...".

      • "suggesting that (i) DNA synthesis..."

      • " Phospho-H2AX....also suggested"

      • "Moreover, expression of TDR1-GFP..., supporting the potential role of TDR1 proteins..."

      • "our results suggest that RNF146 upregulation could contribute..."

      • "AMNP gene g12777 was shown to increase...Based on our results, it is possible that..."

      The interpretations mentioned above were always introduced cautiously (-"suggesting that (i) DNA synthesis..." ; -" Phospho-H2AX....also suggested" ; -"Moreover, expression of TDR1-GFP..., supporting the potential role of TDR1 proteins..." ; -"our results suggest that RNF146 upregulation could contribute..." ). These cautious interpretations were usually important in deciding next steps of the work. We therefore believe it is important to mention these interpretations in the results section to clearly expose the milestones marking the progression of the study.

      For some results, they were directly discussed in the results section for the sake of concision (for example -"the finding of persistent DSBs in gonads at 72h.... likely explains..."; -"AMNP gene g12777 was shown to increase...Based on our results, it is possible that..." ) since, in our opinion, there was no need to mention them again in the main discussion.

      Some other parts could be good to be moved into the introduction:

      • "Previous studies have indicated that irradiation with IR increases expression of Rad51,..." none of the actors involved in DNA repair are mentioned in the introduction. Also, change resistant into resistance

      • "A. antarcticus ..., known for its resistant to high doses of UV....

      We have moved these parts to the introduction as recommended.

      It was in O. areolatus.... that the first demonstration..."

      This piece of information is somewhat anecdotal. We chose to keep it here in the results section. This information on the radio-resistance of the species P. areolatus is only relevant at this specific step of the study because it encouraged us to consider that P. fairbanksi, which we isolated fortuitously, would be a good model species for studying radio-resistance of tardigrades.

      Here are some additional comments/suggestions on the result section:

      1st section

      • Remove the Gross et al., 2018 from the sentence "using confocal microscopy", it looks otherwise that these results are from their study, not yours.

      We have changed the text to make it clear that this is indeed a finding of Gross et al which was previously made in non-irradiated tardigrades. We replicated this finding, which showed that the protocol was working appropriately, and that we could use this control result for comparison with irradiated animals. We apologize for this confusion.

      The text now states: “Using confocal microscopy, we could detect DNA synthesis in replicating intestinal cells of control animals, as previously shown by (Gross et al. 2018).”

      2nd section

      • It is confusing what has been found induced by IR and/or by Bleomycin.

      • I think it might help if the authors first present what is induced after IR, then write if it is similar after Bleomycin. Especially since they start to do it in the first paragraph of that section. However, they only mention TDR1 in the second paragraph dedicated to Bleomycin treatment which is confusing as it is also overexpressed after IR. It is also not clear if RNF146 is also induced by Bleomycin.

      As recommended, the text presents first what is induced after IR and then what is induced by Bleomycin in the following paragraph. When reporting results with Bleomycin, we have provided a global assessment of what is common to both treatments in Supp Figure 3 and in Supp Table 3. In this figure, we also specifically highlighted several key genes of DNA repair induced by both treatments. These are also mentioned in the text (p8) to illustrate the point that many key DNA repair genes are common to both treatments. We have now added RNF146 to that list as recommended.

      • Regarding TDR1, it is not clear when introduced in the text as "promising candidate" why it is the case. It is clear in the figures but perhaps the authors should explain why they chose these genes for further analyses: high log2foldchange and expression level for instance. Regarding that last comment, it would be interesting to have an idea about the expression level of the genes with high log2foldchange. In Figures 2, 3, and 4 the p-value and log2foldchange are represented but not the expression level (ideally transcripts per million). These values would give an additional idea of the importance of that gene. While looking at the figures, it is unclear why you did not further characterize other genes with high log2foldchange (some with even hints of their function): the mentioned RNF146, macroH2A1 (not even mentioned in the results), some genes unannotated in the figures with likely unknown functions.

      When selecting genes of interest, we did indeed take into account high expression levels. To more clearly document expression levels (which were already available from the Tables), we added MAplots (representing log2foldchange and logNormalized read counts) in the supplementary materials (Supp Figure 3 and Supp Figure 8).
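      For readers unfamiliar with MA plots, the sketch below shows how the two plotted quantities relate to read counts. It is a generic illustration, not the code used to produce Supp Figures 3 and 8: the function name, the pseudocount of 1, and the example counts are assumptions, and the actual figures may use a different x-axis definition (for example, the mean of normalized counts across samples).

```python
import math

def ma_values(control_counts, treated_counts, pseudocount=1.0):
    """Return (A, M) for one gene from normalized read counts.

    M is the log2 fold change (treated vs control) and A is the average
    log2 expression across the two conditions. The pseudocount avoids
    log2(0) for genes with no reads in one condition (an assumption of
    this sketch, not a statement about the authors' pipeline).
    """
    mean_control = sum(control_counts) / len(control_counts)
    mean_treated = sum(treated_counts) / len(treated_counts)
    log_control = math.log2(mean_control + pseudocount)
    log_treated = math.log2(mean_treated + pseudocount)
    a_value = (log_control + log_treated) / 2.0   # x axis: expression level
    m_value = log_treated - log_control           # y axis: log2 fold change
    return a_value, m_value

# Hypothetical gene: 3 normalized reads per control replicate, 15 after treatment.
a_value, m_value = ma_values([3, 3], [15, 15])
```

      A gene plotted far to the right (high A) and high on the y axis (high M) is both strongly expressed and strongly induced, which is the combination the response refers to when it mentions taking expression level into account.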

      • It is also unclear at that stage why you named it "Tardigrade DNA damage response protein". Is it characterized as a DNA repair/damage protein by a specific GO id, or is the name based on your downstream analyses? I think it might be worth quickly mentioning the reason for that name.

      The name illustrates two points which were already characteristic at this point in time of the study i.e. 1) it is a tardigrade specific protein and 2) it is induced in response to DNA damage.

      • Regarding the BLAST analyses, the protein was searched in C. elegans, D. melanogaster and H. sapiens. Why only these three species? What were the threshold e-values used for these analyses? As mentioned in the main comment, it would be worth searching species phylogenetically close to tardigrades to verify that it is indeed tardigrade-specific. Did you try to make a gene tree, after looking for a conserved domain (using hmmsearch)?

      As indicated in the methods section, the “Tardigrade-specific" annotation was determined by absence of hits after high-throughput alignment (with diamond using the --ultra-sensitive option) on the NCBI nr database and absence of hits after blast search on C. elegans, D. melanogaster and H. sapiens proteomes as a complementary criterion (the latter blast search was primarily performed to enrich for functional annotations). Based on these criteria, TDR1 was annotated as “Tardigrade-specific”. As stated in the text, we also searched for TDR1 related sequences with 1) blastp (which is more sensitive than diamond) on the NCBI nr database and 2) HMMER on Reference Proteomes, and no hits were found among non-tardigrade ecdysozoan organisms, confirming that TDR1 is specific to tardigrades. For the blastp search, for example, there were five hits in non-ecdysozoan organisms (two cephalochordates, one mollusc and two echinoderms). The blastp and HMMER results are now included in the revised supplementary material (Supp Table 5). These very few hits in species phylogenetically distant from tardigrades cannot be taken to support the existence of TDR1 genes outside tardigrades.

      To be clearer in the manuscript, we now state the absence of hits for TDR1 in non-tardigrade ecdysozoans. Given the absence of homologs in non-tardigrade species, it is not possible to make a gene tree with non-tardigrade species.

      • Page 9: "Proteins extracts from H. exemplaris... at 4h and 24h..." I think this sentence can be removed as this is mentioned again 2 paragraphs after: "...we conducted an unbiased proteome analysis... at 4h..." The log2foldchange threshold mentioned for the proteomic analyses is 0.3: why this threshold, was it chosen randomly?

      This threshold is commonly used when considering log2foldchange with the technology used in our study, an isobaric multiplexed quantitative proteomic strategy which is known to compress ratios (Hogrebe et al. 2018).
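      To put the threshold on a linear scale, a log2foldchange of 0.3 corresponds to a ratio of roughly 1.23, i.e. about a 23% increase. The short sketch below makes the conversion explicit; the function names are hypothetical and the snippet only illustrates the arithmetic, not the proteomic analysis itself.

```python
def log2fc_to_ratio(log2fc):
    """Convert a log2 fold change into a linear treated/control ratio."""
    return 2.0 ** log2fc

def passes_threshold(log2fc, threshold=0.3):
    """True if the absolute log2 fold change reaches the chosen threshold.
    The default of 0.3 is the value discussed above; using the absolute
    value (to also catch decreases) is an assumption of this sketch."""
    return abs(log2fc) >= threshold

ratio_at_threshold = log2fc_to_ratio(0.3)   # roughly 1.23
```

      Because isobaric labelling compresses measured ratios, a modest measured ratio of this size can reflect a larger underlying change, which is the rationale given in the response for the seemingly low cut-off.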

      • Page 10:

      It would be good for more clarity to indicate at the beginning of the new section which species were investigated after IR or Bleomycin treatment.

      TDR1 homologs in the other tardigrade species were identified based on what? Best reciprocal hit?

      As indicated in the methods section of the manuscript, we searched for homologs in other tardigrade species by BLAST. A best reciprocal hit approach was not performed to try to determine which homologs might be orthologs. In particular, most TDR1 homologs identified are known from transcriptome assemblies and high-contiguity genome assemblies are needed to more confidently identify orthology (using synteny). The results of the BLASTP search are now provided as supplementary material (Supp Table 5).

      Preliminary experiments indicated that A. antarcticus and P. fairbanksi survived exposure to 1000 Gy: is there a supplementary graph showing this?

      We have corrected the text to avoid any confusion. We have not rigorously examined the dose-dependent survival of P. fairbanksi in response to irradiation. Text was changed to: “We found by visual inspection of animals after IR that A. antarcticus and P. fairbanksi readily survived exposure to 1000 Gy.”

      • Page 11:

      "A set of 50 genes was upregulated in the three species": please be precise if only after IR.

      Done

      These genes cannot be the same as they are from different species. Did the author mean that they are coding for similar proteins? It might be good to give some more details even if the supplementary figure is mentioned.

      Obviously, these genes are putative orthologs. We have changed the text to:

      “a set of 50 putative orthologous genes was upregulated in response to IR in all three species”

      Discussion:

      • General comment: the discussion is focused mainly on TDR1, it would be nice to also discuss the other results: DNA repair genes, RNF146.

      A whole paragraph is devoted to the discussion of results on DNA repair genes and RNF146. We have extended that discussion following the reviewer's suggestion. In particular, we now explicitly mention the apparent paradox that XRCC5 and XRCC6, which are among the most highly stimulated genes at the mRNA level, display only modest upregulation at the protein level. Although further studies would be needed to examine the mechanisms involved, we propose that upregulation of RNF146, whose human homolog has been shown to drive degradation of PARylated XRCC5 and XRCC6 proteins in response to IR (Kang et al. 2011), may be responsible for higher degradation rates and may thus counterbalance increased levels of protein synthesis.

      • Pulse field electrophoresis would be nice to be performed. It has been used to assess DSBs in bdelloid rotifers, is it possible in tardigrades?

      As stated in the discussion, we believe that it would be challenging to perform pulse field electrophoresis in tardigrades. However, if possible, these experiments would certainly bring invaluable information to complement our analysis of DNA damage induced by IR.

      • "By comparative transcriptomics": please rephrase that sentence.

      • Proteins acting early in DNA repair: I am not sure I understand this sentence. Actors as ligases act not at the beginning of the repair pathways.

      Well noted. We have removed ligases from the list.

      • It is confusing that the authors mention NHEJ and double-strand break repair pathways as different pathways. There are 2 main pathways to repair DSBs: NHEJ and HR. It would be nice to add a reference to the sentence "PARP proteins act as sensors of DNA damage etc."

      A typo in the sentence gave rise to the misleading suggestion that NHEJ is not a double strand repair pathway. It has been corrected.

      A reference has been added for PARP proteins.

      • It would be nice if the authors can explain deeper their suggestion that degradation of DNA repair actors is essential for tardigrade IR resistance.

      We have expanded this part of the discussion and hope that it is clearer.

      “For XRCC5 and XRCC6, our study established, by two independent methods, proteomics and Western blot analyses, that the stimulation at the protein level could be much more modest (6- and 20-fold at most; Supp Figure 6) than at the RNA level (420- and 90-fold, respectively). This finding suggests that the abundance of DNA repair proteins does not simply increase massively to quantitatively match high levels of DNA damage. Interestingly, in response to IR, the RNF146 ubiquitin ligase was also found to be strongly upregulated. RNF146 was previously shown to interact with PARylated XRCC5 and XRCC6 and to target them for degradation by the ubiquitin-proteasome system (Kang et al. 2011). To explain the lower fold stimulation of XRCC5 and XRCC6 at the protein level, it is therefore tempting to speculate that XRCC5 and XRCC6 protein levels (and perhaps those of other scaffolding complexes of DNA repair as well) are regulated by a dynamic balance of synthesis, promoted by gene overexpression, and degradation, made possible by RNF146 upregulation. Consistent with this hypothesis, we found that, similar to human RNF146 (Kang et al. 2011), He-RNF146 expression in human cells reduced the number of phospho-H2AX foci detected in response to Bleomycin (Figure 6).”
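      The synthesis/degradation balance described in this passage can be illustrated with a back-of-the-envelope calculation. The sketch below is not an analysis from the manuscript: it assumes a simple steady-state model in which protein abundance equals synthesis (proportional to mRNA level) divided by a first-order degradation rate, with translation efficiency per mRNA unchanged. The function name and the pairing of fold changes to genes (XRCC5: 420-fold mRNA, 6-fold protein; XRCC6: 90-fold mRNA, 20-fold protein, following the order of the quoted sentence) are assumptions made for illustration.

```python
def required_degradation_fold(mrna_fold: float, protein_fold: float) -> float:
    """Fold increase in degradation rate implied by a simple steady-state model.

    Assumed model (illustrative only, not taken from the manuscript):
        dP/dt = k_syn * mRNA - k_deg * P   =>   P_ss = k_syn * mRNA / k_deg
    With k_syn unchanged, protein_fold = mrna_fold / degradation_fold.
    """
    if mrna_fold <= 0 or protein_fold <= 0:
        raise ValueError("fold changes must be positive")
    return mrna_fold / protein_fold


# Fold changes quoted in the text: (mRNA fold, maximum protein fold).
# The gene-to-value pairing is assumed from the order of the sentence.
reported = {"XRCC5": (420.0, 6.0), "XRCC6": (90.0, 20.0)}

for gene, (mrna_fold, protein_fold) in reported.items():
    fold = required_degradation_fold(mrna_fold, protein_fold)
    print(f"{gene}: degradation rate would need to rise about {fold:g}-fold")
```

      Under these assumptions the gap between mRNA and protein induction would correspond to a roughly 70-fold (XRCC5) and 4.5-fold (XRCC6) increase in degradation rate. Ratio compression in the proteomic measurements and departures from steady state would change these figures, so they indicate only the order of magnitude.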

      • Page 15: Please add a reference for the sentence "Functional analysis of promotor sequences in transgenic tardigrades etc."

      The reference has been added to fix this omission.

      Material and Methods:

      Small comments:

      • 40 μm mesh: space missing

      • 100 μm mesh: space missing

      • (for Bleomycin)): parenthesis missing

      • remove "as indicated in the text"

      • The investigated time points after radiation need to be clearly stated in the method section. It is also unclear in the IR and Bleomycin section which tardigrades were treated with what. Not all were treated with Bleomycin.

      The small comments above have been fixed in the revised version of the manuscript.

      • Page 21: please precise the coverage of the RNA sequencing

      Statistics on mapping of RNAseq reads are now provided in Supp Table 10.

      • Page 22: Was any read trimming performed? Anything about the quality check of the reads?

      Trimming was conducted using Trimmomatic (v0.39) and quality check using FastQC (v. ?). This information has been added to the Methods section.

      • Were the analyses confirmed by a second approach: for instance, EdgeR? Deseq2 and EdgeR do not always have the same results. For more robust analyses it is advised to use both.

      Differential transcriptome analyses were conducted with DESeq2 only. The robustness of our identification of differentially expressed genes in response to IR stems from performing comparative analyses in three different species, rather than from using two bioinformatics pipelines in a single species. We also note that benchmarking reported in the initial DESeq2 paper showed that identification of differentially expressed genes with large log fold changes (which, as reported in our manuscript, is characteristic of many DNA repair genes in response to IR) is very consistent between DESeq2 and edgeR.

      Figures:

      • Figure 2: Legend vertical dotted line does not indicate log2foldchange value of 4 in all panels: it would be good to indicate for panels a and c as well.

      Figure 2 has been improved following the reviewer's suggestions. Dotted lines now show a log2foldchange value of 2 in all panels (i.e. a fold change of 4, as mentioned in the main text).
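      For readers checking the thresholds quoted in these responses, the relation between a log2 fold change and a linear fold change can be sketched as follows. The function names are hypothetical and chosen for illustration; the two input values are the log2foldchange thresholds mentioned in the responses (0.3 for the proteomic analysis, 2 for the dotted lines in Figure 2).

```python
import math


def log2fc_to_fold_change(log2fc: float) -> float:
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc


def fold_change_to_log2fc(fold_change: float) -> float:
    """Convert a linear fold change (must be > 0) to a log2 fold change."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold_change)


if __name__ == "__main__":
    # Thresholds mentioned in the responses: 0.3 (proteomics), 2 (Figure 2).
    for value in (0.3, 2.0):
        linear = log2fc_to_fold_change(value)
        print(f"log2foldchange {value} -> fold change {linear:.2f}")
```

      A log2foldchange of 2 corresponds to a 4-fold change, and a log2foldchange of 0.3 to a change of roughly 1.23-fold.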

      • Figure 2C: There are a few points with high log2foldchange which are not annotated: was it because nothing was found in the blast research? If yes, it would be good to indicate their functions. If not, it would be good to mention in the discussion that there are some genes with still unknown functions which might play an important role in the resistance of tardigrades to IR.

      The few points that are not annotated in Figure 2c can now be found in Supp Table 3. Some of them have no hit in the Blast search; others, such as BV898_09662 or BV898_07145, have hits on DNA repair genes (RBBP8/CtIP and XRCC6, respectively) but are not annotated as such by eggNOG in KEGG pathways.

      • Figure 4C: Why not have included the response of P. fairbanski to bleomycin? I guess it was not done, but it is unclear in the results and methods sections.

      The response of P. fairbanksi to bleomycin was not assessed because we did not have enough animals to run the study. The Methods section has been modified to clarify this point.

    1. Author response:

      Reviewer #1 (Public Review):

      “… it remains unclear how ninein reduction causes bone defects …”

      We have added several control experiments that permit us to conclude that osteoblast numbers remain unaltered in the ninein-knockout embryos, and that bone abnormalities in vivo are caused by fusion defects of osteoclast precursor cells, whereas the proliferation, viability, or the adhesion of these precursor cells remain unaffected. For details, please see our comments below.

      “Discussion includes several unfounded potential mechanisms that really need to be thoroughly analyzed to gain a mechanistic understanding of the bone defects…”

      The new data back up our claim of fusion defects as a cause for limited osteoclast function. We have re-written parts of the discussion, to take into account our new findings.

      “Data showing normal osteoblasts in ninein-null mice was qualitative and requires further in-depth analysis and quantification of osteoblast …”

      To address this point, quantification of osteoblast numbers in tibiae at E16.5 and E18.5 was performed in control and ninein-deleted mouse embryos. The data are presented in the new Figures 3G and J.

      “In ninein knock-out mice, reduced TRAP+ve multinuclear cells were observed (Figure 6A and 6B). However, the magnitude of difference (about 5% decrease in multinucleated cells) is not consistent with the skeletal deformities reported in Figures 2-4, potentially suggesting the contribution of additional mechanisms.”

      We agree that the difference appears to be small at first glance, but nevertheless it remains statistically significant (a more than three-fold difference). We would like to recall that these observations (Fig. 6A) were performed at E14.5, i.e. at a stage when no ossification has occurred yet. We are looking at the first fusion events of myeloid precursors, likely derived from the fetal liver, that colonize the area of the first bone to form, and small differences in the number of functional osteoclasts may account for different timing of ossification. We think that differences in osteoclast fusion also account for the premature appearance of ossification centers for other skeletal elements, at later time points during development.

      “The fusion assay in Figure 6C needs further clarification. How was the syncytia perimeter defined to measure cell surface? The x-axis suggests that there are syncytia that contain up to 160 nuclei at day 3. How were the nuclei differentially stained and quantified?”

      We now provide additional information on the experimental approach in the revised manuscript, on pages 16-17 (Materials and Methods). For information: high numbers of syncytial nuclei in cultures were also observed by other groups in the past (Tiedemann et al., 2017, Front Cell Dev Biol. 5:54). In addition, we performed new experiments and quantified the fusion of osteoclast precursors by staining for actin and nuclei (new Figure 7C). This allowed us to quantify several additional parameters related to cell fusion (as initially performed in Raynaud-Messina et al., 2018, PNAS, 115:E2556-E2565).

      “Some text needs clarification. … What is the definition of "large syncytia"? Is the fusion index increase by day 5 diminished in later days? A graph of the syncytia size/ nuclei number or fusion index in the above-mentioned days will be helpful.”

      Information on the definition of “large syncytia” is now provided on page 10 (1st paragraph). We added further experimental details on osteoclast size for days 3, 4, and 5 in the supplemental Figures 7A and B. Most importantly, we performed additional assays of the fusion index by quantifying syncytial versus non-syncytial nuclei in a semi-automated manner. The new data are presented in Figure 7C, and the methods are explained on page 17. Together with our new analysis of cell proliferation, cell viability, and cell adhesion (Figure 7C, D, suppl. Fig. 7C-G), we now provide solid evidence for a fusion defect at the origin of impaired formation of ninein del/del osteoclasts.
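      As an illustration of how a fusion index based on syncytial versus non-syncytial nuclei can be computed, a minimal sketch follows. It is not the semi-automated pipeline used in the manuscript: the function name, the input format (one nucleus count per segmented cell), and the threshold of three nuclei for calling a cell a syncytium are all assumptions made for this example.

```python
from typing import Sequence


def fusion_index(nuclei_per_cell: Sequence[int], min_syncytium_nuclei: int = 3) -> float:
    """Fraction of all nuclei that lie inside syncytia.

    nuclei_per_cell: number of nuclei counted in each segmented cell.
    min_syncytium_nuclei: assumed cut-off for calling a cell a syncytium
        (hypothetical default; the manuscript's own definition may differ).
    """
    total_nuclei = sum(nuclei_per_cell)
    if total_nuclei == 0:
        raise ValueError("no nuclei counted")
    # Sum the nuclei belonging to cells at or above the syncytium cut-off.
    syncytial_nuclei = sum(n for n in nuclei_per_cell if n >= min_syncytium_nuclei)
    return syncytial_nuclei / total_nuclei


if __name__ == "__main__":
    # Hypothetical field of view: three mononucleated cells, one binucleated
    # cell, and two syncytia with 5 and 10 nuclei.
    counts = [1, 1, 1, 2, 5, 10]
    print(f"fusion index: {fusion_index(counts):.2f}")
```

      Expressing fusion as a fraction of nuclei rather than a fraction of cells keeps the measure comparable between cultures that differ in total cell number.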

      “Assessment of resorption was qualitative in Figure 6E and since the fusion deficiencies are transient, quantification of a corresponding resorption activity is needed. This should be described in the Materials and Methods section.”

      Quantifications of the bone resorption activities are now provided in the new Figure 7E, and a reference for the methods is provided on page 16.

      “Further experiments are needed to show connections between reduced centrosome clustering and reduced osteoclast formation as there is no evidence to date that suggest centrosome clustering is required for cell fusion. Multi-color live imaging and dynamic analysis can be used to determine if the ninein deficient cells show defective movement/migration/ fusion dynamics.”

      We agree that it is an important question, and studying potential links between centrosomal microtubule organization and osteoclast fusion is an ongoing project of the team. However, we estimate that in order to obtain conclusive results this will require 1-2 additional years of research activity, and we intend to present this as a separate project in the future. At the current point of our investigation, we think that providing a solid link between ninein, osteoclast fusion, and controlled timing of ossification, as shown in this manuscript, represents valuable progress to understand previously published bone abnormalities in patients with ninein mutations.

      “Quantification of the % of multinucleated osteoclasts that contain clustered and dispersed centrosomes is needed.”

      New quantification experiments on centrosome clustering are now provided in Figure 8H. These quantifications demonstrate that the potential of centrosome clustering is almost completely lost in osteoclasts without ninein.

      Reviewer #2 (Public Review):

      “Based on the decrease in the number of osteoclasts (Fig 5E, G, and also per coverslip after 2 days in culture), the authors suggest that the loss of ninein impacts osteoclast proliferation. First, proliferation can be directly quantified using Ki67 staining or EdU incorporation. Second, other interpretations are also plausible and can also be experimentally tested. These include less adhesion and attachment of the mutants to the coverslips, but perhaps more relevant in vivo is cell death of the ninein mutant osteoclasts. It has been established that the loss of centrosome function activates p53-dependent cell death and osteoclasts might be a vulnerable cell population. Quantifying p53 immunoreactivity and/or cell death in osteoclasts might help clarify the phenotype of osteoclast reduction.”

      In response to the reviewers, we have performed a series of new experiments that include

      1) A careful analysis of the fusion index, using a semi-automated approach, indicating significant differences in the fusion of precursor cells into osteoclasts (Fig. 7C).

      2) We have repeated the quantification of cell numbers prior to fusion and find variations between samples from different mice (also among mice of the same genotype), but we see on average comparable cell adhesion between samples from control mice and ninein-del/del mice. The data are provided in the supplemental Figure 7F. Moreover, we have quantified the expression of three main beta-integrins at the surface of control and ninein del/del osteoclast precursors (suppl. Fig. 7G), without detecting significant differences. Altogether, these data suggest that cell adhesion is comparable for the two genotypes.

      3) We have addressed the question of altered cell proliferation, by performing flow cytometry experiments and by quantifying the different cell cycle stages (Fig. 7D), and by quantifying Ki67 expression (suppl. Fig. 7C). We see no significant differences between samples from control and ninein-del/del mice.

      4) We have addressed the question of cell death, by performing Annexin V staining and flow cytometry (suppl. Fig. 7D), and by immunoblotting for cleaved caspase 3 and PARP (suppl. Fig. 7E). These experiments reveal no significant differences between the control and ninein del/del samples. Our data permit us to exclude cell death as a likely cause for the reduction of fused osteoclasts in the absence of ninein.

      Overall, the new experiments show that the defects in osteoclast formation from ninein-deleted samples are due to defects in cell fusion, but not in cell proliferation, cell adhesion or viability.

      Reviewer #3 (Public Review):

      “The authors put much emphasis on the centrosome in the Introduction session. However, it was not until Figure 7 did they show abnormal centriole clustering in osteoclasts. The introduction should include more background on osteoclast and osteoblast balance during skeletal development.”

      To address this, we included more background on the role of osteoclasts and osteoblasts in the revised introduction (page 4).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank all of the reviewers for their helpful comments and the effort they made in reading and evaluating our manuscript. In response to them, we have made major changes to the text and figures and performed substantial new experiments. These new data and changes to the text and figures have substantially strengthened the manuscript. We believe that the manuscript is now very strong in both its impact and scope and we hope that reviewers will find it suitable for publication in eLife.

      A point-by-point response to the reviewers' specific comments is provided below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this report, Yu et al ascribe potential tumor suppressive functions to the non-core regions of RAG1/2 recombinases. Using a well-established BCR-ABL oncogene-driven system, the authors model the development of B cell acute lymphoblastic leukemia in mice and found that RAG mutants lacking non-core regions show accelerated leukemogenesis. They further report that the loss of non-core regions of RAG1/2 increases genomic instability, possibly caused by increased off-target recombination of aberrant RAG-induced breaks. The authors conclude that the non-core regions of RAG1 in particular not only increase the fidelity of VDJ recombination, but may also influence the recombination "range" of off-target joints, and that in the absence of the non-core regions, mutant RAG1/2 (termed cRAGs) catalyze high levels of off-target recombination leading to the development of aggressive leukemia.

      Strengths:

      The authors used a genetically defined oncogene-driven model to study the effect of RAG non-core regions on leukemogenesis. The animal studies were well performed and generally included a good number of mice. Therefore, the finding that cRAG expression led to the development of more aggressive BCR-ABL+ leukemia compared to fRAG is solid.

      Weaknesses:

      In general, I find the mechanistic explanation offered by the authors to explain how the non-core regions of RAG1/2 suppress leukemogenesis to be less convincing. My main concern is that cRAG1 and cRAG2 are overexpressed relative to fRAG1/2. This raises the possibility that the observed increased aggressiveness of cRAG tumors compared to fRAG tumors could be solely due to cRAG1/2 overexpression, rather than any intrinsic differences in the activity of cRAG1/2 vs fRAG1/2; and indeed, the authors allude to this possibility in Fig S8, where it was shown that elevated expression of RAG (i.e. fRAG) correlated with decreased survival in pediatric ALL. Although it doesn't mean the authors' assertions are incorrect, this potential caveat should nevertheless be discussed.

      We appreciate the valuable suggestions from the reviewer. BCR-ABL1+ B-ALL is characterized by halted early B-lineage differentiation. In BCR-ABL1+ B cells, RAG recombinases are highly expressed, leading to the inactivation of genes that encode essential transcription factors for B-lineage differentiation. This results in cells being trapped within the precursor compartment, thereby elevating RAG gene expression. Our interpretation of the data suggests that, in BCR-ABL1+ B-ALL mouse models, the high expression of both cRAG and fRAG and the deletion of the non-core regions influence the precision of RAG targeting within the genome. This causes more genomic damage in cRAG tumors than in fRAG tumors, consequently leading to the observed increased aggressiveness of cRAG tumors compared to fRAG tumors. We discussed the issues on Page 12, lines 295-307 in the revised manuscript.

      Some of the conclusions drawn were not supported by the data.

      (1) I'm not sure that the authors can conclude based on μHC expression that there is a loss of pre-BCR checkpoint in cRAG tumors. In fact, Fig. 2B showed that the differences are not statistically significant overall, and more importantly, μHC expression should be detectable in small pre-B cells (CD43-). This is also corroborated by the authors' analysis of VDJ rearrangements, showing that it has occurred at the H chain locus in cRAG cells.

      We appreciate the insightful comment from the reviewer. Upon reevaluation of the data presented in Fig. 2B, we identified and rectified certain errors. The revised analysis now shows that the differences in μHC expression are statistically significant. This significant expression of μHC in fRAG leukemic cells implies that these cells may progress further in differentiation, potentially acquiring an immune phenotype. These modifications have been incorporated into the manuscript on page 7, lines 153-156 in the revised manuscript.

      (2) The authors found a high degree of polyclonal VDJ rearrangements in fRAG tumor cells but a much more limited oligoclonal VDJ repertoire in cRAG tumors. They concluded that this explains why cRAG tumors are more aggressive because BCR-ABL induced leukemia requires secondary oncogenic hits, resulting in the outgrowth of a few dominant clones (Page 19, lines 381-398). I'm not sure this is necessarily a causal relationship since we don't know if the oligoclonality of cRAG tumors is due to selection based on oncogenic potential or if it may actually reflect a more restricted usage of different VDJ gene segments during rearrangement.

      Thank you for your insightful comments and questions regarding the relationship between the oligoclonality of V(D)J rearrangements and the aggressiveness of cRAG tumors. You raise an important point regarding whether the observed oligoclonality is a result of selective pressure favoring clones with specific oncogenic potential, or if it reflects inherent limitations in V(D)J segment usage during rearrangement in cRAG models. In our study, we observed a marked difference in the V(D)J rearrangement patterns between fRAG and cRAG tumor cells, with cRAG tumors exhibiting a more limited, oligoclonal repertoire. This observation led us to speculate that the aggressive nature of cRAG tumors might be linked to a selective advantage conferred by specific V(D)J rearrangements that cooperate with the BCR-ABL1 oncogene to drive leukemogenesis. However, we acknowledge that our current data do not definitively establish a causal relationship between oligoclonality and tumor aggressiveness. The restricted V(D)J repertoire in cRAG tumors could indeed be due to a more constrained rearrangement process, possibly influenced by the altered expression or function of RAG1/2 in the absence of non-core regions. This could limit the diversity of V(D)J rearrangements, leading to the emergence of a few dominant clones not necessarily because they have greater oncogenic potential, but because of a narrowed field of rearrangement possibilities.

      To address this question more thoroughly, future studies could examine the functional consequences of specific V(D)J rearrangements found in dominant cRAG tumor clones. This could include assessing the oncogenic potential of these rearrangements in isolation and in cooperation with BCR-ABL1, as well as exploring the mechanistic basis for the restricted V(D)J repertoire. Such studies would provide deeper insight into the interplay between RAG-mediated recombination, clonal selection, and leukemogenesis in BCR-ABL1+ B-ALL.

      We appreciate your feedback on this matter and agree that further investigation is required to unravel the precise relationship between V(D)J rearrangement diversity and leukemic progression in cRAG models. We have revised our discussion to reflect these considerations and to clarify the speculative nature of our conclusions regarding the link between oligoclonality and tumor aggressiveness. We added more discussion on this issue on Page 7, lines 166-170 in the revised manuscript.

      (3) What constitutes a cancer gene can be highly context- and tissue-dependent. Given that there is no additional information on how any putative cancer gene was disrupted (e.g., truncation of regulatory or coding regions), it is not possible to infer whether increased off-target cRAG activity really directly contributed to the increased aggressiveness of leukemia.

      We fully agree with the issues you raised. In Supplementary Table 3, we have presented data on off-target gene disruptions, specifically in introns, exons, downstream regions, promoters, 3' UTRs, and 5' UTRs. However, this dataset alone does not suffice to conclusively determine whether the increased off-target activity of cRAG directly influences the heightened aggressiveness of leukemia. To bridge this knowledge gap, our future research will extend to include both knockout and overexpression experiments targeting these off-target genes.

      (4) Fig. 6A, it seems that it is really the first four nucleotide (CACA) that determines fRAG binding and the first three (CAC) that determine cRAG binding, as opposed to five for fRAG and four for cRAG, as the author wrote (page 24, lines 493-497).

      We thank the reviewer for the insightful comment. In response, we have revised the text to accurately reflect the nucleotide sequences responsible for RAG binding and cleavage. Specifically, we now clarify that the first four nucleotides (CACA) are crucial for fRAG binding and cleavage, while the initial three nucleotides (CAC) are essential for cRAG binding and cleavage. These updates have been made on page 10, lines 242-245 of the revised manuscript.

      (5) Fig S3B, I don't really see why "significant variations in NHEJ" would necessarily equate "aberrant expression of DNA repair pathways in cRAG leukemic cells". This is purely speculative. Since it has been reported previously that alt-EJ/MMEJ can join off target RAG breaks, do the authors detect high levels of microhomology usage at break points in cRAG tumors?

      We appreciate the reviewer's comment. Currently, we have not observed microhomology usage at breakpoints in cRAG tumors. We plan to address this aspect in a future, more detailed study. Regarding the 'aberrant expression of DNA repair pathways in cRAG leukemic cells', we acknowledge that this is speculative. Therefore, we have rephrased this to 'suggesting a potential aberrant expression of DNA repair pathways in cRAG leukemic cells.' This modification is reflected on page 12, lines 290-291 of the revised manuscript.

      (6) Fig. S7, CDKN2B inhibits CDK4/6 activation by cyclin D, but I don't think it has been shown to regulate CDK6 mRNA expression. The increase in CDK6 mRNA likely just reflects a more proliferative tumor but may have nothing to do with CDKN2B deletion in cRAG1 tumors.

      We fully concur with the reviewer's comment. We have deleted this inappropriate part from the text.

      Insufficient details in some figures. For instance, Fig. 1A, please include statistics in the plot showing a comparison of fRAG vs cRAG1, fRAG vs cRAG2, cRAG1 vs cRAG2. As of now, there's a single p-value (0.0425) stated in the main text and the legend but why is there only one p-value when fRAG is compared to cRAG1 or cRAG2? Similarly, the authors wrote "median survival days 11-26, 10-16, 11-21 days, P < 0.0023-0.0299, Fig. S2B." However, it is difficult for me to figure out what are the numbers referring to. For instance, is 11-26 referring to median survival of fRAG inoculated with three different concentrations of GFP+ leukemic cells or is 11-26 referring to median survival of fRAG, cRAG1, cRAG2 inoculated with 10^5 cells? It would be much clearer if the authors can provide the numbers for each pair-wise comparison, if not in the main text, then at least in the figure legend. In Fig. 5A-B, do the plots depict SVs in cRAG tumors or both cRAG and fRAG cells? Also in Fig. 5, why did 24 SVs give rise to 42 breakpoints, and not 48? Doesn't it take 2 breaks to accomplish rearrangement? In Fig. 6B-C, it is not clear how the recombination sizes were calculated. In the examples shown in Fig. 4, only cRAG1 tumors show intra-chromosomal joins (chr 12), while fRAG and cRAG2 tumors show exclusively inter-chromosomal joins.

      We appreciate the reviewer's feedback and have made the following revisions:

      (1) The text has been adjusted to rectify the previously mentioned error in the figure legends (page 1, lines 5-6).

      (2) We have clarified the intended message in the revised text (page 6, lines 129-130) and the figure legend (page 4-5, lines 107-113) for greater precision.

      (3) Figure 5A-B now presents an overview of all structural variants (SVs) identified in both cRAG and fRAG cells, offering a comprehensive comparison.

      (4) Among the analyzed SVs, 24 generated a total of 48 breakpoints, with 41 occurring within gene bodies and the remaining 7 in adjacent flanking sequences. This informs our exon-intron distribution profile analysis.

      (5) We have defined recombination sizes as ‘the DNA fragment size spanning the two breakpoints’ for clarity (page 10, lines 251-252).

      (6) All off-target recombinations identified in the genome-wide analyses of fRAG, cRAG1, and cRAG2 leukemic cells were determined to be intra-chromosomal joins, highlighting their specific nature within the genomic context.
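      The breakpoint count and the recombination-size definition given in points (4) and (5) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the `StructuralVariant` record, its field names, and the example coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StructuralVariant:
    """Hypothetical record of one SV, described by its two breakpoints."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int

    @property
    def is_intrachromosomal(self) -> bool:
        return self.chrom_a == self.chrom_b


def breakpoint_count(svs: Sequence[StructuralVariant]) -> int:
    """Each SV joins two breakpoints, so n SVs give 2 * n breakpoints."""
    return 2 * len(svs)


def recombination_size(sv: StructuralVariant) -> int:
    """DNA fragment size spanning the two breakpoints (intra-chromosomal only)."""
    if not sv.is_intrachromosomal:
        raise ValueError("size is undefined for inter-chromosomal joins")
    return abs(sv.pos_b - sv.pos_a)


if __name__ == "__main__":
    # 24 placeholder SVs with made-up coordinates on one chromosome.
    svs = [StructuralVariant("chr12", 1_000 * i, "chr12", 1_000 * i + 5_000)
           for i in range(1, 25)]
    print(breakpoint_count(svs))        # 24 SVs -> 48 breakpoints
    print(recombination_size(svs[0]))   # span between the two breakpoints
```

      Restricting the size calculation to intra-chromosomal joins matches point (6), since a span between two breakpoints is only defined when both lie on the same chromosome.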

      Insufficient details on certain reagents/methods. For instance, are the cRAG1/2 mice of the same genetic background as fRAG mice (C57BL/6 WT)? On Page 23, line 481, what is a cancer gene? How are they defined? In Fig. 3C, are the FACS plots gated on intact cells? Since apoptotic cells show high levels of gH2AX, I'm surprised that the fraction of gH2AX+ cells is so much lower in fRAG tumors compared to cRAG tumors. The in vitro VDJ assay shown in Fig 3B is not described in the Method section (although it is described in Fig S5b). Fig. 5A-B, do the plots depict SVs in cRAG tumors or both cRAG and fRAG cells?

      We are grateful for the reviewer's feedback and have incorporated their insights as follows:

      (1) We clarify that both cRAG1/2 and fRAG mice share the same genetic background, specifically the C57BL/6 WT strain, ensuring consistency across experimental models.

      (2) We define a 'cancer gene' as one harboring somatic mutations implicated in cancer. To support our analysis, we refer to the Catalogue Of Somatic Mutations In Cancer (COSMIC) at http://cancer.sanger.ac.uk/cosmic. COSMIC serves as the most extensive repository for understanding the role of somatic mutations in human cancers.

      (3) Upon thorough review of the raw data for γ-H2AX and the fluorescence-activated cell sorting (FACS) plots gated on intact cells, we propose that the observed discrepancies might stem from the limited sensitivity of the γ-H2AX flow cytometry detection method. This insight prompts our commitment to employing more efficient detection methodologies in forthcoming studies.

      (4) Detailed procedures for the in vitro V(D)J recombination assay have been included in the Methods section (page 15, lines 384-388) to enhance the manuscript's comprehensiveness and reproducibility.

      (5) The presented plots offer a comprehensive overview of structural variants (SVs) identified in both cRAG and fRAG cells, providing a holistic view of the genomic landscape across different models.

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript, the authors summarize and introduce the correlation between the non-core regions of RAG1 and RAG2 and off-target recombination in BCR-ABL1+ acute B lymphoblastic leukemia, a topic of some innovative and clinical significance.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I would suggest that the authors tone down some of their conclusions, which are not necessarily supported by their own data. In addition, there are some minor mistakes in figure assembly/presentation. For instance, I believe that the axes labels in Fig. 1E were flipped. BrdU should be on y-axis and 7-AAD on the x-axis. Fig. 3B, the y-axis contains a typo, it should be "CD90.1..." and not "D90.1...". In Fig. 5C, the numbers seem to be flipped, with 93% corresponding to cRAG1 and 100% to cRAG2 (compare with the description on page 23, lines 474-475). Fig. 5C, y-axis, "hybrid" is a typo. Page 3, line 59: The abbreviation of RSS has already been described earlier (p4, line 53).

      We thank the reviewer for these suggestions. We carefully checked the raw data and corrected these mistakes in the revised manuscript.

      Page 3, line 63: "signal" segment (commonly referred to as signal ends), not "signaling" segment.

      We have changed “signaling segment” to “signal ends” in the revised manuscript (page 3, lines 54-55).

      Page 3, lines 64-65: VDJ recombination promotes the development of both B and T cells, and aberrant recombination can cause both B and T cell lymphomas.

      The statement about the role of V(D)J recombination in B and T cell development and its link to lymphomagenesis is grounded in a substantial body of research. Theoretical frameworks and empirical studies delineate how aberrations in the recombination process can lead to genomic instability, potentially triggering oncogenic events. This connection is extensively documented in immunology and oncology literature, illustrating the critical balance between necessary genetic rearrangements for immune diversity and the risk of malignancy when these processes are dysregulated (Thomson et al., 2020; Mendes et al., 2014; Onozawa and Aplan, 2012).

      Page 4, line 72: "recombinant dispensability" is not a commonly used phrase. Do the authors mean to say that the non-core regions of RAG1/2 are not strictly required for VDJ recombination?

      We thank the reviewers for their insightful suggestion. We have revised the sentence to read, 'Although the non-core regions of RAG1/2 are not essential for V(D)J recombination, the evolutionary conservation of these regions suggests their potential significance in vivo, possibly affecting RAG activity and expression in both quantitative and qualitative manners.' This revision appears on page 3, lines 61-62, in the revised manuscript.

      Fig. 4. It would have been nice to show at least one more cRAG1 tumor Circos plot.

      We appreciate the reviewer's comment and concur with the suggestion. In future sequencing experiments, we will consider including additional replicates. However, due to time and financial constraints, the current sequencing effort was limited to a maximum of three replicates.

      Reviewer #3 (Recommendations For The Authors):

      In the manuscript, the authors summarize and introduce the correlation between the non-core regions of RAG1 and RAG2 and off-target recombination in BCR-ABL1+ acute B lymphoblastic leukemia, a topic of some innovative and clinical significance. The following issues need to be addressed by the authors.

      (1) Authors should check and review extensively for improvements to the use of English.

      We thank the reviewer for their comment. With assistance from a native English speaker, we have carefully revised the manuscript to enhance its readability.

      (2) Authors should revise the conclusion so that the above can be clearly reviewed and summarized.

      The conclusion has been partially revised in the revised manuscript.

      (3) The article should state that the experiment was independently repeated three times.

      The experiment was repeated three times under the same conditions, and this information has been described in the Statistics section on page 19, lines 473-475 of the revised manuscript.

      (4) The article will be more convincing if it uses references in the last 5 years.

      We are grateful to the reviewer for their guidance in enhancing our manuscript. We have incorporated additional references from the past five years in the revised version.

      (5) Additional experiments are suggested to elucidate the molecular mechanisms related to off-target recombination.

      We thank the reviewer for this suggestion. In future experiments, we plan to perform ChIP-seq analysis to investigate the relationship between chromatin accessibility and off-target effects, as well as to examine the impact of knocking out and overexpressing off-target genes on cancer development and progression.

      (6) It is suggested to further analyze the effect of the absence of non-core RAG region on the differentiation and development of peripheral B cells in mice by flow analysis and expression of B1 and B2.

      Thank you very much for highlighting this crucial issue. FACS analysis was performed, revealing that leukemia cells in peripheral B cells in mice did not express CD5. The data are presented as follows:

      Author response image 1.

      (7) Fig3A should have three biological replicates and the molecular weight should be labeled on the right side of the strip.

      Thank you for this suggestion. The experiment was independently repeated three times, and the molecular weights have been labeled on the right side of the bands in the revised version.

      References:

      Mendes RD, Sarmento LM, Canté-Barrett K, Zuurbier L, Buijs-Gladdines JG, Póvoa V, Smits WK, Abecasis M, Yunes JA, Sonneveld E, Horstmann MA, Pieters R, Barata JT, Meijerink JP. 2014. PTEN microdeletions in T-cell acute lymphoblastic leukemia are caused by illegitimate RAG-mediated recombination events. Blood 124:567-578. doi:10.1182/blood-2014-03-562751

      Onozawa M, Aplan PD. 2012. Illegitimate V(D)J recombination involving nonantigen receptor loci in lymphoid malignancy. Genes Chromosomes Cancer 51:525-535. doi:10.1002/gcc.21942

      Thomson DW, Shahrin NH, Wang P, Wadham C, Shanmuganathan N, Scott HS, Dinger ME, Hughes TP, Schreiber AW, Branford S. 2020. Aberrant RAG-mediated recombination contributes to multiple structural rearrangements in lymphoid blast crisis of chronic myeloid leukemia. Leukemia 34:2051-2063. doi:10.1038/s41375-020-0751-y

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank both Editors and reviewers for their valuable time, careful reading, and constructive comments. The comments have been highly valuable and useful for improving the quality of our study, as well as important in guiding the direction of our present and future research. In the revised manuscript, we have incorporated the necessary changes including additional experimental data as suggested; please find our detailed point-by-point response to the reviewers’ comments and the changes we have made in the manuscript as follows.

      Reviewer #1 (Public Review):

      In this work, the authors have explored how treating C. albicans fungal cells with EDTA affects their growth and virulence potential. They then explore the use of EDTA-treated yeast as a whole-cell vaccine in a mouse model of systemic infection. In general, the results of the paper are unsurprising. Treating yeast cells with EDTA affects their growth and the addition of metals rescues the phenotype. Because of the significant growth defects of the cells, they don't infect mice and you see reduced virulence. Injection with these cells effectively immunises the mice, in the same way that heat-killed yeast cells would. The data is fairly sound and mostly well-presented, and the paper is easy to follow. However, I feel the data is an incremental advance at best, and the immune analysis in the paper is very basic and descriptive.

      Strengths:

      Detailed analysis of EDTA-treated yeast cells

      Weaknesses:

      • Basic immune data with little advance in knowledge.

      • No comparison between their whole-cell vaccine and others tried in the field.

      • The data is largely unsurprising and not novel.

      Reply: Thank you so much for appreciating our effort to generate a whole-cell anti-fungal vaccine by treating C. albicans cells with EDTA. We also appreciate your comment that the manuscript is sound and well-presented. However, we are afraid that the respected reviewer assumed the CAET cells to be dead cells, whereas they only divide more slowly than the untreated cells. In the revised manuscript, we have presented additional evidence to show that CAET are live cells (Supp. Fig. 2), and based on the new data, we expect a positive change in the reviewer’s opinion. Since CAET is a live strain, the data presented here are novel.

      Reviewer #2 (Public Review):

      Summary:

      Invasive fungal infections are very difficult to treat with limited drug options. With the increasing concern of drug resistance, developing an antifungal vaccine is a high priority. In this study, the authors studied the metal metabolism in Candida albicans by testing some chelators, including EDTA, to block the metal acquisition and metabolism by the fungus. Interestingly, they found that EDTA-treated yeast cells grew poorly in vitro and were non-pathogenic in vivo in a murine model. Mice immunized by EDTA-treated Candida (CAET) were protected against challenge with wild-type Candida cells. RNA-Seq analysis to survey the gene expression profile in response to EDTA treatment in vitro revealed upregulation of genes in metal homeostasis and downregulation of ribosome biogenesis. They also revealed an induction of both pro- and anti-inflammatory cytokines involved in Th1, Th2 and Th17 host immune response in response to CAET immunization. Overall, this is an interesting study with translational potential.

      Strengths:

      The main strength of the report is that the authors identified a potential whole-cell live vaccine strain that can provide full protection against candidiasis. Abundant data both on in vitro phenotype, gene expression profile, and host immune response have been presented.

      Weaknesses:

      A weakness is that the immune mechanism of CAET-mediated host protection remains unclear. The immune data is somewhat confusing. The authors only checked cytokines and chemokines in blood. The immune response in infected tissues and antibody response may be investigated.

      Reply: Thank you very much for appreciating our work and finding our strain to be a live whole-cell anti-fungal vaccine strain with translational potential. Since the current study focused on the identification and detailed characterizations of a non-genetically modified live-attenuated strain and determination of its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. In a separate study, we are currently investigating both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice.

      Reviewer #3 (Public Review):

      Summary:

      The authors are trying to find a vaccine solution for invasive candidiasis.

      Strengths:

      The testing of the antifungal activity of EDTA on Candida is not new as many other papers have examined this effect. The novelty here is the use of this EDTA-treated strain as a vaccine to protect against a secondary challenge with wild-type Candida.

      Weaknesses:

      However, data presented in Figure 5 and Figure 6 are not convincing and need further experimental controls and analysis as the authors do not show a time-dependent effect on the CFU of their vaccine formulation. The methodology used is also an issue. As it stands, the impact is minor.

      Reply: Thank you so much for appreciating our efforts to develop a novel vaccine against fungal infections. We are extremely sorry for the lack of clarity in our writing related to Figs. 5 and 6; we have now modified the text and hope that the respected reviewer will find these convincing.

      Recommendations for the authors:

      Although the reviewers recognize the importance of the manuscript, they would like to see: 1) comparisons between their whole-cell vaccine and others tried in the field, 2) an investigation of the immune response in infected tissues and antibody response, and 3) more controls in Figures 5 and 6, and a time-dependent effect on the colony-forming units of their vaccine formulation. Please, address the questions and submit a revised version together with a rebuttal letter addressing point-by-point raised by each reviewer.

      Reply: (1) We are afraid that a comparative study of live and heat-killed cell vaccines would obscure the information presented here. This is the only non-genetically modified antifungal vaccine candidate; therefore, a comparison with a dead strain is unwarranted at present. We have now added supporting data to confirm that the survivability of C. albicans cells was unaffected at 6 hr of EDTA treatment (CAET, Supp. Fig. S2). (2) Since the current study focused on the identification and detailed characterization of a non-genetically modified live-attenuated strain and its safety and efficacy as a potential vaccine candidate in the preclinical model, we have excluded the possible immune mechanisms involving CAET. However, in a separate study, we are currently investigating both cellular and molecular mechanisms that provide protective immunity in CAET-vaccinated mice. (3) The results of Figs 5 and 6 were misinterpreted by the respected reviewer; please see the explanation below.

      Reviewer #1 (Recommendations For The Authors):

      Some specific comments/suggestions for the authors: (1) What was the viability of the yeast after EDTA treatment? Is the delayed growth response because many cells died and it takes a while for remaining viable cells to catch up? This is important to know because it may mean the dose given to mice is substantially different and that should be accounted for. Some PI staining of the cells after treatment would help.

      Reply: The growth curve assays (Fig. 1A and 1E) were initiated with each culture at O.D.600nm = 0.5 (~10^7 cells/mL), and the analyses suggested that the EDTA-treated C. albicans cells grew more slowly than the untreated cells. Fig. 1B and 1F further demonstrated that EDTA has minimal effect on the survival of the strain up to 8 hrs post-exposure. The proportional increase in cell number with and without metal chelators remained almost the same for this duration (0 – 8 hrs). Therefore, for subsequent analyses, 6 hr treatment was selected, and such treated cells were considered as CAET, which were actively dividing live cells, albeit dividing more slowly than untreated cells. As suggested, and to strengthen our finding, time-dependent SYTOX Green and propidium iodide staining of C. albicans cells with and without EDTA treatment was carried out and analysed by flow cytometry and microscopy, respectively. Both analyses revealed that the percentage of dead cells remained the same with and without EDTA treatment for up to 12 hrs. The new data have now been added to the revised version of the manuscript as Supplementary figure 2.

      Author response image 1.

      (2) In line with the above, what was the viability of the CAET cells after 3h in media? In the macrophage in vitro experiments, how do you know the reduced viability of the CAET cells is macrophage-specific? Did you run a control of CAET cells in media on their own to determine how CFU changed in macrophage-free conditions? Are the proliferation rates of untreated and CAET cells different? That would affect CFSE labelling and results. These experiments would work better with a GFP-expressing C. albicans strain, which is widely available. In the images in Figure 4c, it looks like there are more hyphae in CAET than untreated - was hyphal induction checked/measured? That's important to know because more hyphae usually means more clumping and this can affect CFU counts (giving the impression of less CFU when actually there is more). Because of all the issues above, I'm not fully convinced by the uptake/killing data.

      Reply: As explained in response 1, we used actively dividing WT and CAET cells, and equal numbers of these cells were CFSE labelled. As can be seen in Fig. 4A, the rate of phagocytosis was the same in 1 hr of pre-culture, but at the subsequent time points the double-positive cells were reduced in the case of CAET cells, and that is due to fungal killing by macrophages. Fungal cells were released from the macrophages by warm water treatment and CFU was determined. Fig. 4B suggested that at 1 hr of co-culture, the CFU of both fungal cells (WT and CAET) were the same and the fungal clearance was observed at later time points. Thus, the reduced viability of CAET cells was macrophage-specific. EDTA has minimal effect on hyphal transition with or without serum, and the new data have now been provided in the revised version (Supplementary Fig. 3).

      Author response image 2.

      (3) Pooled data should be shown for all animal experiments.

      Reply: Thank you for the suggestion; wherever it was meaningful, pooled data for the animal experiments have now been provided.

      (4) Immune cell counts/analysis in the kidney and bone marrow would be hugely helpful and more relevant to understanding immune responses following immunisation/infection. I think a more interesting analysis for the authors to consider would be to immunise with heat-killed yeast vs EDTA-treated yeast and see if there is a qualitative difference or better protection, i.e. is the EDTA-treated whole-cell vaccine superior to the heat-killed version? That is a better question to address. As it stands, the data in the paper is not surprising.

      Reply: The studies on cellular and molecular mechanisms underlying protective immunity in CAET-vaccinated mice are in progress in a separate study. This study mostly focused on the identification and detailed characterization of a non-genetically modified live-attenuated strain and its safety and efficacy as a potential vaccine candidate in a preclinical model. We are afraid that a comparison of a live cell (CAET) with a dead cell (heat-killed) will dilute the content of the manuscript and will not be meaningful. It is well accepted that the heat-killed C. albicans strain only provides partial, short-lived protection against re-challenge (PMIDs: 12146759 and 9916097); thus, it does not warrant any comparison with CAET.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is a highly interesting study. I have the following specific comments for clarification.

      (1) In the introduction, the authors mentioned other anti-candida vaccines that are mostly effective against Candida infection by inducing neutralizing antibodies. However, in their CAET vaccine candidate, they only checked the cellular immunity in blood and found a balanced immune response (both pro- and anti-inflammatory responses are induced). How about the antibody production in these mice? It is a bit surprising that both untreated Candida infection and CAET Candida infection produced similar immune activation based on Figure 6, yet the CAET immunization provides protection. Some innate cell recruitment is higher in untreated Ca infection than the CAET infected mice (Figure 5F). The overall results on immune response characterization did not seem to explain why the CAET infection led to host protection while untreated Ca infection cannot. Characterizing infected tissue immune cell differentiation and cytokine production may offer some additional insights.

      Reply: We agree with you that in this manuscript we have not provided any mechanistic study on the protective immunity in CAET-vaccinated mice. This will be demonstrated in a subsequent study.

      (2) In Figure 5, some critical data seem to be missing in panels B and C. The CFU and histopathological images for CAET-treated mice challenged by Ca should also be shown there for comparison. Although they did show some data in Figure 5E and Figure S4, it is necessary to have that data in 5B and 5C from the same experiment. Figure S4 is a very busy figure and the images are quite small. It may be necessary to use arrows to point out what information authors want to emphasize.

      Reply: Fig. 5B and 5C showed the data for mice that succumbed to infection. Since the other mice (saline control groups, CAET infected, CAET vaccinated, and re-challenged groups) survived, they were not sacrificed; therefore, the CFU data were not collected. In addition, we wanted to see the longevity of these surviving mice, and after 1 year of observation they were handed over to the animal house for clearance as per the institutional guidelines. However, Figure 5E and Figure S4 (now Fig. S6) included all the mice groups, as they were sacrificed at various time points irrespective of humane end points. As suggested, Fig. S6 has now been modified and fungal cells are denoted by yellow arrows.

      (3) EDTA-treated yeast cells showed poor growth but also had thicker cell walls with high chitin, glucan, and mannan levels. What leads to its clearance in vivo remains unclear, as usually, cells with thick cell wall structures and low metabolism are more resistant to stress, e.g., dormant cells. Macrophages were shown to contribute to CAET killing in a phagocytosis assay (Figure 4). Checking cytokines produced by macrophages during co-incubation may offer some insights. In all, additional discussion on what caused in vivo clearance would be helpful.

      Reply: A mechanistic study of the protective immune responses to CAET will be presented in a separate study. As suggested, the 3rd paragraph of the discussion section now contains additional information emphasising the in vivo clearance of CAET cells.

      (4) Long paragraphs in the discussion section could be divided into a bigger number of shorter paragraphs.

      Reply: Thank you for the suggestion; the discussion has now been modified in the revised version (7 short paragraphs). To make it more comprehensible, some of the content has been removed.

      Reviewer #3 (Recommendations For The Authors):

      (1) It is unclear how many cells were treated with 250 micromolar of EDTA for 6 hours before preparing the inoculum. It seems that only the OD was measured before adding EDTA. This is not a very rigorous and reproducible method.

      Reply: In this manuscript, we have repeatedly used the same protocol to generate CAET cells for various analyses. The O.D.600nm = 0.5 culture is equivalent to 10^7 C. albicans cells per mL, and this information has now been added in the revised manuscript.

      (2) Upon treatment with 250 micromolar of EDTA, cells were harvested and counted to prepare the inoculum (5 × 10^5) for injecting it in mice. However, it appears that CFU of the inoculum was not done. Based on data shown in Fig. 1B, 250 micromolar of EDTA does inhibit Candida cell replication. Thus, the authors may have counted dead cells and, thus, injected dead cells together with live cells for the CAET inoculum. Thus, mice receiving this inoculum may have been infected (and vaccinated) with a lower number of live Candida cells.

      Reply: Please see a similar response to reviewer #1. EDTA has minimal effect on the survival of C. albicans cells at 6 hr (also see supp. Fig. S2). We have already mentioned the CFU analysis of untreated and CAET cells in the methodology section related to inoculum preparation.

      (3) It is unclear if 6 hours of treatment with 250 micromolar of EDTA is enough to induce a block of Candida cell replication. In Figure 1B, the authors treated for 24h. The authors are encouraged to wash the cells after 6 hours of treatment and see if their cell division will recover upon removal of EDTA.

      Reply: Thank you for the suggestion. At 6 hr of treatment, the survivability of C. albicans cells was unaffected by EDTA exposure. PI and SYTOX Green staining confirmed it (Supp. Fig. 2). Additionally, as suggested, a rescue experiment was carried out by exogenous addition of divalent metals after 6 hr of EDTA treatment, and growth/CFU analyses were followed thereafter. A modified Fig. 1A and B with new data has been provided.

      (4) The data shown in Figure 5A is extremely exciting. However, the number of mice in each group (n=6) is too low. Normally, 10 mice per group are used for virulence studies unless the authors provide a power analysis that 6 mice per group will be sufficient. Also, CFU data were only provided for Ca and saline-Ca groups (Fig. 5B) and not for the other groups. CFU data should be provided for all mice.

      Reply: Thank you for the suggestion; a statistical analysis of Fig. 5A has been provided in the revised version. The rationale behind not including all mice groups in Fig. 5B is already explained in a response to reviewer #2.

      (5) It is unclear how the authors differentiate between CFU arising from CAET or from WT Candida.

      Reply: Since Fig. 5E demonstrated that no CAET cells were detected in the kidney beyond 10 days of inoculation, the fungal cells detected on the 3rd and 7th days in the re-challenged mice group (1CAET 2 Ca) were from the later-inoculated cells (brown colour).

      (6) Figure 5E: it is unclear if a 1 saline-2 saline (Figure legend) or if 1 saline-2 Ca (text) group was included. If the latter, where are the CFU? It is impossible that 1 saline-2 Ca mice have no CFU.

      Reply: Thank you so much for pointing this out. The legend has now been modified to include 1saline-2saline and 1CAET-2Ca.

      (7) It seems that CFU is significantly present in the kidney in the 1 CAET - 2 Ca group at day 7 but not at day 3. How is this possible? This is an extremely invasive model of infection, and the authors are challenging intravenously 500,000 live Candida cells. If by the 3rd day, the authors detect no CFU, then how is it possible that CFUs are arising on day 7?

      Reply: We do detect fungal cells on the 3rd day in the 1CAET 2 WT mice group (~2000 cells), albeit far fewer than on the 7th day (~11200 cells). A Log10 scale graph has now been provided for better representation.

      (8) Most importantly, if the authors are not detecting CFU at day 3, then earlier time points (e.g. day 2, day 1, or even 12 hours post-challenge) must be analyzed. The authors should show that CFU from the organs is decreasing in a time-dependent manner. Also, all CFU should be shown as Log10.

      Reply: Please see the previous response.

      (9) Fig. 6: because it is unclear if the mice were challenged with the same inoculum of live Candida cells (untreated and treated with EDTA), the different cytokine profiles between the two groups could be simply due to the different inoculum sizes and not to the effect of EDTA on Ca.

      Reply: Please see the previous response, which was also given to Reviewer #1.

    1. Reviewer #3 (Public Review):

      Summary:

      The authors consider several known aspects of PV and SOM interneurons and tie them together into a coherent single-cell model that demonstrates how the aspects interact. These aspects are:

      (1) While SOM interneurons target distal parts of pyramidal cell dendrites, PV interneurons target perisomatic regions.

      (2) SOM interneurons are associated with beta rhythms, PV interneurons with gamma rhythms.

      (3) Clustered excitation on dendrites can trigger various forms of dendritic spikes independent of somatic spikes.

      The main finding is that SOM and PV interneurons are not simply associated with beta and gamma frequencies respectively, but that their ability to modulate the activity of a pyramidal cell "works best" at their assigned frequencies. For example, distally targeting SOM interneurons are ideally placed to precisely modulate dendritic Ca-spikes when their firing is modulated at beta frequencies or timed relative to excitatory inputs. Outside those activity regimes, not only is modulation weakened, but overall firing reduced.

      Strengths:

      I think the greatest strength is the model itself. While the various individual findings were largely known or strongly expected, the model provides a coherent and quantitative picture of how they come together and interact.

      The paper also powerfully demonstrates that an established view of "subtractive" vs. "divisive" inhibition may be too soma-focused and provide an incomplete picture in cells with dendritic nonlinearities giving rise to a separate, non-somatic all-or-nothing mechanism (Ca-spike).

      Weaknesses:

      While the authors overall did an admirable job of simulating the neuron in an in-vivo-like activity regime, I think it still provides an idealized picture that is optimized for the generation of the types of events the authors were interested in. That is not a problem per se - studying a mechanism under idealized conditions is a great advantage of simulation techniques - but this should be more clearly characterized. Specifics on this are very detailed and will follow in the comments to authors.

      What disappointed me a bit was the lack of a concise summary of what we learned beyond the fact that beta and gamma act differently on dendritic integration. The individual paragraphs of the discussion often are 80% summary of existing theories and only a single vague statement about how the results in this study relate. I think a summarizing schematic or similar would help immensely.

      Orthogonal to that, there were some points where the authors could have offered more depth on specific features. For example, the authors summarized that their "results suggest that the timescales of these rhythms align with the specialized impacts of SOM and PV interneurons on neuronal integration". Here they could go deeper and try to explain why SOM impact is specialized at slower time scales. (I think their results provide enough for a speculative outlook.)

      Beyond that, the authors invite the community to reappraise the role of gamma and beta in coding. This idea seems to be hindered by the fact that I cannot find a mention of a release of the model used in this work. The base pyramidal cell model is of course available from the original study, but it would be helpful for follow-up work to release the complete setup, including excitatory and inhibitory synapses and their activation in the different simulation paradigms used, as well as the code related to that.

      Impact:

      Individually, most results were at least qualitatively known or at least expected. However, demonstrating that beta-modulation of dendritic events and gamma-modulation of soma spiking can work together, at the same time and in the same model can lead to highly valuable follow-up work. For example, by studying how top-down excitation onto apical compartments and bottom-up excitation on basal compartments interacts with the various rhythms; or what the impact of silencing of SOM neurons by VIP interneuron activation entails. But this requires - again - public release of the model and the code controlling the simulation setups.

      Beyond that, the authors clearly demonstrated that a single compartment, i.e., only a soma-focused view is too simple, at least when beta is considered. Conversely, the authors were able to describe the impact of most things related to the apical dendrite on somatic spiking as "going through" the Ca-spike mechanism. Therefore, the setup may serve as the basis of constraining simplified two-compartment models in the future.

    1. Reviewer #1 (Public Review):

      This manuscript describes the pattern of relaxed selection observed at spermatogenesis genes in gorillas, presumably due to the low sperm competition associated with single-male polygyny. The analyses to detect patterns of selection are very thorough, as are the follow-up analyses to characterize the function of these genes. Furthermore, the authors take the extra step of in vivo determination of function with a Drosophila model.

      This is an excellent paper. It addresses the interesting phenomenon of relaxation of selection as a genomic signal of reproductive strategies using multiple computational approaches and follow-up analyses by pulling in data from GO, mouse knockouts, human infertility database, and even Drosophila RNAi experiments. I really appreciate the comprehensive and creative approach to analyze and explore the data. As far as I can tell, the analyses were performed soundly and statistics are appropriate. The Introduction and Discussion sections are thoughtful and well-written. I have no major criticisms of the manuscript.

      The main area that I would suggest for improvement is in the "Caveats and Limitations" section of the Discussion. Currently, the first paragraph of this section states the obvious that genetic manipulation of gorillas is not feasible. Beyond a reminder to the reader that this was a rationale for the Drosophila work, it isn't really adding much insight. The second paragraph is a brief discussion of the directionality of change. I think it comes across as overly simplistic, with a sort of "well, we can never know" feel. Obviously, there are plenty of researchers who do model change to infer direction and causation, and there are plenty of published papers attempting to do so with respect to mating systems in primates.

      I do not think the authors need to remove these paragraphs, but I do encourage them to turn the "Caveats and Limitations" section into something more meaningful by addressing limitations of the work that was actually done rather than limitations of hypothetical things that were not done. A few areas come to mind. First, the authors should discuss the effect of gene-tree vs species-tree inconsistencies in the analyses, which could affect the identification of gorilla-specific amino acid changes and/or the dN/dS estimates. Incomplete lineage sorting is very common in primates including the gorilla-chimp-human splits (Rivas-González et al. 2023). It would be nice to hear the authors' thoughts on how that might affect their analyses. Second, the dN/dS-based analyses assume the neutrality of synonymous substitutions. Of course, that assumption is not completely true; it might be true enough, and the authors should at least note it as a caveat. Third, and potentially related, is the consideration that these protein-coding genes may be functioning in other ways such as via antisense transcription. The genes under relaxed selection may be on their way to becoming pseudogenes and evolving as such at the sequence level, but many pseudogenes continue to be transcribed sense or anti-sense in a regulatory purpose. I don't think there is a way to incorporate this into the authors' analyses but it would be nice to see it acknowledged as a caveat or limitation.

    1. As we look at the above examples we can see examples of intersectionality [q13], which means that not only are people treated differently based on their identities (e.g., race, gender, class, disability, weight, height, etc.), but combinations of those identities can compound unfair treatment in complicated ways. For example, you can test a resume filter and find that it isn’t biased against Black people, and it isn’t biased against women. But it might turn out that it is still biased against Black women. This could happen because the filter “fixed” the gender and race bias by over-selecting white women and Black men while under-selecting Black women.

      I think intersectionality is essential because it helps us understand how different identity traits can intersect to affect an individual's experience. For example, an Asian pansexual male may face discrimination based on both race and sexual orientation, and the impact of such compounded discrimination may be much more complex than discrimination based on a single identity. This situation shows that we cannot consider only one factor when addressing discrimination and inequality; we must consider how multiple identity factors interact and may lead to more complex injustices. Such insights are critical to developing more effective equality policies and interventions.
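The resume-filter scenario quoted above can be made concrete with a small sketch. The counts below are invented purely for illustration (they are not data from any real filter), and the function name `selection_rates` is a hypothetical helper. The point is only that selection rates can look identical when audited by race alone and by gender alone, while one intersectional group is still selected far less often.

```python
from collections import defaultdict

# Hypothetical audit counts: (race, gender) -> (applicants, selected).
# These numbers are made up to illustrate the effect.
COUNTS = {
    ("white", "man"): (300, 150),
    ("white", "woman"): (100, 70),
    ("Black", "man"): (100, 70),
    ("Black", "woman"): (100, 40),
}


def selection_rates(counts, key):
    """Aggregate applicants and selections by key(group), return rate per key."""
    totals = defaultdict(lambda: [0, 0])
    for group, (applicants, selected) in counts.items():
        bucket = totals[key(group)]
        bucket[0] += applicants
        bucket[1] += selected
    return {k: sel / app for k, (app, sel) in totals.items()}


by_race = selection_rates(COUNTS, lambda g: g[0])    # audit on race only
by_gender = selection_rates(COUNTS, lambda g: g[1])  # audit on gender only
by_both = selection_rates(COUNTS, lambda g: g)       # intersectional audit

if __name__ == "__main__":
    print("by race:  ", by_race)
    print("by gender:", by_gender)
    print("by both:  ", by_both)
```

In this toy data set the single-attribute audits both come out equal across groups, yet Black women have the lowest selection rate of the four combined groups, which is exactly the pattern a one-attribute-at-a-time test would miss.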

    1. In what ways have you found social media bad for your mental health and good for your mental health?

      When it comes to mental health issues in our generation, I can say with almost 100% confidence that social media is the main culprit. With technology integrated into every aspect of our lives nowadays, we can blame social media for being so easily accessible and influential on our generation and generations to come. From my experience, many social media sites encourage people to showcase only the good aspects of their lives, which gives a very one-sided view of everyone. This is very harmful, as many younger people (and probably older people too) tend to compare themselves to influencers and other accounts, which can worsen conditions like depression, anxiety, and jealousy. Additionally, with all these ways to edit photos, these posts may not even be real but they sure seem real, leading to negative body image and unhealthy diet and workout plans. I think in some ways social media can be good for your mental health, whether that is watching funny videos to lighten the mood, or looking into what other people with similar hobbies do in order to get inspiration. However, it is hard to filter out the harmful content from the content we want and enjoy.

    1. C. L. Lynch. “Autism is a Spectrum” Doesn’t Mean What You Think. NeuroClastic, May 2019. URL: https://neuroclastic.com/its-a-spectrum-doesnt-mean-what-you-think/ (visited on 2023-12-08).

      The article addresses the misconception that autism is a gradient rather than a true spectrum. It uses vivid analogies, like the visible light spectrum, to illustrate its point. It highlights the diversity within autism by explaining how different individuals can have unique combinations of traits, emphasizing that autism is not just one condition but a complex collection of related neurological conditions. The author thinks we shouldn't compare different types of autism as "more" or "less" severe, and emphasizes the importance of understanding individual strengths and challenges rather than making assumptions based on outward behaviors.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reply to reviewer comments


      We extend our gratitude to the reviewers for their time and valuable feedback on our manuscript. We especially appreciate the insightful suggestions that have significantly contributed to refining our work and elucidating our findings. With the revisions made to the text and the inclusion of new experimental data, we believe our manuscript now effectively addresses all reviewer comments. We eagerly await your evaluation of our revised submission.

      Small ARF-like GTPases play fundamental roles in dynamic signaling processes linked with vesicular trafficking in eukaryotes. Despite their evolutionary conservation, little is known about ARF-like GTPase functions in plants. Our manuscript reports the biochemical and cell biological characterization of the small ARF-like GTPase TTN5 from the model plant Arabidopsis thaliana. Fundamental investigations like ours are mostly lacking for ARF and ARL GTPases in Arabidopsis.

      We employed fluorescence-based enzymatic assays suited to uncover different types of the very rapid GTPase activities for TTN5. The experimental findings are now illustrated in a more comprehensive modified Figure 2 and in the form of a summary of the GTPase activities for TTN5 and its mutant variants in the NEW Figure 7A in the Discussion part. Taken together, we found that TTN5 is a non-classical GTPase based on its enzymatic kinetics. The reviewers appreciated these findings and highlighted them as being „impressive in vitro biochemical characterization" and "major conceptual advance". Since such experiments are "uncommon" for being conducted with plant GTPases, reviewers regarded this analysis as "useful addition to the plant community in general". The significance of these findings is given by the circumstance that „the ARF-like proteins are poorly addressed in Arabidopsis while they could reveal completely different function than the canonical known ARF proteins". Reviewers saw here clearly a "strength" of the manuscript.

      With regard to the cell biological investigation and initial assessment of cell physiological roles of TTN5, we now provide the requested additional evidence. First of all, we provide NEW data on the localization of TTN5 by immunolocalization using a complementing HA3-TTN5 construct, supporting our initial suggestions that TTN5 may be associated with vesicles and processes of the endomembrane system. The previous preprint version had left the reviewers „less convinced" of the cell biological data due to the lack of complementation of our YFP-TTN5 construct, the lack of Western blot data and the low resolution of microscopic images. We fully agree that these points were of concern and needed to be addressed. We have therefore worked intensively on these „weaknesses" and now present a more detailed whole-mount immunostaining series with the complementing HA3-TTN5 transgenic line (NEW Figure 4, NEW Figure 3P) and Western blot data (NEW Supplementary Figures S7C and D). Upon publication of our manuscript, we will also provide all original images at BioImage Archive, an online repository for biological image data associated with peer-reviewed publications, so that readers can inspect each image in detail and at high quality for re-analysis. The immunolocalization data are of particular importance as they indicate that HA3-TTN5 can be associated with punctate vesicle structures and BFA bodies, as seen in YFP studies of YFP-TTN5 seedlings. We have carefully rephrased the text and emphasized those localization patterns that are backed up by both immunostaining and YFP fluorescence detection of YFP-TTN5 signals. To improve comprehension, the findings are summarized in a schematic overview in NEW Figure 7B of the Discussion. We have also addressed all other comments related to the cell biological experiments to "provide the substantial improvement" that had been requested.

      We emphasize that we found two cell physiological phenotypes for the TTN5T30N mutant: differing mobility of the fluorescent vesicles in the epidermis of hypocotyls of YFP-TTN5T30N seedlings (see Video material and NEW Supplementary Video Material S1M-O), and a root growth phenotype of transgenic HA3-TTN5T30N seedlings (NEW Figure 3O). We explain the cell physiological phenotypes in relation to enzymatic GTPase data. These findings convince us of the validity of the YFP-TTN5 analysis indicative of TTN5 localization.

      We are deeply thankful to the reviewers for judging our manuscript as "generally well written", "important" and "of interest to a wide range of plant scientists" and "for scientists working in the trafficking field" as it "holds significance" and will form the basis for future functional studies of TTN5.

      We prepared our revised manuscript very carefully and address all reviewer comments one by one. Please find our revision and our detailed rebuttal to all reviewer comments below. Changes in the revised version are highlighted in yellow and green in the "revised version with highlighted changes".

      With these adjustments, we hope that our peer-reviewed study will receive a positive response.

      We are looking forward to your evaluation of our revised manuscript and thank you in advance,

      Sincerely

      Petra Bauer and Inga Mohr on behalf of all authors


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript from Mohr and collaborators reports the characterization of an ARF-like GTPase of Arabidopsis. Small GTPases of the ARF family play crucial role in intracellular trafficking and plant physiology. The ARF-like proteins are poorly addressed in Arabidopsis while they could reveal completely different function than the canonical known ARF proteins. Thus, the aim of the study is important and could be of interest to a wide range of plant scientists. I am impressed by the biochemical characterization of the TTN5 protein and its mutated versions, this is clearly a very nice point of the paper and allows for proper interpretations of the other results. However, I was much less convinced on the cell biology part of this manuscript and aside from the subcellular localization of the TTN5 I think the paper would benefit from a more functional angle. Below are my comments to improve the manuscript:

      1- In the different pictures and movies, TTN5 is quite clearly appearing as a typical ER-like pattern. The pattern of localization further extends to dotty-like structures and structures labeled only at the periphery of the structure, with a depletion of fluorescence inside the structure. These observations raise several points. First, the ER pattern is never mentioned in the manuscript while I think it can be clearly observed. Given that the YFP-TTN5 construct is not functional (the mutant phenotype is not rescued) the ER-localization could be due to the retention at the ER due to quality control. The HA-TTN5 construct is functional but to me its localization shows a quite different pattern from the YFP version, I do not see the ER for example or the periphery-labeled structures. In this case, it will be a crucial point to perform co-localization experiments between HA-TTN5 and organelle markers to confirm that the functional TTN5 construct is labeling the Golgi and MVBs, as does the non-functional one. I am also quite sure that a co-localization between YFP-TTN5 and HA-TTN5 will not completely match... The ER is contacting so many organelles that the localization of YFP-TTN5 might not reflect the real location of the protein.

      Our response:

      First, we would like to state that specific detection of the intracellular localization of plant proteins in plant cells is generally technically very difficult when the protein abundance is not overly high. In this revised version, we extended immunostaining analysis to different membrane compartments, now including immunostaining of complementing HA3-TTN5 in the absence and presence of BFA, along with immunodetection of ARF1 and FM4-64 labeling in roots (NEW Figure 3P, NEW Figure 4A, B). In the revised version, we focus the analysis and conclusions on the fluorescence patterns that overlap between YFP-TTN5 detection and HA3-TTN5 immunodetection. With this, we can be most confident about subcellular TTN5 localization. Please find this NEW text in the Result section (starting Line 323):

      „For a more detailed investigation of HA3-TTN5 subcellular localization, we then performed co-immunofluorescence staining with an Alexa 488-labeled antibody recognizing the Golgi and TGN marker ARF1, while detecting HA3-TTN5 with an Alexa 555-labeled antibody (Robinson et al. 2011, Singh et al. 2018) (Figure 4A). ARF1-Alexa 488 staining was clearly visible in punctate structures representing presumably Golgi stacks (Figure 4A, Alexa 488), as previously reported (Singh et al. 2018). Similar structures were obtained for HA3-TTN5-Alexa 555 staining (Figure 4A, Alexa 555). But surprisingly, colocalization analysis demonstrated that the HA3-TTN5-labeled structures were mostly not colocalizing and thus distinct from the ARF1-labeled ones (Figure 4A). Yet the HA3-TTN5- and ARF1-labeled structures were in close proximity to each other (Figure 4A). We hypothesized that the HA3-TTN5 structures can be connected to intracellular trafficking steps. To test this, we performed brefeldin A (BFA) treatment, a commonly used tool in cell biology for preventing dynamic membrane trafficking events and vesicle transport involving the Golgi. BFA is a fungal macrocyclic lactone that leads to a loss of cis-cisternae and accumulation of Golgi stacks, known as BFA-induced compartments, up to the fusion of the Golgi with the ER (Ritzenthaler et al. 2002, Wang et al. 2016). For a better identification of BFA bodies, we additionally used the dye FM4-64, which can emit fluorescence in a lipophilic membrane environment. FM4-64 marks the plasma membrane in the first minutes following application to the cell, then may be endocytosed and in the presence of BFA become accumulated in BFA bodies (Bolte et al. 2004). We observed BFA bodies positive for both, HA3-TTN5-Alexa 488 and FM4-64 signals (Figure 4B). Similar patterns were observed for YFP-TTN5-derived signals in YFP-TTN5-expressing roots (Figure 4C). Hence, HA3-TTN5 and YFP-TTN5 can be present in similar subcellular membrane compartments."

      We did not find evidence that HA3-TTN5 can localize at the ER using whole-mount immunostaining (NEW Figure 3P; NEW Figure 4A, B). Hence, we are cautious about claiming that the ER fluorescence seen in the YFP-TTN5 line (Figure 3M, N) reflects TTN5 localization. We therefore do not focus the text on the ER pattern in the Result section (starting Line 295):

      „Additionally, YFP signals were also detected in a net-like pattern typical for ER localization (Figure 3M, N). (...) We also found multiple YFP bands in α-GFP Western blot analysis using YFP-TTN5 Arabidopsis seedlings. Besides the expected and strong 48 kDa YFP-TTN5 band, we observed three weak bands ranging between 26 to 35 kDa (Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins produced from aberrant transcripts with perhaps alternative translation start or stop sites. On the other side, a triple hemagglutinin-tagged HA3-TTN5 driven by the 35S promoter did complement the embryo-lethal phenotype of ttn5-1 (Supplementary Figure S7D, E). α-HA Western blot control performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size, but no band that was 13 to 18 kDa smaller (Supplementary Figure S7D). (...) We did not observe any staining in nuclei or ER when performing HA3-TTN5 immunostaining (Figure 3P; Figure 4A, B), as was the case for fluorescence signals in YFP-TTN5-expressing cells. Presumably, this can indicate that either the nuclear and ER signals seen with YFP-TTN5 correspond to the smaller proteins detected, as described above, or that immunostaining was not suited to detect them. Hence, we focused interpretation on patterns of localization overlapping between the fluorescence staining with YFP-labeled TTN5 and with HA3-TTN5 immunostaining, such as the particular signal patterns in the specific punctate membrane structures."

      And we discuss in the Discussion section (starting Line 552):

      „We based the TTN5 localization data on tagging approaches with two different detection methods to enhance reliability of specific protein detection. Even though YFP-TTN5 did not complement the embryo-lethality of a ttn5 loss of function mutant, we made several observations that suggest YFP-TTN5 signals to be meaningful at various membrane sites. We do not know why YFP-TTN5 does not complement. There could be differences in TTN5 levels and interactions in some cell types, which were hindering specifically YFP-TTN5 but not HA3-TTN5. (...) Though constitutively driven, the YFP-TTN5 expression may be delayed or insufficient at the early embryonic stages resulting in the lack of embryo-lethal complementation. On the other hand, the very fast nucleotide exchange activity may be hindered by the presence of a large YFP-tag in comparison with the small HA3-tag which is able to rescue the embryo-lethality. The lack of complementation represents a challenge for the localization of small GTPases with rapid nucleotide exchange in plants. Despite of these limitations, we made relevant observations in our data that made us believe that YFP signals in YFP-TTN5-expressing cells at membrane sites can be meaningful."

      2- What are the structures with TTN5 fluorescence depleted at the center that appear in control conditions? They look different from the Golgi labeled by Man1 but similar to MVBs upon wortmannin treatment, except that in control conditions MVBs never appear like this. Are they related to any kind of vacuolar structures that would be involved in quality control-induced degradation of non-functional proteins?

      Our response:

      The reviewer certainly refers to fluorescence images from N. benthamiana leaf epidermal cells where different circularly shaped structures are visible. In these respective structures, the fluorescent circles are depleted of fluorescence in the center, e.g. in Figure 5C, YFP fluorescent signals in TTN5T30N transformed leaf discs. We suspect that these structures can be of vacuolar origin as described for similar fluorescent rings in Tichá et al., 2020 for ANNI-GFP (reference in manuscript). The reviewer certainly does not refer to swollen MVBs that are seen following wortmannin treatment, as in Figure 5N-P, which look similar in their shape but are larger in size. Please note that we always included the control conditions, namely the images recorded before the wortmannin treatment, so that we were able to investigate the changes induced by wortmannin. Hence, we can clearly say that the structures with depleted fluorescence in the center as in Figure 5C are not wortmannin-induced swollen MVBs. To make these points clear to the reader, we added an explanation into the text (Line 385-388):

      „We also observed YFP fluorescence signals in the form of circularly shaped ring structures with a fluorescence-depleted center. These structures can be of vacuolar origin as described for similar fluorescent rings in Tichá et al. (2020) for ANNI-GFP."

      3- The fluorescence at nucleus could be due to a proportion of YFP-TTN5 that is degraded and released free-GFP, a western-blot of the membrane fraction vs the cytosolic fraction could help solving this issue.

      Our response:

      In an α-GFP Western blot using YFP-TTN5 Arabidopsis seedlings, we detected besides the expected and strong 48 kDa YFP-TTN5 band, three additional weak bands ranging between 26 to 35 kDa (NEW Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins expressed from aberrant transcripts. α-HA Western blot controls performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size (Supplementary Figure S7D). We must therefore be cautious about nuclear TTN5 localization and we rephrased the text carefully (starting Line 300):

      „We also found multiple YFP bands in α-GFP Western blot analysis using YFP-TTN5 Arabidopsis seedlings. Besides the expected and strong 48 kDa YFP-TTN5 band, we observed three weak bands ranging between 26 to 35 kDa (Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins produced from aberrant transcripts with perhaps alternative translation start or stop sites. On the other side, a triple hemagglutinin-tagged HA3-TTN5 driven by the 35S promoter did complement the embryo-lethal phenotype of ttn5-1 (Supplementary Figure S7D, E). α-HA Western blot control performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size, but no band that was 13 to 18 kDa smaller (Supplementary Figure S7D). (...) We did not observe any staining in nuclei or ER when performing HA3-TTN5 immunostaining (Figure 3P; Figure 4A, B), as was the case for fluorescence signals in YFP-TTN5-expressing cells. Presumably, this can indicate that either the nuclear and ER signals seen with YFP-TTN5 correspond to the smaller proteins detected, as described above, or that immunostaining was not suited to detect them. Hence, we focused interpretation on patterns of localization overlapping between the fluorescence staining with YFP-labeled TTN5 and with HA3-TTN5 immunostaining, such as the particular signal patterns in the specific punctate membrane structures."

      4- It is not so easy to conclude from the co-localization experiments. The confocal pictures are not always of high quality, some of them appear blurry. The Golgi localization looks convincing, but the BFA experiments are not that clear. The MVB localization is pretty convincing but the images are blurry. An issue is the quantification of the co-localizations. Several methods were employed but they do not provide consistent results. As for the object-based co-localization method, the authors employ in the text co-localization result either base on the % of YFP-labeled structures or the % of mCherry/mRFP-labeled structures, but the results are not going always in the same direction. For example, the proportion of YFP-TTN5 that co-localize with MVBs is not so different between WT and mutated version but the proportion of MVBs that co-localize with TTN5 is largely increased in the Q70L mutant. Thus it is quite difficult to interpret homogenously and in an unbiased way these results. Moreover, the results coming from the centroid-based method were presented in a table rather than a graph, I think here the authors wanted to hide the huge standard deviation of these results, what is the statistical meaning of these results?

      Our response:

      First of all, we would like to point out that, as explained above, the BFA experiments are now clearer. We performed additional BFA treatment coupled with immunostaining using HA3-TTN5-expressing Arabidopsis seedlings and coupled with fluorescence analysis using YFP-TTN5-expressing Arabidopsis plants. In both experiments, we observed the typical BFA bodies very clearly (NEW Figure 4B, C).

      Second, we would like to stress that we performed colocalization very carefully and quantified the data in three different manners. There is no general standardized procedure that best suits the idea of a colocalization pattern. Results of colocalization are represented in stem diagrams and table format, including statistical analysis. Colocalization was carried out with the ImageJ plugin JACoP for Pearson's and Overlap coefficients and based on the centroid method. The plotted Pearson's and Overlap coefficients are presented in bar diagrams in Supplementary Figure S8A and C, including statistics. The values obtained by the centroid method are represented in table format in Supplementary Figure S8B and D, which can be considered a standard method (see Ivanov et al., 2014).

      Colocalization of two different fluorescence signals was performed for the two channels in a specific chosen region of interest (indicating in % the overlapping signal versus the sum of signal for each channel). The differences between the YFP/mRFP and mRFP/YFP ratios indicate that a higher percentage of ARA7-RFP signal is colocalizing with YFP-TTN5Q70L signal than with the TTN5WT or the TTN5T30N mutant form signals, while the YFP signals have a similar overlap with ARA7-positive structures. This is not a contradiction. Presumably this answers well the questions on colocalization.
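Why the two ratios need not agree can be illustrated with a minimal sketch. The code below is not the JACoP implementation and uses no data from the manuscript; the pixel values, the zero threshold and the function names `pearson` and `overlap_fractions` are assumptions chosen for illustration. It computes a Pearson coefficient and, for each channel, the fraction of its total signal found in pixels where the other channel is present (in the spirit of Manders-type coefficients).

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally long intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def overlap_fractions(a, b, threshold=0):
    """Return (fraction of a's signal in pixels where b > threshold,
    fraction of b's signal in pixels where a > threshold)."""
    a_in_b = sum(x for x, y in zip(a, b) if y > threshold) / sum(a)
    b_in_a = sum(y for x, y in zip(a, b) if x > threshold) / sum(b)
    return a_in_b, b_in_a


# Toy flattened region of interest: the YFP signal is widespread,
# while the mRFP marker is confined to two pixels inside the YFP area.
yfp = [5, 5, 5, 5, 5, 5, 5, 5, 0, 0]
mrfp = [0, 0, 0, 0, 0, 0, 8, 8, 0, 0]

r = pearson(yfp, mrfp)
yfp_in_mrfp, mrfp_in_yfp = overlap_fractions(yfp, mrfp)

if __name__ == "__main__":
    print(f"Pearson r = {r:.2f}")
    print(f"YFP signal overlapping mRFP: {yfp_in_mrfp:.0%}")
    print(f"mRFP signal overlapping YFP: {mrfp_in_yfp:.0%}")
```

In this toy case all of the marker signal sits inside YFP-positive pixels, whereas only a quarter of the YFP signal sits inside marker-positive pixels. Each fraction has its own denominator, so a change in one direction does not have to be mirrored in the other.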

      Please note that upon acceptance for publication, we will upload all original colocalization data to BioImage Archive. Hence, the high-quality data can be reanalyzed by readers.

      5- The use of FM4-64 to address vacuolar trafficking is hazardous. FM4-64 allows the tracking of endocytosis but does not say anything on vacuolar degradation targeting and even less on the potential function of TTN5 in endosomal vacuolar targeting. Similarly, TTN5, even if localized at the Golgi, does not necessarily function in Golgi trafficking.

      Our response:

      Perhaps our previous description was misleading. Thank you for pointing this out. We reformulated the text and modified the schematic representation of FM4-64 in NEW Figure 6A:

      "(A), Schematic representation of progressive stages of FM4-64 localization and internalization in a cell. FM4-64 is a lipophilic substance. After infiltration, it first localizes in the plasma membrane, at later stages it localizes to intracellular vesicles and membrane compartments. This localization pattern reflects the endocytosis process (Bolte et al. 2004)."

      6- In its present form, the manuscript lacks functional evidence for a role of TTN5 in any trafficking steps. I understand that the KO mutant is lethal but what are the phenotypes of the Q70L and T30N mutant plants? What is the seedling phenotype, and what do the Golgi and MVBs look like in these mutants? Do the Q70L or T30N mutants perturb the trafficking of any cargos?

      Our response:

      We fully agree that functional evidence is valuable for assigning roles to TTN5 in trafficking steps. A phenotype associated with TTN5T30N and TTN5Q70L is clearly meaningful.

      First of all, we like to emphasize that it is incorrect that the manuscript lacks functional evidences for a role of TTN5 and the two mutants. In fact, the manuscript even highlights several functional activities that are meaningful in a cellular context. These include different types of kinetic GTPase enzyme activities, subcellular localization in planta and association with different endomembrane compartments and subcellular processes such as endocytosis. We surely agree that future research can focus even more on cell physiological aspects and the physiological functions in plants to examine the proposed roles of TTN5 in intracellular trafficking steps. For such studies, our findings are the fundamental basis.

      Concerning the aspect of colocalization of the mutants with the markers we show in Figure 5C, D and G, H that YFP-TTN5T30N- and YFP-TTN5Q70L-related signals colocalize with the Golgi marker GmMan1-mCherry. Figure 5K, L and O, P show that YFP-TTN5T30N and YFP-TTN5Q70L-related signals can colocalize with the MVB marker, and this may affect relevant vesicle trafficking processes and plasma membrane protein regulation involved in root cell elongation.

      *At present, we have not yet investigated perturbed cargo trafficking. These aspects are certainly interesting but require extensive work and testing of appropriate physiological conditions and appropriate cargo targets. We discuss future perspectives in the Discussion. We agree that such functional information is of great importance, but needs to be clarified in future studies. *

      __Reviewer #1 (Significance (Required)): __

      In conclusion, I think this manuscript is a good biochemical description of an ARF-like protein but it would need to be strengthen on the cell biology and functional sides. Nonetheless, provided these limitations fixed, this manuscript would advance our knowledge of small GTPases in plants. The major conceptual advance of that study is to provide a non-canonical behavior of the active/inactive cycle dynamics for a small-GTPase. Of course this dynamic probably has an impact on TTN5 function and involvement in trafficking, although this remains to be fully demonstrated. Provided a substantial amount of additional experiments to support the claims of that study, this study could be of general interest for scientist working in the trafficking field.

      __Our response:__

      We thank reviewer 1 for the very fruitful comments. We hope that with the additional experiments, NEW Figures and NEW Supplementary Figures as well as our changes in the text, all comments by the reviewer have been addressed.

      __Reviewer #2 (Evidence, reproducibility and clarity (Required)):__

      The manuscript by Mohr and colleagues characterizes the Arabidopsis predicted small GTPase TITAN5 in both biochemical and cell biology contexts using in vitro and in planta techniques. In the first half of the manuscript, the authors use in vitro nucleotide exchange assays to characterise the GTPase activity and nucleotide binding properties of TITAN5 and two mutant variants of it. The in vitro data they produce indicate that TITAN5 does indeed have general GTPase and nucleotide binding capability that would be expected for a protein predicted to be a small GTPase. Interestingly, the authors show that TITAN5 favors a GTP-bound form, which is different to many other characterized GTPases that favor GDP-binding. The authors follow their biochemical characterisation of TITAN5 with in planta experiments characterizing the association of TITAN5 and its mutant variants with the plant endomembrane system, both by stable expression in Arabidopsis and transient expression in N.benthamiana.

      The strength of this manuscript is in its in vitro biochemical characterisation of TITAN5 and variants. I am not an expert on in vitro GTPase characterisation and so cannot comment specifically on the assays they have used, but generally speaking this appears to have been well done, and the authors are to be commended for it. In vitro characterisation of plant small GTPases is uncommon, and much of our knowledge is inferred from work on animal or yeast GTPases, so this will be a useful addition to the plant community in general, especially as TITAN5 is an essential gene. The in planta data that follows is sadly not as compelling as the biochemical data, and suffers from several weaknesses. I would encourage the authors to consider trying to improve the quality of the in planta data in general. If improved and then combined with the biochemical aspects of the paper, this has the potential to make a nice addition to plant small GTPase and endomembrane literature.

      The manuscript is generally well written and includes the relevant literature.

      Major issues:

      1. The authors make use of a p35s: YFP-TTN5 construct (and its mutant variants) both stably in Arabidopsis and transiently in N.benthamiana. I know from personal experience that expressing small GTPases from non-endogenous promoters and in transient expression systems can give very different results to when working from endogenous promoters/using immunolocalization in stable expression systems. Strong over-expression could for example explain why the authors see high 'cytosolic' levels of YFP-TTN5. It is therefore questionable how much of the in planta localisation data presented using p35S and expression in tobacco is of true relevance to the biological function of TITAN5. The authors do present some immunolocalization data of HA3-TTN5 in Arabidopsis, but this is fairly limited and it is very difficult in its current form to use this to identify whether the data from YFP-TTN5 in Arabidopsis and tobacco can be corroborated. I would encourage the authors to consider expanding the immunolocalization data they present to validate their findings in tobacco.

      __Our response:__

      We are aware that endogenous promoters may be preferred over the 35S promoter. However, neither of the two types of lines we generated with the endogenous promoter showed fluorescent signals, so we unfortunately could not use them (not shown). Besides 35S promoter-mediated expression, we also investigated inducible expression vectors for fluorescence imaging in N. benthamiana (not shown). Inducible and constitutive expression showed very similar expression patterns, so we chose to characterize in detail the 35S::YFP-TTN5 fluorescence in both N. benthamiana and Arabidopsis.

      We have expanded immunolocalization using the HA3-TTN5 line and compare it now along with YFP fluorescence signal in YFP-TTN5 seedlings (NEW Figure 3P; NEW Figure 4).

      „For a more detailed investigation of HA3-TTN5 subcellular localization, we then performed co-immunofluorescence staining with an Alexa 488-labeled antibody recognizing the Golgi and TGN marker ARF1, while detecting HA3-TTN5 with an Alexa 555-labeled antibody (Robinson et al. 2011, Singh et al. 2018) (Figure 4A). ARF1-Alexa 488 staining was clearly visible in punctate structures representing presumably Golgi stacks (Figure 4A, Alexa 488), as previously reported (Singh et al. 2018). Similar structures were obtained for HA3-TTN5-Alexa 555 staining (Figure 4A, Alexa 555). But surprisingly, colocalization analysis demonstrated that the HA3-TTN5-labeled structures were mostly not colocalizing and thus distinct from the ARF1-labeled ones (Figure 4A). Yet the HA3-TTN5- and ARF1-labeled structures were in close proximity to each other (Figure 4A). We hypothesized that the HA3-TTN5 structures can be connected to intracellular trafficking steps. To test this, we performed brefeldin A (BFA) treatment, a commonly used tool in cell biology for preventing dynamic membrane trafficking events and vesicle transport involving the Golgi. BFA is a fungal macrocyclic lactone that leads to a loss of cis-cisternae and accumulation of Golgi stacks, known as BFA-induced compartments, up to the fusion of the Golgi with the ER (Ritzenthaler et al. 2002, Wang et al. 2016). For a better identification of BFA bodies, we additionally used the dye FM4-64, which can emit fluorescence in a lipophilic membrane environment. FM4-64 marks the plasma membrane in the first minutes following application to the cell, then may be endocytosed and in the presence of BFA become accumulated in BFA bodies (Bolte et al. 2004). We observed BFA bodies positive for both, HA3-TTN5-Alexa 488 and FM4-64 signals (Figure 4B). Similar patterns were observed for YFP-TTN5-derived signals in YFP-TTN5-expressing roots (Figure 4C). Hence, HA3-TTN5 and YFP-TTN5 can be present in similar subcellular membrane compartments."

      Many of the confocal images presented are of poor quality, particularly those from N.benthamiana.

      Our response:

      All confocal images are of high quality in their original format. To make them accessible, we will upload all raw data to BioImage Archive upon acceptance of the manuscript.

      The authors in some places see YFP-TTN5 in cell nuclei. This could be a result of YFP-cleavage rather than genuine nuclear localisation of YFP-TTN5, but the authors do not present western blots to check for this.

      __Our response:__

      As described in our response to reviewer 1, comment 3, fluorescence signals were detected within the nuclei of root cells of YFP-TTN5 plants, while immunostaining signals of HA3-TTN5 were not detected in the nucleus. In an α-GFP Western blot using YFP-TTN5 Arabidopsis seedlings, we detected, besides the expected and strong 48 kDa YFP-TTN5 band, three additional weak bands ranging from 26 to 35 kDa (NEW Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins expressed from aberrant transcripts. α-HA Western blot controls performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size (Supplementary Figure S7D). We must therefore be cautious about nuclear TTN5 localization, and we rephrased the text carefully (starting Line 300):

      „We also found multiple YFP bands in α-GFP Western blot analysis using YFP-TTN5 Arabidopsis seedlings. Besides the expected and strong 48 kDa YFP-TTN5 band, we observed three weak bands ranging between 26 to 35 kDa (Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins produced from aberrant transcripts with perhaps alternative translation start or stop sites. On the other side, a triple hemagglutinin-tagged HA3-TTN5 driven by the 35S promoter did complement the embryo-lethal phenotype of ttn5-1 (Supplementary Figure S7D, E). α-HA Western blot control performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size, but no band that was 13 to 18 kDa smaller (Supplementary Figure S7D). (...) We did not observe any staining in nuclei or ER when performing HA3-TTN5 immunostaining (Figure 3P; Figure 4A, B), as was the case for fluorescence signals in YFP-TTN5-expressing cells. Presumably, this can indicate that either the nuclear and ER signals seen with YFP-TTN5 correspond to the smaller proteins detected, as described above, or that immunostaining was not suited to detect them. Hence, we focused interpretation on patterns of localization overlapping between the fluorescence staining with YFP-labeled TTN5 and with HA3-TTN5 immunostaining, such as the particular signal patterns in the specific punctate membrane structures."

      That YFP-TTN5 fails to rescue the ttn5 mutant indicates that YFP-tagged TTN5 may not be functional. If the authors cannot corroborate the YFP-TTN5 localisation pattern with that of HA3-TTN5 via immunolocalization, then the fact that YFP-TTN5 may not be functional calls into question the biological relevance of YFP-TTN5's localisation pattern.

      __Our response:__

      This refers to your comment 1, please check this comment for a detailed response. Please also see our answer to reviewer 1, comment 1.

      First, we would like to state that specific detection of the intracellular localization of plant proteins in plant cells is generally technically very difficult when the protein abundance is not overly high. In this revised version, we extended the immunostaining analysis to different membrane compartments, now including immunostaining of complementing HA3-TTN5 in the absence and presence of BFA, along with immunodetection of ARF1 and FM4-64 labeling in roots (NEW Figure 3P, NEW Figure 4A, B). In the revised version, we focus the analysis and conclusions on the fluorescence patterns that overlap between YFP-TTN5 detection and HA3-TTN5 immunodetection. With this, we can be most confident about subcellular TTN5 localization. Please find this NEW text in the Results section (starting Line 323):

      „For a more detailed investigation of HA3-TTN5 subcellular localization, we then performed co-immunofluorescence staining with an Alexa 488-labeled antibody recognizing the Golgi and TGN marker ARF1, while detecting HA3-TTN5 with an Alexa 555-labeled antibody (Robinson et al. 2011, Singh et al. 2018) (Figure 4A). ARF1-Alexa 488 staining was clearly visible in punctate structures representing presumably Golgi stacks (Figure 4A, Alexa 488), as previously reported (Singh et al. 2018). Similar structures were obtained for HA3-TTN5-Alexa 555 staining (Figure 4A, Alexa 555). But surprisingly, colocalization analysis demonstrated that the HA3-TTN5-labeled structures were mostly not colocalizing and thus distinct from the ARF1-labeled ones (Figure 4A). Yet the HA3-TTN5- and ARF1-labeled structures were in close proximity to each other (Figure 4A). We hypothesized that the HA3-TTN5 structures can be connected to intracellular trafficking steps. To test this, we performed brefeldin A (BFA) treatment, a commonly used tool in cell biology for preventing dynamic membrane trafficking events and vesicle transport involving the Golgi. BFA is a fungal macrocyclic lactone that leads to a loss of cis-cisternae and accumulation of Golgi stacks, known as BFA-induced compartments, up to the fusion of the Golgi with the ER (Ritzenthaler et al. 2002, Wang et al. 2016). For a better identification of BFA bodies, we additionally used the dye FM4-64, which can emit fluorescence in a lipophilic membrane environment. FM4-64 marks the plasma membrane in the first minutes following application to the cell, then may be endocytosed and in the presence of BFA become accumulated in BFA bodies (Bolte et al. 2004). We observed BFA bodies positive for both, HA3-TTN5-Alexa 488 and FM4-64 signals (Figure 4B). Similar patterns were observed for YFP-TTN5-derived signals in YFP-TTN5-expressing roots (Figure 4C). Hence, HA3-TTN5 and YFP-TTN5 can be present in similar subcellular membrane compartments."

      We did not find evidence that HA3-TTN5 can localize at the ER using whole-mount immunostaining (NEW Figure 3P; NEW Figure 4A, B). Hence, we are cautious about concluding that the fluorescence at the ER seen in the YFP-TTN5 line (Figure 3M, N) reflects TTN5 localization. We therefore do not focus the text on the ER pattern in the Results section (starting Line 295):

      „Additionally, YFP signals were also detected in a net-like pattern typical for ER localization (Figure 3M, N). (...) We also found multiple YFP bands in α-GFP Western blot analysis using YFP-TTN5 Arabidopsis seedlings. Besides the expected and strong 48 kDa YFP-TTN5 band, we observed three weak bands ranging between 26 to 35 kDa (Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins produced from aberrant transcripts with perhaps alternative translation start or stop sites. On the other side, a triple hemagglutinin-tagged HA3-TTN5 driven by the 35S promoter did complement the embryo-lethal phenotype of ttn5-1 (Supplementary Figure S7D, E). α-HA Western blot control performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size, but no band that was 13 to 18 kDa smaller (Supplementary Figure S7D). (...) We did not observe any staining in nuclei or ER when performing HA3-TTN5 immunostaining (Figure 3P; Figure 4A, B), as was the case for fluorescence signals in YFP-TTN5-expressing cells. Presumably, this can indicate that either the nuclear and ER signals seen with YFP-TTN5 correspond to the smaller proteins detected, as described above, or that immunostaining was not suited to detect them. Hence, we focused interpretation on patterns of localization overlapping between the fluorescence staining with YFP-labeled TTN5 and with HA3-TTN5 immunostaining, such as the particular signal patterns in the specific punctate membrane structures."

      *And we discuss in the Discussion section (starting Line 552):*

      „We based the TTN5 localization data on tagging approaches with two different detection methods to enhance reliability of specific protein detection. Even though YFP-TTN5 did not complement the embryo-lethality of a ttn5 loss of function mutant, we made several observations that suggest YFP-TTN5 signals to be meaningful at various membrane sites. We do not know why YFP-TTN5 does not complement. There could be differences in TTN5 levels and interactions in some cell types, which were hindering specifically YFP-TTN5 but not HA3-TTN5. (...) Though constitutively driven, the YFP-TTN5 expression may be delayed or insufficient at the early embryonic stages resulting in the lack of embryo-lethal complementation. On the other hand, the very fast nucleotide exchange activity may be hindered by the presence of a large YFP-tag in comparison with the small HA3-tag which is able to rescue the embryo-lethality. The lack of complementation represents a challenge for the localization of small GTPases with rapid nucleotide exchange in plants. Despite of these limitations, we made relevant observations in our data that made us believe that YFP signals in YFP-TTN5-expressing cells at membrane sites can be meaningful."

      Without a cell wall label/dye, the plasmolysis data presented in Figure 5 is hard to visualize.

      __Our response:__

      Figure 6E-G (previously Fig. 5) shows the results of plasmolysis experiments with YFP-TTN5 and the two mutant variant constructs. Plasmolysis can be clearly observed when focusing on the Hechtian strands. Hechtian strands form when the protoplast retracts as a result of the osmotic pressure exerted by the added mannitol solution. They consist of PM that remained in contact with the cell wall and are visible as thin filamentous structures. We stained the PM and the Hechtian strands with the PM dye FM4-64, as was similarly done in Yoneda et al., 2020. In the YFP-TTN5-transformed cells, we could detect colocalization of the YFP channels and the PM dye in filamentous structures between two neighbouring FM4-64-labelled PMs. Although an additional labeling of the cell wall may further indicate plasmolysis, it is not needed here.

      Please consider that we will upload all original image data to BioImage Archive so that a detailed re-investigation of the images can be done.

      __Minor issues:__

      In some of the presented N.benthamiana images, it looks like YFP-TTN5 may be partially ER-localised. However, co-localisation with an ER marker is not presented.

      Our response:

      *Referring to our response to comments 1 and 3 of reviewer 2 and to comment 1 of reviewer 1:*

      We did not find evidence that HA3-TTN5 can localize at the ER using whole-mount immunostaining (NEW Figure 3P; NEW Figure 4A, B). Hence, we are cautious about concluding that the fluorescence at the ER seen in the YFP-TTN5 line (Figure 3M, N) reflects TTN5 localization. We therefore do not focus the text on the ER pattern in the Results section (starting Line 295):

      „Additionally, YFP signals were also detected in a net-like pattern typical for ER localization (Figure 3M, N). (...) We also found multiple YFP bands in α-GFP Western blot analysis using YFP-TTN5 Arabidopsis seedlings. Besides the expected and strong 48 kDa YFP-TTN5 band, we observed three weak bands ranging between 26 to 35 kDa (Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins produced from aberrant transcripts with perhaps alternative translation start or stop sites. On the other side, a triple hemagglutinin-tagged HA3-TTN5 driven by the 35S promoter did complement the embryo-lethal phenotype of ttn5-1 (Supplementary Figure S7D, E). α-HA Western blot control performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size, but no band that was 13 to 18 kDa smaller (Supplementary Figure S7D). (...) We did not observe any staining in nuclei or ER when performing HA3-TTN5 immunostaining (Figure 3P; Figure 4A, B), as was the case for fluorescence signals in YFP-TTN5-expressing cells. Presumably, this can indicate that either the nuclear and ER signals seen with YFP-TTN5 correspond to the smaller proteins detected, as described above, or that immunostaining was not suited to detect them. Hence, we focused interpretation on patterns of localization overlapping between the fluorescence staining with YFP-labeled TTN5 and with HA3-TTN5 immunostaining, such as the particular signal patterns in the specific punctate membrane structures."

      *And we discuss in the Discussion section (starting Line 552):*

      „We based the TTN5 localization data on tagging approaches with two different detection methods to enhance reliability of specific protein detection. Even though YFP-TTN5 did not complement the embryo-lethality of a ttn5 loss of function mutant, we made several observations that suggest YFP-TTN5 signals to be meaningful at various membrane sites. We do not know why YFP-TTN5 does not complement. There could be differences in TTN5 levels and interactions in some cell types, which were hindering specifically YFP-TTN5 but not HA3-TTN5. (...) Though constitutively driven, the YFP-TTN5 expression may be delayed or insufficient at the early embryonic stages resulting in the lack of embryo-lethal complementation. On the other hand, the very fast nucleotide exchange activity may be hindered by the presence of a large YFP-tag in comparison with the small HA3-tag which is able to rescue the embryo-lethality. The lack of complementation represents a challenge for the localization of small GTPases with rapid nucleotide exchange in plants. Despite of these limitations, we made relevant observations in our data that made us believe that YFP signals in YFP-TTN5-expressing cells at membrane sites can be meaningful."

      There is some inconsistency within the N.benthamiana images. For example, compare Figure 4C of YFP-TTN5T30N to Figure 4O of YFP-TTN5T30N. Figure 4O is presented as being significant because wortmannin-induced swollen ARA7 compartments are labelled by YFP-TTN5T30N. However, structures very similar to these can already be seen in Figure 4C, which is apparently an unrelated experiment. This, to my mind, is likely a result of the very different expression levels between different cells that can be produced by transient expression in N.benthamiana.

      __Our response:__

      Former Figure 4 is now Figure 5. As detailed in our response to comment 2 of reviewer 1:

      The reviewer certainly refers to fluorescence images from N. benthamiana leaf epidermal cells where different circularly shaped structures are visible. In these structures, the fluorescent circles are depleted of fluorescence in the center, e.g. in Figure 5C, YFP fluorescent signals in TTN5T30N-transformed leaf discs. We suspect that these structures can be of vacuolar origin, as described for similar fluorescent rings in Tichá et al., 2020 for ANNI-GFP (reference in manuscript). The reviewer certainly does not refer to the swollen MVBs that are seen following wortmannin treatment, as in Figure 5N-P, which look similar in shape but are larger in size. Please note that we always included the control conditions, namely the images recorded before the wortmannin treatment, so that we were able to investigate the changes induced by wortmannin. Hence, we can clearly say that the structures with depleted fluorescence in the center, as in Figure 5C, are not wortmannin-induced swollen MVBs. To make these points clear to the reader, we added an explanation to the text (Line 385-388):

      „We also observed YFP fluorescence signals in the form of circularly shaped ring structures with a fluorescence-depleted center. These structures can be of vacuolar origin as described for similar fluorescent rings in Tichá et al. (2020) for ANNI-GFP."

      **Referees cross-commenting**

      It seems that all of the reviewers have converged on the conclusion that the in planta characterisation of TTN5 is insufficient to be of substantial interest to the field, highlighting the fact that major improvements are required to strengthen this part of the manuscript and increase its relevance.

      __Reviewer #2 (Significance (Required)):__

      General assessment: the strengths of this work are in its in vitro characterisation of TITAN5, however, the in planta characterisation lacks depth.

      Significance: the in vitro characterisation of TITAN5 is commendable as such work is lacking for plant GTPases. However, the significance of the work would be boosted substantially by better in planta characterisation, which is where the broadest interest will lie.

      My expertise: my expertise is in in planta characterisation of small GTPases and their interactors.

      __Our response:__

      We thank the reviewer for the kind evaluation of our manuscript. We are confident that the changes in the text and NEW Figures and NEW Supplementary Figures will be convincing to consider our work.

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)):__

      Summary: Cellular traffic is an important and well-studied biological process in animal and plant systems. While components involved in transport are known, the mechanism by which these components control activity or destination remains to be studied. A critical step in regulating traffic is proper budding and tethering of vesicles. A critical component in determining this step is a family of proteins with GTPase activity, which act as switches facilitating vesicle interaction between proteins, or cytoskeleton. In the current manuscript, Mohr and colleagues have characterized a small GTPase, TITAN5 (TTN5), and identified two residues, Gln70 and Thr30, in the protein which they propose to have functional roles. The authors catalogue the localization, GTP hydrolytic activity, and discuss putative functions of TTN5 and the mutants.

      __Major comments:__

      The core of the manuscript, which is descriptive characterization of TTN5, lies in reliably demonstrating putative roles. While the GTP hydrolysis rates are well-quantified (though the claims need to be toned down), the microscopy data especially the association of TTN5 with different endomembrane compartments is not convincing due to the quality (low resolution) of the figures submitted. The manuscript text is difficult to navigate due to repetition and inconsistency in the order that the mutants are referred. I am requesting additional experiments which should be feasible considering the authors have all the materials required to perform the experiments and obtain high-quality images which support their claims.

      In general the figure quality needs to be improved for all microscopy images. I would suggest that the authors highlight 1-2 individual cells to make their point and use the current images as supplementary to establish a broader spread.

      __Our response:__

      *We have worked substantially on the text and figures to make the content readily comprehensible. The mutants are referred to in a consistent manner in the text and figures. We have addressed the requested experiments.*

      As we pointed out in the cover letter and our responses to reviewers 1 and 2, we will upload all raw image data to BioImage Archive upon acceptance of the manuscript so that they can be re-examined without any reduction of resolution. Furthermore, we have conducted new experiments on immunolocalization of HA3-TTN5 (NEW Figure 3P, NEW Figure 4A, B). The text has been improved in several places (see highlighted changes in the manuscript and as detailed in the responses to reviewer 1). We think this addresses the reviewers' concerns well.

      Fig. S1 lacks clarity.

      __Our response:__

      Supplementary Figure S1 shows TTN5 gene expression in different organs and growth stages as revealed by transcriptomic data, made available through the AtGenExpress eFP tool of the Bio-Analytic Resource for Plant Biology (BAR). The figure visualizes that TTN5 is ubiquitously expressed in different plant organs and tissues, e.g. the epidermis layers that we investigated here, and throughout development including embryo development. In accordance with the embryo-lethal phenotype, this highlights that TTN5 is needed throughout plant growth, and it emphasizes that our investigation of TTN5 localization in epidermis cells is valid.

      We have added a better description to the figure legend. We now also mention the respective publications from which the transcriptome data-sets are derived. The modified figure legend is:

      "Supplementary Figure S1. Visualization of TTN5 gene expression levels during plant development based on transcriptome data. Expression levels in (A), different types of aerial organs at different developmental stages; from left to right and bottom to top are represented different seed and plant growth stages, flower development stages, different leaves, vegetative to inflorescence shoot apex, embryo and silique development stages; (B), seedling root tissues based on single cell analysis represented in form of a uniform manifold approximation and projection plot; (C), successive stages of embryo development. As shown in (A) to (C), TTN5 is ubiquitously expressed in these different plant organs and tissues. In particular, it should be noted that TTN5 transcripts were detectable in the epidermis cell layer of roots that we used for localization of tagged TTN5 protein in this study. In accordance with the embryo-lethal phenotype, the ubiquitous expression of TTN5 highlights its importance for plant growth. Original data were derived from (Nakabayashi et al. 2005, Schmid et al. 2005) (A); (Ryu et al. 2019) (B); (Waese et al. 2017) (C). Gene expression levels are indicated by local maximum color code, ranging from the minimum (no expression) in yellow to the maximum (highest expression) in red."

      For the supplementary videos, it is difficult to determine if punctate structures are moving or is it cytoplasmic streaming? Could this be done with a co-localized marker? Considering that such markers have been used later in Fig. 4?

      __Our response:__

      We had detected movement of YFP fluorescent structures in all analyzed YFP-TTN5 plant parts except the root tip. Movement of fluorescence signals in YFP-TTN5T30N seedlings was slowed in hypocotyl epidermis cells. To address the reviewer's comment, we added three NEW supplemental videos (NEW Supplementary Video Material S1M-O) generated with all three YFP-TTN5 constructs imaged over time in N. benthamiana leaf epidermal cells upon colocalization with the cis-Golgi marker GmMan1-mCherry, as requested by the reviewer. In these NEW videos, some of the YFP fluorescent spots seem to move together with the Golgi stacks. GmMan1 has been described to show a stop-and-go directed movement mediated by the actomyosin system (Nebenführ 1999), and based on the colocalization this might similarly be the case for YFP-TTN5 signals.

      It would be good if the speed of movement is quantified, if the authors want to retain the current claims in results and the discussion.

      __Our response:__

      *We describe a difference in the movement of YFP fluorescent signal for the YFP-TTN5T30N variant in the hypocotyl compared to YFP-TTN5 and YFP-TTN5Q70L. In hypocotyl cells, we could observe a slowed or arrested movement specifically of YFP-TTN5T30N fluorescent structures, and we describe this in the Results section (Line 278-291).*

      "Interestingly, the mobility of these punctate structures differed within the cells when the mutant YFP-TTN5T30N was observed in hypocotyl epidermis cells, but not in the leaf epidermis cells (Supplementary Video Material S1E, compare with S1B) nor was it the case for the YFP-TTN5Q70L mutant (Supplementary Video Material S1F, compare with S1E)."

      *The slowed movement in the YFP-TTN5T30N mutant is clearly visible even without quantification. We checked that the manuscript text does not contain overstatements in this regard.*

      Fig.2 I am not sure what the unit / scale is in Fig. 2D/E if each parameter (Kon, Koff, and Kd) are individually plotted? Could the authors please clarify/simplify this panel?

      __Our response:__

      We presented kinetics for nucleotide association (kon) and dissociation (koff) and the dissociation constant (Kd) in a bar diagram for each nucleotide, mdGDP (Figure 2D) and mGppNHp (Figure 2E). We modified and relabeled the bar diagram representation. The parameters and units should now be clear. Please see also the other modified figures (NEW modified Figure 2A-H). We also modified the legend of Figure 2D and E:

      "(D-E), Kinetics of association and dissociation of fluorescent nucleotides mdGDP (D) or mGppNHp (E) with TTN5 proteins (WT, TTN5T30N, TTN5Q70L) are illustrated as bar charts. The association of mdGDP (0.1 µM) or mGppNHp (0.1 µM) with increasing concentration of TTN5WT, TTN5T30N and TTN5Q70L was measured using a stopped-flow device (see A, B; data see Supplementary Figure S3A-F, S4A-E). Association rate constants (kon in µM⁻¹ s⁻¹) were determined from the plot of increasing observed rate constants (kobs in s⁻¹) against the corresponding concentrations of the TTN5 proteins. Intrinsic dissociation rates (koff in s⁻¹) were determined by rapidly mixing 0.1 µM mdGDP-bound or mGppNHp-bound TTN5 proteins with the excess amount of unlabeled GDP (see A, C, data see Supplementary Figure S3G-I, S4F-H). The nucleotide affinity (dissociation constant or Kd in µM) of the corresponding TTN5 proteins was calculated by dividing koff by kon. When mixing mGppNHp with nucleotide-free TTN5T30N, no binding was observed (n.b.o.) under these experimental conditions."
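
The calculation described in this legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `linear_slope` and `dissociation_constant` and all numerical values are hypothetical, and an ordinary least-squares fit stands in for whatever fitting procedure was actually used. The sketch takes kon as the slope of kobs plotted against protein concentration and then computes Kd as koff divided by kon.

```python
from statistics import mean


def linear_slope(x, y):
    """Ordinary least-squares slope of y plotted against x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def dissociation_constant(k_off, k_on):
    """Kd (µM) from koff (s^-1) and kon (µM^-1 s^-1), as Kd = koff / kon."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


# Hypothetical values for illustration only (not data from the manuscript).
protein_uM = [0.5, 1.0, 2.0, 4.0]      # protein concentrations in µM
k_obs_per_s = [1.1, 2.1, 4.1, 8.1]     # observed rate constants in s^-1
k_off_per_s = 0.01                     # koff from a separate displacement experiment

k_on = linear_slope(protein_uM, k_obs_per_s)       # slope, in µM^-1 s^-1
k_d = dissociation_constant(k_off_per_s, k_on)     # in µM
print(f"kon = {k_on:.3f} µM^-1 s^-1, Kd = {k_d:.4f} µM")
```

With these made-up inputs the fitted slope is about 2 µM⁻¹ s⁻¹ and the resulting Kd about 0.005 µM. Keeping koff as a separate input mirrors the legend, where the dissociation rate comes from its own displacement measurement rather than from the intercept of the association plot.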


      Are panels D and E representing values for mdGDP and GppNHP? This is not very clear from the figure legend.

      Our response:

      Yes, Figure 2D and E represent the kon, koff and Kd values for mdGDP (Figure 2D) and mGppNHP (Figure 2E). As detailed in our previous response to comment 2a, we modified the figure and its legend to make the representation clearer.


      Fig. 3 Same comments as in para above - improve resolution of images, concentrate on a few selected cells, if required use an inset figure to zoom-in to specific compartments.

      Our response:

      As detailed in our responses to reviewers 1 and 2, we will upload all original image data to BioImage Archive upon acceptance of the manuscript, so that a detailed investigation of all our images is possible without any reduction of resolution.

      Please provide the non-fluorescent channel images to understand cell topography.

      Our response:

      *We presented our microscopic images with the respective fluorescent channel and for colocalization with an additional merge. We did not present brightfield images as the cell topography was already well visible by fluorescent signal close to the PM. Therefore, brightfield images would not provide any benefit. Since we will upload all original data to BioImage Archive for a detailed investigation of all our images, the data can be obtained if needed. *

      Is the nuclear localization seen in transient expression (panel L-N) an artefact? If so, this needs to be mentioned in the text.

      Our response:

      As explained in our responses to reviewers 1 and 2, fluorescence signals were detected within the nuclei of root cells of YFP-TTN5 plants, while immunostaining signals of HA3-TTN5 were not detected in the nucleus.

      In an α-GFP Western blot using YFP-TTN5 Arabidopsis seedlings, we detected besides the expected and strong 48 kDa YFP-TTN5 band, three additional weak bands ranging between 26 to 35 kDa (NEW Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins expressed from aberrant transcripts. α-HA Western blot controls performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size (Supplementary Figure S7D). We must therefore be cautious about nuclear TTN5 localization and we rephrased the text carefully (starting Line 300):

      „We also found multiple YFP bands in α-GFP Western blot analysis using YFP-TTN5 Arabidopsis seedlings. Besides the expected and strong 48 kDa YFP-TTN5 band, we observed three weak bands ranging between 26 to 35 kDa (Supplementary Figure S7C). We cannot explain the presence of these small protein bands. They might correspond to free YFP, to proteolytic products or potentially to proteins produced from aberrant transcripts with perhaps alternative translation start or stop sites. On the other side, a triple hemagglutinin-tagged HA3-TTN5 driven by the 35S promoter did complement the embryo-lethal phenotype of ttn5-1 (Supplementary Figure S7D, E). α-HA Western blot control performed with plant material from HA3-TTN5 seedlings showed a single band at the correct size, but no band that was 13 to 18 kDa smaller (Supplementary Figure S7D). (...) We did not observe any staining in nuclei or ER when performing HA3-TTN5 immunostaining (Figure 3P; Figure 4A, B), as was the case for fluorescence signals in YFP-TTN5-expressing cells. Presumably, this can indicate that either the nuclear and ER signals seen with YFP-TTN5 correspond to the smaller proteins detected, as described above, or that immunostaining was not suited to detect them. Hence, we focused interpretation on patterns of localization overlapping between the fluorescence staining with YFP-labeled TTN5 and with HA3-TTN5 immunostaining, such as the particular signal patterns in the specific punctate membrane structures."

      Fig. 4 - In addition to the points made for Fig. 3 The authors should consider reducing gain/exposure to improve image clarity. Especially for the punctate structures, which are difficult to observe in TTN5, likely because of the cytoplasmic localization as well.

      Our response:

      Thank you for this comment. We recorded image z-stacks and present single z-planes. Reducing the gain to decrease the cytoplasmic signal does not increase the clarity of the punctate structures, as their signal also becomes weak. As mentioned above, we will upload all original image data to BioImage Archive for a detailed investigation of all our images without any reduction of resolution.


      Reducing Agrobacterial load could be considered. OD of 0.4 is a bit much, 0.1 or even 0.05 could be tried. If available try expression in N. tabacum, which is more amenable to microscopy. However, this is OPTIONAL, benthamiana should suffice.

      Our response:

      Thank you for the suggestion. We are routinely using N. benthamiana leaf infiltration. When setting up this method at first, we did not observe different localization results by using different ODs of bacterial cultures. Hence, an OD600 of 0.4 is routinely used in our institute. This value is comparable with the literature although some literature reports even higher OD values for infiltration (Norkunas et al., 2018; Drapal et al., 2021; Zhang et al., 2020, Davis et al., 2020; Stephenson et al., 2018).

      A standard norm now is to establish the level of colocalization by quantifying a Pearson's or Manders' correlation. I believe this has been done in the text, but I didn't find a plot representing it. Could the data (which the authors already have) be plotted along with "n" as a table or graph?

      Our response:

      *Please check our response to reviewer 1, comment 4. *

      We would like to emphasize that we performed colocalization very carefully and quantified the data in three different manners. There is no general standardized procedure that best suits the idea of a colocalization pattern. Results of colocalization are represented in stem diagrams and table format, including statistical analysis. Colocalization was carried out with the ImageJ plugin JACoP for Pearson's and Overlap coefficients and based on the centroid method. The plotted Pearson's and Overlap coefficients are presented in bar diagrams in Supplementary Figure S8A and C, including statistics. The obtained values by the centroid method are represented in table format in Supplementary Figure S8B and D, which can be considered a standard method (see Ivanov et al., 2014).

      Colocalization of two different fluorescence signals was performed for the two channels in a specific chosen region of interest (indicating in % the overlapping signal versus the sum of signal for each channel). The differences between the YFP/mRFP and mRFP/YFP ratios indicate that a higher percentage of ARA7-RFP signal colocalizes with YFP-TTN5Q70L signal than with the TTN5WT or the TTN5T30N mutant form signals, while the YFP signals have a similar overlap with ARA7-positive structures. This is not a contradiction, because each ratio uses a different channel as its reference. We trust that this answers the questions on colocalization.

      Please note that upon acceptance for publication, we will upload all original colocalization data to BioImage Archive. Hence, the high-quality data can be reanalyzed by readers.

      The cartoons for the action of chemicals are useful, but need a bit more clarity.

      Our response:

      The schematic explanations of pharmacological treatments and expected outcomes are useful to readers. For a better understanding, we added explanatory sentences to the figure legends (Figure 5E, M; Figure 6A). We also modified Figure 6A and the corresponding legend.

      "(E), Schematic representation of GmMan1 localization at the ER upon brefeldin A (BFA) treatment. BFA blocks ARF-GEF proteins which leads to a loss of Golgi cis-cisternae and the formation of BFA-induced compartments due to an accumulation of Golgi stacks up to a redistribution of the Golgi to the ER by fusion of the Golgi with the ER (Renna and Brandizzi 2020)."

      "(M), Schematic representation of ARA7 localization in swollen MVBs upon wortmannin treatment. Wortmannin inhibits phosphatidylinositol-3-kinase (PI3K) function leading to the fusion of TGN/EE to swollen MVBs (Renna and Brandizzi 2020)."

      "(A), Schematic representation of progressive stages of FM4-64 localization and internalization in a cell. FM4-64 is a lipophilic substance. After infiltration, it first localizes in the plasma membrane, at later stages it localizes to intracellular vesicles and membrane compartments. This localization pattern reflects the endocytosis process (Bolte et al. 2004)."


      Fig. 5: Does the Q70L mutant show reduced endocytosis?

      Our response:

      We have not investigated this question. As detailed in our response to reviewer 1, we would like to emphasize that we fully agree that functional evidence is of interest for assigning a role to TTN5 in trafficking steps. A phenotype associated with TTN5T30N and TTN5Q70L would clearly be meaningful.

      Concerning the aspect of colocalization of the mutants with the markers we show in Figure 5C, D and G, H that YFP-TTN5T30N- and YFP-TTN5Q70L-related signals colocalize with the Golgi marker GmMan1-mCherry. Figure 5K, L and O, P show that YFP-TTN5T30N and YFP-TTN5Q70L-related signals can colocalize with the MVB marker, and this may affect relevant vesicle trafficking processes and plasma membrane protein regulation involved in root cell elongation.

      *At present, we have not yet investigated perturbed cargo trafficking. These aspects are certainly interesting but require extensive work and testing of appropriate physiological conditions and appropriate cargo targets. We discuss future perspectives in the Discussion. We agree that such functional information is of great importance, but needs to be clarified in future studies. *


      The main text needs to be organized in a way that a reader can separate what is the hypothesis/assumption from actual results and conclusions (see lines #143-149).

      Our response:

      *Thank you for this comment. We reformulated text throughout the manuscript. *

      The text is repeated in multiple places, while I understand that this is not plagiarism, the repetitiveness makes it difficult to read and understand the text. I highlight a couple of examples here, but please check the whole text thoroughly and edit/delete as necessary. a. Lines #124-125 with Lines #149-151 Lines #140-143

      Our response:

      *We checked the text and removed unnecessary repetitions. *

      Could the authors elaborate on whether there are plant homologs of TTN5? Also, have other ARF/ARLs been compared to TTN5 beyond HsARF1?

      Our response:

      Phylogenetic trees of the ARF family in Arabidopsis in comparison to human ARF family were already published by Vernoud et al. (2003). In this phylogenetic tree ARF, ARL and SAR proteins of Arabidopsis are compared with the members in humans and S. cerevisiae. It is difficult to deduce whether the proteins are homologs or orthologs. In this setting, an ortholog of TTN5 may be HsARL2 followed by HsARL3. In Figure 1A we represented some human GTPases as closely related in sequence to TTN5, these are HsARL2, HsARF1 and AtARF1 since they are the best studied ARF GTPases. HRAS is a well-known member of the RAS superfamily which we used for kinetic comparison in Figure 2. We additionally compared published kinetics of RAC1, HsARF3, CDC42, RHOA, ARF6, RAD, GEM, and RAS GTPases.


      On a related note, a major problem I have with these kinetic values is the assumption of significance or not. E.g., in Line #180 the values represent a 2- and 6-fold increase; if these numbers do not matter, can a significance threshold be applied so as to understand how much fold-change is appreciable?

      Our response:

      The kinetics of TTN5 and its two mutant variants can be compared with those of other studied GTPases. To provide a basis for the statements about differences in GTPase activities, we modified the text and added respective references in the text for comparisons of fold changes.

      The new text now reads as follows (Line 175-231):

      „We next measured the dissociation (koff) of mdGDP and mGppNHp from the TTN5 proteins in the presence of excess amounts of GDP and GppNHp, respectively (Figure 2C) and found interesting differences (Figure 2D, E; Supplementary Figures S3G-I, S4F-H). First, TTN5WT showed a koff value (0.012 s-1 for mGDP) (Figure 2D; Supplementary Figure S3G), which was 100-fold faster than those obtained for classical small GTPases, including RAC1 (Haeusler et al. 2006) and HRAS (Gremer et al. 2011), but very similar to the koff value of HsARF3 (Fasano et al. 2022). Second, the koff values for mGDP and mGppNHp, respectively, were in a similar range between TTN5WT (0.012 s-1 mGDP and 0.001 s-1 mGppNHp) and TTN5Q70L (0.025 s-1 mGDP and 0.006 s-1 mGppNHp), respectively, but the koff values differed 10-fold between the two nucleotides mGDP and mGppNHp in TTN5WT (koff = 0.012 s-1 versus koff = 0.001 s-1; Figure 2D, E; Supplementary Figure S3G, I, S4F, H). Thus, mGDP dissociated from proteins 10-fold faster than mGppNHp. Third, the mGDP dissociation from TTN5T30N (koff = 0.149 s-1) was 12.5-fold faster than that of TTN5WT and 37-fold faster than the mGppNHp dissociation of TTN5T30N (koff = 0.004 s-1) (Figure 2D, E; Supplementary Figure S3H, S4G). Mutants of CDC42, RAC1, RHOA, ARF6, RAD, GEM and RAS GTPases, equivalent to TTN5T30N, display decreased nucleotide binding affinity and therefore tend to remain in a nucleotide-free state in a complex with their cognate GEFs (Erickson et al. 1997, Ghosh et al. 1999, Radhakrishna et al. 1999, Jung and Rösner 2002, Kuemmerle and Zhou 2002, Wittmann et al. 2003, Nassar et al. 2010, Huang et al. 2013, Chang and Colecraft 2015, Fisher et al. 2020, Shirazi et al. 2020). Since TTN5T30N exhibits fast guanine nucleotide dissociation, these results suggest that TTN5T30N may also act in either a dominant-negative or fast-cycling manner as reported for other GTPase mutants (Fiegen et al. 2004, Wang et al. 2005, Fidyk et al. 2006, Klein et al. 2006, Soh and Low 2008, Sugawara et al. 2019, Aspenström 2020).

      The dissociation constant (Kd) is calculated from the ratio koff/kon, which inversely indicates the affinity of the interaction between proteins and nucleotides (the higher Kd, the lower affinity). Interestingly, TTN5WT binds mGppNHp (Kd = 0.029 µM) 10-fold tighter than mGDP (Kd = 0.267 µM), a difference, which was not observed for TTN5Q70L (Kd for mGppNHp = 0.026 µM, Kd for mGDP = 0.061 µM) (Figure 2D, E). The lower affinity of TTN5WT for mdGDP compared to mGppNHp brings us one step closer to the hypothesis that classifies TTN5 as a non-classical GTPase with a tendency to accumulate in the active (GTP-bound) state (Jaiswal et al. 2013). The Kd value for the mGDP interaction with TTN5T30N was 11.5-fold higher (3.091 µM) than for TTN5WT, suggesting that this mutant exhibited faster nucleotide exchange and lower affinity for nucleotides than TTN5WT. Similar as other GTPases with a T30N exchange, TTN5T30N may behave in a dominant-negative manner in signal transduction (Vanoni et al. 1999).

      To get hints on the functionalities of TTN5 during the complete GTPase cycle, it was crucial to determine its ability to hydrolyze GTP. Accordingly, the catalytic rate of the intrinsic GTP hydrolysis reaction, defined as kcat, was determined by incubating 100 µM GTP-bound TTN5 proteins at 25 °C and analyzing the samples at various time points using a reversed-phase HPLC column (Figure 2F; Supplementary Figure S5). The determined kcat values were quite remarkable in two respects (Figure 2G). First, all three TTN5 proteins, TTN5WT, TTN5T30N and TTN5Q70L, showed quite similar kcat values (0.0015 s-1, 0.0012 s-1, 0.0007 s-1; Figure 2G; Supplementary Figure S5). The GTP hydrolysis activity of TTN5Q70L was quite high (0.0007 s-1). This was unexpected because, as with most other GTPases, the glutamine mutations at the corresponding position drastically impair hydrolysis, resulting in a constitutively active GTPase in cells (Hodge et al. 2020, Matsumoto et al. 2021). Second, the kcat value of TTN5WT (0.0015 s-1) although quite low as compared to other GTPases (Jian et al. 2012, Esposito et al. 2019), was 8-fold lower than the determined koff value for mGDP dissociation (0.012 s-1) (Figure 2E). This means that a fast intrinsic GDP/GTP exchange versus a slow GTP hydrolysis can have drastic effects on TTN5 activity in resting cells, since TTN5 can accumulate in its GTP-bound form, unlike the classical GTPase (Jaiswal et al. 2013). To investigate this scenario, we pulled down GST-TTN5 protein from bacterial lysates in the presence of an excess amount of GppNHp in the buffer using glutathione beads and measured the nucleotide-bound form of GST-TTN5 using HPLC. As shown in Figure 2H, isolated GST-TTN5 increasingly bonds GppNHp, indicating that the bound nucleotide is rapidly exchanged for free nucleotide (in this case GppNHp). This is not the case for classical GTPases, which remain in their inactive GDP-bound forms under the same experimental conditions (Walsh et al. 2019, Hodge et al. 2020)."
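
      For readers who wish to retrace the fold changes quoted in this passage, the following minimal sketch recomputes them from the rate and dissociation constants given in the text. The dictionary layout and the helper `fold` are ours for illustration only. Small deviations from the quoted fold values presumably reflect rounding of the reported constants.

```python
# Constants quoted in the text (koff and kcat in s^-1, Kd in µM).
koff_mgdp = {"WT": 0.012, "T30N": 0.149, "Q70L": 0.025}
koff_mgppnhp = {"WT": 0.001, "T30N": 0.004, "Q70L": 0.006}
kd_mgdp = {"WT": 0.267, "T30N": 3.091, "Q70L": 0.061}
kd_mgppnhp = {"WT": 0.029, "Q70L": 0.026}
kcat_wt = 0.0015


def fold(a, b):
    """Return how many times larger a is than b."""
    return a / b


ratios = {
    "koff mGDP, T30N vs WT": fold(koff_mgdp["T30N"], koff_mgdp["WT"]),
    "koff T30N, mGDP vs mGppNHp": fold(koff_mgdp["T30N"], koff_mgppnhp["T30N"]),
    "koff WT, mGDP vs mGppNHp": fold(koff_mgdp["WT"], koff_mgppnhp["WT"]),
    "Kd mGDP, T30N vs WT": fold(kd_mgdp["T30N"], kd_mgdp["WT"]),
    "Kd WT, mGDP vs mGppNHp": fold(kd_mgdp["WT"], kd_mgppnhp["WT"]),
    "WT, koff mGDP vs kcat": fold(koff_mgdp["WT"], kcat_wt),
}

for label, value in ratios.items():
    print(f"{label}: {value:.1f}-fold")
```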

      Another issue with the kinetic measurements is the significance levels. Line #198-201. The three proteins are claimed to have similar values and in the next line, the Q70L mutant is claimed to be high.

      Our response:

      Please see our response to the previous comment 9 and the corresponding changes in the text. We have provided extra explanations and references to clarify why the kinetic behavior of TTN5 is unusual in several respects (Line 215-220).

      „First, all three TTN5 proteins, TTN5WT, TTN5T30N and TTN5Q70L, showed quite similar kcat values (0.0015 s-1, 0.0012 s-1, 0.0007 s-1; Figure 2G; Supplementary Figure S5). The GTP hydrolysis activity of TTN5Q70L was quite high (0.0007 s-1). This was unexpected because, as with most other GTPases, the glutamine mutations at the corresponding position drastically impair hydrolysis, resulting in a constitutively active GTPase in cells (Hodge et al. 2020, Matsumoto et al. 2021)."

      Provide data for conclusion in line#214-215

      Our response:

      We agree that a reference should be added after this sentence to make this sentence clearer (Line 228-231).

      "As shown in Figure 2H, isolated GST-TTN5 increasingly bonds GppNHp, indicating that the bound nucleotide is rapidly exchanged for free nucleotide (in this case GppNHp). This is not the case for classical GTPases, which remain in their inactive GDP-bound forms under the same experimental conditions (Walsh et al. 2019, Hodge et al. 2020)."


      How were the mutants studied here identified? random mutation or was it directed based on qualified assumptions?

      Our response:

      We used the T30N and the Q70L point mutations as such types of mutants had been reported to confer specific phenotypes in these well-conserved amino acid positions in multiple other small GTPases (Erickson et al. 1997, Ghosh et al. 1999, Radhakrishna et al. 1999, Jung and Rösner 2002, Kuemmerle and Zhou 2002, Wittmann et al. 2003, Nassar et al. 2010, Huang et al. 2013, Chang and Colecraft 2015, Fisher et al. 2020, Shirazi et al. 2020). In particular, these positions affect the interaction between small GTPases and their respective guanine nucleotide exchange factor (GEF; T30N) or on GTP hydrolysis (Q70L). We introduced the mutants and described their potential effect on the GTPase cycle in the introduction and cited exemplary literature. Please see also our response to comment 6 and the proposed text changes (Line 142-151).

      Could a simpler definition of the Kon/Koff values be provided? And can these values be compared between mutants directly?

      Our response:

      *We introduce kon and koff in the modified Figure 2D, E, and they are described in the figure legends. Moreover, we present the data for calculations in Supplementary Figures S3, 4, where again we define the values in the respective figure legends. *


      Data provided are not convincing to claim that both the mutant forms have lower association with the Golgi.

      Our response:

      *Our conclusion is that both YFP-TTN5 and YFP-TTN5Q70L fluorescence signals tend to colocalize more with the Golgi-marker signals compared to YFP-TTN5T30N signals as deduced from the centroid-based colocalization method (Line 404-405). *

      "Hence, the GTPase-active TTN5 forms are likely more present at cis-Golgi stacks compared to TTN5T30N."

      The Pearson coefficients of all three YFP-TTN5 constructs were nearly identical, but we could identify differences in overlapping centers between the YFP and mCherry channel. 48 % of the GmMan1-mCherry fluorescent cis-Golgi stacks were overlapping with signal of YFP-TTN5Q70L, while for YFP-TTN5T30N an overlap of only 31 % was detected. This means that fewer cis-Golgi stacks colocalized with signals in the YFP-TTN5T30N mutant than in YFP-TTN5Q70L, which is the statement in our manuscript.


      In general, the authors should carefully reconsider the claims made in the manuscript. E.g., "This study lays the foundation for studying the functional relationships of this small GTPase" (line 125) is unqualified, as this is true for every protein ever studied and published. Considering that TTN was not isolated/identified in this study for the first time, this claim doesn't stand.

      Our response:

      *We reformulated the sentence (Line 123-124). *

      "This study paves the way towards future investigation of the cellular and physiological contexts in which this small GTPase is functional."


      Line #185 - "characteristics of a dominant-negative...." What is this based on? From the text it is not clear what the parameters are. Considering that no complementation phenotypes have been presented, this is a far-fetched claim.

      Our response:

      Small GTPases in general are a well-studied protein family, and the mutations used here, T30N and Q70L, affect conserved amino acids and are commonly used for the characterization of Ras superfamily members. We added explanatory sentences with references to the text. The characteristics referred to in the above paragraph are based on the kinetic study.

      We modified the text as follows (Line 186-197):

      „Third, the mGDP dissociation from TTN5T30N (koff = 0.149 s-1) was 12.5-fold faster than that of TTN5WT and 37-fold faster than the mGppNHp dissociation of TTN5T30N (koff = 0.004 s-1) (Figure 2D, E; Supplementary Figure S3H, S4G). Mutants of CDC42, RAC1, RHOA, ARF6, RAD, GEM and RAS GTPases, equivalent to TTN5T30N, display decreased nucleotide binding affinity and therefore tend to remain in a nucleotide-free state in a complex with their cognate GEFs (Erickson et al. 1997, Ghosh et al. 1999, Radhakrishna et al. 1999, Jung and Rösner 2002, Kuemmerle and Zhou 2002, Wittmann et al. 2003, Nassar et al. 2010, Huang et al. 2013, Chang and Colecraft 2015, Fisher et al. 2020, Shirazi et al. 2020). Since TTN5T30N exhibits fast guanine nucleotide dissociation, these results suggest that TTN5T30N may also act in either a dominant-negative or fast-cycling manner as reported for other GTPase mutants (Fiegen et al. 2004, Wang et al. 2005, Fidyk et al. 2006, Klein et al. 2006, Soh and Low 2008, Sugawara et al. 2019, Aspenström 2020)."

      The claims in Line #224-227 are exaggerated. Please tone down or delete.

      Our response:

      *We rephrased the sentence (Line 240-243). *

      "Therefore, we propose that TTN5 exhibits the typical functions of a small GTPase based on in vitro biochemical activity studies, including guanine nucleotide association and dissociation, but emphasizes its divergence among the ARF GTPases by its kinetics."

      Line #488-489 - This conclusion is not really supported. At best the authors can claim that TTN5 is associated with trafficking components, but the functional relevance of this association is not determined.

      Our response:

      *We toned down our statement (Line 604-608). *

      „The colocalization of FM4-64-labeled endocytosed vesicles with fluorescence in YFP-TTN5-expressing cells may indicate that TTN5 is involved in endocytosis and the possible degradation pathway into the vacuole. Our data on colocalization with the different markers support the hypothesis that TTN5 may have functions in vesicle trafficking."

      Minor comments:

      Line #95 - "This role in vesicle....." - please clarify which role?

      Our response:

      We rephrased the sentence (Line 96-99).

      „These roles of ARF1 and SAR1 in COPI and II vesicle formation within the endomembrane system are well conserved in eukaryotes which raises the question of whether other plant ARF members are also involved in functioning of the endomembrane system."

      Line #168 - "we did not observed" please change to "not able to measure/quantify".

      Our response:

      *We changed the text accordingly (Line 169-171). *

      „A remarkable observation was that we were not able to monitor the kinetics of mGppNHp association with TTN5T30N but observed its dissociation (koff = 0.026 s-1; Figure 2E)."

      Line #179 - Is ARF3 human or Arabidopsis?

      Our response:

      *The study of Fasano et al., 2022 is based on human ARF3 and we added the information to the text (Line 180-181) *

      "(...) very similar to the koff value of HsARF3 (Fasano et al. 2022)."


      Line #181 - compared to what is the 10-fold difference?

      Our response:

      The 10-fold difference is between the nucleotides mGDP and mGppNHp, for both TTN5WT and TTN5Q70L. We added the information on specific nucleotides to this sentence for a better understanding (Line 181-185).

      „Second, the koff values for mGDP and mGppNHp, respectively, were in a similar range between TTN5WT (0.012 s-1 mGDP and 0.001 s-1 mGppNHp) and TTN5Q70L (0.025 s-1 mGDP and 0.006 s-1 mGppNHp), respectively, but the koff values differed 10-fold between the two nucleotides mGDP and mGppNHp in TTN5WT (koff = 0.012 s-1 versus koff = 0.001 s-1; Figure 2D, E; Supplementary Figure S3G, I, S4F, H)."

      Lines #314-323 - are difficult to understand, consider reframing. Same goes for the conclusion following these lines.

      Our response:

      We added an explanation to these sentences for a better understanding (Line 392-405).

      „We performed an additional object-based analysis to compare overlapping YFP fluorescence signals in YFP-TTN5-expressing leaves with GmMan1-mCherry signals (YFP/mCherry ratio) and vice versa (mCherry/YFP ratio). We detected 24 % overlapping YFP- fluorescence signals for TTN5 with Golgi stacks, while in YFP-TTN5T30N and YFP-TTN5Q70L-expressing leaves, signals only shared 16 and 15 % overlap with GmMan1-mCherry-positive Golgi stacks (Supplementary Figure S8B). Some YFP-signals did not colocalize with the GmMan1 marker. This effect appeared more prominent in leaves expressing YFP-TTN5T30N and less for YFP-TTN5Q70L, compared to YFP-TTN5 (Figure 5B-D). Indeed, we identified 48 % GmMan1-mCherry signal overlapping with YFP-positive structures in YFP-TTN5Q70L leaves, whereas 43 and only 31 % were present with YFP fluorescence signals in YFP-TTN5 and YFP-TTN5T30N-expressing leaves, respectively (Supplementary Figure S8B), indicating a smaller amount of GmMan1-positive Golgi stacks colocalizing with YFP signals for YFP-TTN5T30N. Hence, the GTPase-active TTN5 forms are likely more present at cis-Golgi stacks compared to TTN5T30N."

      Authors might consider a longer BFA treatment (3-4h) to see clearer ER-Golgi fusion (BFA bodies).

      Our response:

      We performed additional BFA treatments for HA3-TTN5-expressing Arabidopsis seedlings followed by whole-mount immunostaining and for YFP-TTN5-expressing Arabidopsis lines. In both experiments we could obtain the typical BFA bodies. We included the NEW data in NEW Figure 4B, C.

      **Referees cross-commenting**

      I agree with both my co-reviewers that the manuscript needs substantial improvement in its cell biology based experiments and conclusions thereof. I think the consensus of all reviewers points to weakness in the in-planta experiments which needs to be addressed to understand and characterize TTN5, which is the main goal of the manuscript.

      Reviewer #3 (Significance (Required)):

      Significance: The manuscript has general significance in understanding the role of small GTPases which are understudied. Although the manuscript does not advance the field of either intracellular trafficking or organization it holds significance in attempting to characterize proteins involved, which is a prerequisite for further functional studies.

      Our response:

      Thank you for your detailed analysis of our manuscript and positive assessment. Our study is an advance in the plant vesicle trafficking field.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Compared to our initial submission to Review Commons, we have addressed all the reviewers' comments. We have extensively re-written the manuscript to make it clearer to a larger audience. In particular, we have transferred Figure EV1 to Figure 1 with more complete panels and included a scheme (Figure EV3) on the steps of D2R internalization which we measure with live cell imaging. We have added a new paragraph to the start of the Discussion to summarize our main conclusions and reordered the discussion on the possible mechanisms of membrane PUFA enrichment on D2R endocytosis. All the changes in the text are in red for easier comparison with the previous version.

      As suggested by reviewer 1, we have performed additional experiments to test the specificity of the effects of PUFA treatments on D2R endocytosis, reinforcing the results shown in Figure 4 using feeding assays. We show with live cell TIRF imaging and the ppH assay that TfR-SEP endocytosis is not affected (Figure EV5) and that SEP-β2AR endocytosis and βarr2-mCherry recruitment to the plasma membrane are not affected (Figure EV6).

      Reviewer #1

      Evidence, reproducibility and clarity

      *The manuscript, using different live and fixed cell trafficking assays, demonstrates that incorporation of poly-unsaturated, but not saturated, free fatty acids in the membrane phospholipids reduce agonist induced internalization of the D2 dopamine receptor but not the adrenergic beta2 receptors or the transferrin receptor. Pulsed pH (ppH) live microscopy further demonstrated that the reduced internalization by incorporation of free fatty acid was accompanied by a blunted recruitment of Beta-arrestin for the D2R.

      I believe said claims put forward in the manuscript are overall well supported by the data and as such I do not believe that further experiments are necessarily needed to uphold these key claims. Also, the methodology is satisfactorily reported, and statistics are robust, although two-way Anova like used in Fig 1 seems appropriate for Fig 2 and 3*

      We thank the reviewer for his/her positive assessment of our work. We have checked the statistical tests used for all our measures. For Figures 2 and 3 (now Figures 3 and 4), we tested only one factor (PUFA treatment or not), so we ran an ordinary one-way ANOVA using GraphPad Prism.
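
      As an illustration of what a one-way design computes, the sketch below derives the F statistic for a single factor with three levels. The puncta counts are invented for illustration and are not data from the study. The function `one_way_anova_f` is a hypothetical helper, not the GraphPad Prism routine, and it does not compute a p-value.

```python
from statistics import mean

# Hypothetical puncta counts per cell (NOT data from the study).
groups = {
    "control": [20, 22, 19, 24, 21],
    "DHA": [14, 15, 13, 17, 16],
    "DPA": [15, 13, 16, 14, 17],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [v for s in samples for v in s]
    grand_mean = mean(all_values)
    k = len(samples)      # number of groups (levels of the single factor)
    n = len(all_values)   # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(s)) ** 2 for s in samples for v in s)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, k - 1, n - k


f_stat, df_between, df_within = one_way_anova_f(list(groups.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

      A two-way ANOVA would only be needed if a second factor, for instance agonist treatment, were analyzed in the same test.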

      That said, I suggest that the fixed cell internalization experiments (Fig 2 and 3), which relate the effect on the D2R to B2AR and transferrin, are revised. This is important for judging whether the effect is a general or a selective molecular mechanism, since this is the one of the three assays on which this comparison relies. Alternatively, I suggest omitting this data and including the B2AR in the Live DERET assay and both B2AR and TfR in the ppH assay. Specifically, my concerns with the fixed cell internalization are:

      • The analysis is based on counting the number of endosomes, which is not necessarily equivalent to the number of receptors internalized.

      The number of puncta, as well as their fluorescence, is reported by the analysis program (written in Matlab2021 and available upon request). We chose to show the number of puncta because it reflects more directly the number of labelled endosomes (in Figures 3 and 4). As shown in the figure below, we found slight but significant differences between groups for FLAG-D2R (88.6 % and 87.6 % of average fluorescence in DHA and DPA treated cells compared to control cells; panel A), and no differences for FLAG-β2AR (panel B). We find a significant decrease in puncta fluorescence for transferrin uptake in cells incubated with DHA (but not DPA) relative to control cells (panel C). However, because we did not detect differences in the number of puncta or in the frequency and amplitude of endocytic vesicle creation events (see below), we still conclude that enrichment with exogenous PUFAs does not affect clathrin mediated endocytosis.

      In conclusion, the most robust measure of endocytosis for this assay is the number of detected puncta per cell rather than their fluorescence.

      • The analysis relies on fully effective stripping of the surface pool of receptors, i.e. clustered surface receptors not stripped by the protocol will be assessed as internalized. It is often very difficult to obtain full efficiency of the Flag-tag stripping and this is somewhat expression dependent.

      • The protocol for the constitutive and agonist induced internalization is different and yet shown on the same absolute graph. Although I take it the microscope gain settings are unaltered between the constitutive and agonist induced internalization, I don't believe the quantification can be directly related. This is confusing at the very least. More critically however, the membrane signal from the non-stripped condition of constitutive internalization will likely fully shield internalized receptors in the Rab4 membrane proximal recycling pathway, leading to under-estimation of the constitutive endocytosis. I believe this methodological limitation underlies the massive relative difference in the constitutive endocytosis between panel 2A,B and 2C,D. For comparison, by a quantitative dual color FACS endocytosis assay, we have previously demonstrated the ligand-induced endocytosis to be ~4 fold increased over constitutive (in concert with Fig 2A,B here) (Schmidt et al 20XX). Importantly, high relative variability by this methodology could well shield an actual effect of incorporation of FFAs on the constitutive endocytosis.

      We thank the reviewer for pointing out this difference in the protocol. As a matter of fact, we did not use acid stripping in any of the conditions used for the uptake assays (Figures 3 and 4). We apologize for the confusion and we have clarified this point in the Methods section. In early experiments we compared conditions with or without stripping, but we concluded from these experiments that the stripping was indeed not complete. Moreover, we noticed early on that many cells treated with DHA or DPA did not have any detectable cluster (13 cells out of 58 quantified cells treated with DHA after addition of QPL, 12/56 cells treated with DPA, 0/68 for cells treated with vehicle). Stripping the antibody would have made these cells undetectable, biasing the analysis. Therefore, to make our results more consistent, we decided to use non-stripping conditions. To detect endosomes specifically, we used a segmentation tool developed earlier (see Rosendale et al. 2019). This tool is based on wavelet transforms, which recognize dot-like structures. In addition, we excluded the labelled plasma membrane from the cell mask by mask erosion.
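      To put the counts of cells without detectable clusters on a common scale, the short calculation below converts them to percentages. It is only an illustrative sketch: the variable names are assumptions introduced here, and the counts are those quoted in the response above.

```python
# Cells without any detectable cluster after QPL addition (counts quoted above).
# Names are illustrative and are not taken from the authors' analysis code.
cells_without_clusters = {
    "DHA": (13, 58),
    "DPA": (12, 56),
    "vehicle": (0, 68),
}

# Convert each (cells without clusters, cells quantified) pair to a percentage.
percent_without_clusters = {
    treatment: round(100 * n_empty / n_total, 1)
    for treatment, (n_empty, n_total) in cells_without_clusters.items()
}

for treatment, percent in percent_without_clusters.items():
    print(f"{treatment}: {percent}% of cells without detectable clusters")
```

      On these numbers, roughly one in five PUFA-treated cells would have dropped out of a stripping-based analysis, against none of the vehicle-treated cells.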

      We agree the design of experiments was not aimed at comparing the effect of PUFA treatment on low levels of constitutive D2R endocytosis. This would require more sensitive assays and be addressed in subsequent studies.

      'Optional' Also, it would be informative to see the ppH Beta-arrestin experiments with the B2AR to assess, whether the putative discrepancy between D2R and B2AR is upstream or downstream of the blunted Beta-arrestin recruitment. To the same point, it would be very informative to assess how the incorporation of the free fatty acids affect receptor signalling, which would also help relate the effect of incorporation of the FFA's in the phospholipids to previous experiment using short term incubation with FFA's

      We have now performed live imaging experiments in HEK293 cells expressing SEP-β2AR, GRK2 and βarr2-mCherry and stimulated with isoproterenol (Figure EV6). We show that the clustering of SEP-β2AR, of βarr2-mCherry, as well as endocytosis, are not affected by treatments with DHA or DPA. In this study, we focused on the early trafficking steps of D2R internalization. It will be interesting in a future study to address its consequences on G protein dependent and independent signaling. Moreover, and for good measure, we performed experiments to assess TfR-SEP endocytosis with the ppH assay. Again, we found no difference between cells treated or not with PUFAs (Figure EV5).

      *References overall seem appropriate although Schmidt et al would be relevant for reference of the constitutive vs agonist induced endocytosis of D2R and B2AR. *

      We have now cited Schmidt et al. 2020 doi 10.1111/bcpt.13274 in the discussion with the following sentences: "D2R also shows constitutive endocytosis (Schmidt et al, 2020) which may be modulated by PUFAs although we did not detect any significant difference in our measures (see Figure 3) which were aimed at detecting high levels of internalization induced by agonists. Further work will be required to specifically examine the effect of PUFAs on constitutive GPCR internalization."

      Overall, the figures are well composed and convey the messages fairly well. Specific points that would strengthen the rigor include: • Choosing actual representative pictures of the quantitative data in Fig 2 and 3 (e.g. hard to see 25 endocytic events in Fig 2A constitutive endo, EtOH)

      We apologize for the confusion. We employ a normalization procedure to account for cell size. In addition, all numbers have been normalized to the condition stimulated with agonist with no PUFA treatment. In fact, we detect very few puncta in unstimulated cells (on average 0.6, range 0-5) compared to 27.3 clusters (range 2-87) in cells stimulated with QPL.
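      As a rough illustration of the second normalization step, the sketch below expresses the unstimulated mean as a percentage of the agonist-stimulated, no-PUFA mean. It uses only the two raw means quoted above and deliberately ignores the cell-size correction, whose details are not given here; the variable names are assumptions.

```python
# Mean puncta per cell, as quoted in the response above.
mean_puncta_unstimulated = 0.6
mean_puncta_qpl = 27.3

# Express the unstimulated condition as a percentage of the agonist-stimulated
# control (assumption: raw means only, no cell-size correction applied).
unstimulated_percent = round(100 * mean_puncta_unstimulated / mean_puncta_qpl, 1)
print(f"Unstimulated cells: {unstimulated_percent}% of the QPL-stimulated control")
```

      With these raw means, the unstimulated condition sits at about 2 % of the stimulated control.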

      • Showing actual p values for the statistical comparisons

      For easier reading, we have kept the stars convention for the figures but added two tables with all statistical tests and the p values for both main figures and EV figures.

      Moreover, for ease of reading the figures (without consulting the legend repeatedly) it would be very helpful to headline individual panels with what the experiment assesses. Figure 1a and 1b for example can't be distinguished at all before reading the figure legend. Also, the y-axis could be more informative on what is measured rather than just giving the unit.

      We have added titles to panels (in particular for Figure 2A,B which correspond to former Figure 1A,B) and we have given new titles to Y axes to make them clearer. We hope that the reading of our figures will now be easier.

      Finally, the figure presentation and description of S1 is very hard to follow. I cannot really make out what is assessed in the different panels.

      We have substantially changed Figure EV1 (now Figure 1) with a new presentation of the data: all 4 conditions (control, treated with DHA, DPA or BA) are systematically presented in the same graph, with clearer titles for the parameters displayed on the Y axes. We hope that this figure is now easier to follow.

      Significance

      *The strength of the manuscript is the use and validation of incorporation of FFA's in the plasma membrane, which more closely mimics the physiological situation than brief application of FFAs as often done. In addition, the blunted recruitment of beta-arrestin as assessed by the ppH protocol is quite intriguing mechanistically. The limitations are the relatively narrow focus on the D2 receptor (and not multiple GPCRs), which does not really speak to or assess the physiological, pathophysiological or therapeutic role of the observations (except for referring to the relation between FFAs and disease). Also, despite the putative role of Beta-arrestin recruitment in the process, the actual causation in the process is not clear. This shortcoming is underscored by the putative effect on the constitutive internalization described above.

      My specific expertise for assessing the paper is within general trafficking processes (including the trafficking methodology applied), trafficking of GPCRs and function of the dopamine system including the role of D2 receptors.*

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The only conclusion that I was able to understand from the study was that enrichment of cell membranes with polyunsaturated fatty acids specifically inhibited agonist-induced internalization of D2 receptors. However, I think that the experiments used to conclude that PUFAs do not alter D2R clustering but reduce the recruitment of β-arrestin2 and D2R endocytosis need some clarification (i.e. data depicted in Fig. 2-5). This lack of clarity might be due to the fact that I am not familiar enough with the employed technologies or to the unclear writing style of the paper. There was an overuse of acronyms, initialisms and abbreviations, which are difficult to understand for researchers outside of the specific lipid field. I think that the manuscript should be written in a way to be legible also for researchers not working in the immediate field.

      The paper was not written in a manner that a general audience of cell biologists or those interested in GPCR biology could understand and judge. It is indeed interesting that polyunsaturated fatty acids specifically inhibit D2R internalization in HEK293 cells, and it could be significant. But, it is difficult to judge the significance of the observation without more in vivo data.

      I would suggest the following. Remove all acronyms and abbreviations. Significantly, expand the Materials and Methods section, either in the manuscript or in the Supplemental section. I suggest clearly explaining each construct used, and the function of each module in the construct, with diagrams. In addition, provide a comprehensive step by step description of each experimental protocol, providing the reader with the rationale for each step in the protocol with explanatory diagrams. The authors should also more clearly explain the rationale and logic that was utilized to make the conclusions that they did from the depicted observations. Only then can a broader audience determine if the authors' conclusions are justified.

      We thank the reviewer for his/her comments. Indeed, our main message was that two types of PUFAs (DHA and DPA) specifically alter D2R endocytosis by reducing the recruitment of β-arrestin2 without changing D2R clustering at the plasma membrane. We are sorry that our writing was not clear enough. We also found out that in the last steps of the submission to Review Commons, the first paragraph of the Discussion was inadvertently erased. This made our main conclusions, summarized in this first paragraph, less clear. We have now put back this important paragraph. Moreover, we have extensively rewritten the manuscript, striving to make it as clear as possible to a large audience. We have reduced the use of acronyms to keep only the most used ones [e.g. PUFA (used 99 times), DHA (37 times), GPCR (34 times), D2R (126 times), GRK (17 times)] and made them consistent throughout the manuscript. Following the reviewer's suggestion, we have also added a scheme of the steps following D2R activation by agonist leading to its internalization (Figure EV3).

      We understand that the reviewer implies by "in vivo data" results obtained in the brain of animals. As written in the Introduction and in the Discussion, the current work follows up on two recently published manuscripts by a subset of the authors, namely (i) Ducrocq et al. 2020 (doi 10.1016/j.cmet.2020.02.012), in which we show that deficits in motivation in animals deprived of ω3-PUFAs can be restored specifically by conditional expression of a fatty acid desaturase from C. elegans (FAT1) that allows restoring PUFA levels specifically in D2R-expressing striatal projection neurons (which mediate the so-called indirect pathway), and (ii) Jobin et al. 2023 (doi: 10.1038/s41380-022-01928-6), which combines in cellulo (HEK 293 cells) and in vivo data to show that PUFAs affect the ligand binding of the dopamine D2 receptor and its signaling in a lipid context that reflects patient lipid profiles regarding poly-unsaturation levels.

      Reviewer #2 (Significance (Required)):

      In summary, I will reiterate that the reported experiments need to be much better explained to make the study understandable to a broader audience and for that audience to determine whether the conclusions are justified.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The authors investigate the role of lipid polyunsaturation in endocytic uptake of the dopamine D2 receptor (D2R). To modulate the degree of unsaturation in live cell plasma membranes, the authors incubate cell lines with pure fatty acid that is metabolized and incorporated into the cellular membranes. To quantify the internalization of D2R in these live cells, the authors utilized quantitative fluorescence assays such as DERET and endosome analysis to determine the degree and rate of D2R internalization in the presence of two model agonists - dopamine and quinpirole. The authors conclude that when the PUFA content of the plasma membrane is increased (i.e., via ω3 or ω6 fatty acids), both the quantity and rate of D2R internalization decrease substantially. The authors confirmed that these phenomena are specific to D2R as caveolar endocytosis and clathrin-mediated endocytosis were unaffected when these same experimental techniques were utilized for β2 adrenergic receptor and transferrin. Additionally, the authors conclude that the clustering ability of D2R is unaffected by lipid unsaturation but that the ability of D2R clusters to interact with β-arrestin2 is inhibited in the presence of excess PUFA. Based on these findings, the authors propose several hypothetical mechanisms for lipid-D2R interactions on the plasma membrane, which will likely be the scope of future work.

      Overall, this is a highly thorough and rigorous body of work that convincingly illustrates the connection between PUFA levels and D2R activity. However, I do not agree with the authors' conclusions pertaining to how their results should be interpreted in the context of fatty acid-related disorders. Additionally, this manuscript could benefit from some reorganization which would present the work more clearly. Please see the comments below.

      We thank the reviewer for the positive appreciation of our work, qualified as a "thorough and rigorous body of work that convincingly illustrates the connection between PUFA levels and D2R activity". We will address the specific points raised by the reviewer with our answers below.

      Comments:

      • A recurring motivation for this study that is brought up by the authors is that dietary deficiency of ω3 fatty acids is tied to D2R dysfunction. This would indicate that PUFA reduction in the plasma membrane results in D2R dysfunction. However, the experiments emphasized in this manuscript investigate the condition where PUFA content is INCREASED in the plasma membrane and D2R function is compromised. It seems inappropriate for the authors to cite dietary deficiency of ω3 as a motivation when they experimentally test a condition that is tied to ω3 surplus.

      Regarding the general comment of the reviewer, we agree that direct conclusions cannot be drawn on the etiology of psychiatric disorders by looking at the effect of membrane fatty acid levels on D2R in HEK 293 cells. Nevertheless, we mention in the Introduction the intriguing occurrence of low PUFA levels in psychiatric disorders as a starting point to look at D2R as an important target for psychoactive drugs prescribed for these disorders. In the Discussion, we propose that manipulating fatty acid levels might potentiate the efficacy of D2R ligands used as treatments. We felt that raising these aspects was not putting too much emphasis on psychiatric disorders. However, in accordance with the reviewer's comment, we toned down these descriptions in the revised manuscript.

      The goal of increasing the levels of fatty acids at the membrane in HEK 293, the most widely used cellular system to study GPCR trafficking, was to try to emulate the levels of lipids in brain cells. Indeed, the levels of PUFAs in our culture conditions are much lower (~8 %, Figure 1B) than in brain extracts (~30 %). Therefore, the "control" condition in HEK 293 cells would correspond to PUFA deficiency while after our enrichment protocol these levels are closer to those found in brain cells. Our results could therefore be interpreted as endocytosis of D2R being augmented under membrane PUFA decrease. Importantly, increased receptor internalization often correlates with decreased signaling. Therefore, membrane PUFA enrichment in our conditions would rather potentiate D2R signaling.

      • Following up on the first comment, the authors' results seem to indicate that excess ω3's are detrimental to D2R function. This result would be at odds with the conventional view that ω3's are essential and that excessive ω3 may not be harmful. The authors should rationalize their findings in the context of what is known about excess dietary ω3.

      The Reviewer is right that the conventional view is that excessive ω3 PUFA may not be harmful. However, this rather applies to dietary consumption, which might have a limited effect on brain fatty acid contents since their accretion is highly regulated. Moreover, the majority of studies looking at ω3 supplementation have been performed in young adults, and the effects on the developing brain - as might be happening in pathological conditions in which D2R is involved - remain poorly understood. Furthermore, as mentioned above, blunted internalization of D2R under membrane PUFA enrichment is not an indication of a "detrimental" effect on D2R function. Nor do we argue that membrane enrichment corresponds to excess PUFAs.

      • I would argue that the control experiments with saturated fatty acids (i.e., Behenic Acid in figure 1), represent a scenario mimicking ω3 deficiency as the enrichment of Behenic Acid causes an overall reduction in PUFAs (Figure EV1C - an increase in SFA must correspond to a decrease in PUFA). These Behenic acid results are the only experiments presented by the authors that mimic a scenario resembling ω3 deficiency and the results show that the D2R internalization is unaffected (Figure 1G-H). Therefore, I would further argue that if anything, the authors' results suggest that ω3 deficiency is NOT correlated to D2R internalization. Again, the authors must rationalize these findings in the context of what is known about dietary intake of ω3's.

      The Reviewer must refer to the fact that nutrients rich in SFAs are usually poor in PUFAs and vice-versa. Based on our lipidomic analysis, we now present the effect of treatments (DHA, DPA, BA) on the levels of PUFAs (Figure 1B) and saturated fatty acids (Figure 1C). In cells treated with behenic acid (BA), PUFA levels are not significantly changed relative to control, untreated cells, while saturated fatty acid levels are increased. BA was used here to determine whether the effects observed with PUFAs were related to the enrichment in unsaturations or due to carbon chain length (C22). The latter is not the case because BA treatment, unlike DHA or DPA treatment, does not affect D2R endocytosis (Figure 2G,H).

      • It's not clear why the authors decided to include an ω6 fatty acid in this study. The authors built up a detailed rationale for investigating ω3's as they are dietarily essential and tied to disease when deficient. To my knowledge, ω6's are considered much less beneficial than ω3's in a dietary sense. The inclusion of an ω6 almost seems coerced as the ω6-related results don't provide any interesting additional insights. It would benefit the manuscript if the authors provided some additional discussion explaining why ω6's are being investigated in addition to ω3's.

      We agree that we could have made the rationale clearer. The goal in comparing ω3-DHA and ω6-DPA was to assess whether the position of the first unsaturation (n-3 vs n-6), with the same carbon chain length (C22) might differentially impact D2R endocytosis.

      • In Figure EV1D, the DHA and DPA percentages each increase by ~6%. The corresponding Figure EV1B indicates that the overall PUFA% in the plasma membrane also increases by 6%. This makes sense as the total change in PUFA content is consistent with the amount of DHA or DPA being internalized to cells. However, this consistency was not observed with BA and SFAs. In Figure EV1E, the BA percentage increases only ~1% while the total SFA percentage in Figure EV1C increases by ~6%. How can something undergoing a 1% change (relative to total lipid content) result in a 6% overall change in SFA content?

      The reviewer is correct: the level of SFAs is increased by 5.2% (34.5 % of total FAs in control cells to 39.7 % in BA treated cells), more than the increase in BA alone (1.18% from 0.35 % to 1.53 %). A close look at our lipidomics data showed that many of the 10 saturated fatty acids quantified are enhanced. In particular, the two most abundant ones, palmitic acid (16:0) and stearic acid (18:0) are increased, from 21.37 % to 22.28 % and 8.47 % to 11.17%, respectively. The reasons for these apparent discrepancies may involve lipid metabolic pathways which convert the rare and long BA into more common and shorter SFAs to preserve lipid contents and thus membrane properties.
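      The bookkeeping behind this answer can be checked with a few subtractions. The sketch below uses only the percentages quoted above (percent of total fatty acids, control versus BA-treated); the variable names are assumptions introduced for illustration.

```python
# Percent of total fatty acids as (control, BA-treated), values quoted above.
total_sfa = (34.5, 39.7)
named_sfa = {
    "behenic acid (22:0)": (0.35, 1.53),
    "palmitic acid (16:0)": (21.37, 22.28),
    "stearic acid (18:0)": (8.47, 11.17),
}

# Increase in percentage points for total SFAs and for each named fatty acid.
total_increase = round(total_sfa[1] - total_sfa[0], 2)
increases = {
    name: round(after - before, 2) for name, (before, after) in named_sfa.items()
}

# Share of the total increase explained by the three named fatty acids,
# and what is left for the other quantified SFAs.
explained = round(sum(increases.values()), 2)
remainder = round(total_increase - explained, 2)

for name, delta in increases.items():
    print(f"{name}: +{delta} points")
print(f"named SFAs: +{explained} of +{total_increase} points; other SFAs: +{remainder}")
```

      On these figures, stearic acid alone contributes more than twice the increase of BA itself, and the three named fatty acids together account for most of the overall rise in SFAs.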

      • In Figure 4, the discussion of kinetics does not make sense. How exactly are kinetics being monitored in this figure? (Recruitment kinetics are discussed in panels D and G)

      We wanted to convey the impression that the time to reach the peak βarr2-mCherry recruitment was shorter in PUFA-treated cells than in control cells. However, after analyzing the kinetics in individual cells, we did not find a statistically significant difference in the time to maximum fluorescence. Therefore, we removed this reference to the kinetics of recruitment.

      We now write: "However, treatment with DHA or DPA significantly decreased peak βarr2-mCherry fluorescence (Figure 5F-G)."

      • In Figure 5, what is the purpose of panel D? Would it be more helpful to include additional, overlaid "cumulative N" plots for scenarios in which PUFAs were enriched? This would work well in conjunction with panel F.

      The purpose of this panel is to show the kinetics of increase in the frequency of endocytic vesicle formation upon agonist addition, and the decrease in frequency when the agonist is removed. We have now added examples of cells treated with DHA and DPA of similar surface for direct comparison with control (EtOH) cells.

      • For the readers who are new to this area or unfamiliar with the assays used, Figure 1 is not intuitive and initially difficult to interpret. It would greatly benefit the flow of the manuscript if Figures EV1A-C and EV2A were included in the main text and "Normalized R" was clearly defined in the main text, prior to discussion of Figure 1.

      We have now transferred Figure EV1 as Figure 1. We have adapted the scheme of the DERET assay and its legend (now in Figure EV1A) to make it clearer. We did not place it in Figure 2 because this figure is already very big. We have changed "Normalized R" to "Ratio 620/520 (% max)" to be clearer and more consistent with the scheme.

      Reviewer #3 (Significance (Required)):

      General assessment: The work, for the most part, is rigorous and scientifically sound. The authors utilize impressive, quantitative assays to expand our understanding of protein-lipid interactions. However, the authors need to improve their discussion of the actual physiological conditions that correspond to their experimental results.

      Advance: This work may fill a gap in our understanding of disorders related to the dopamine D2 receptor. However, some of the results may be at odds with what is currently known/understood about dietary ω3 fatty acids.

      Audience: This work will be of broad interest to researchers in the biophysics field, with particular emphasis on researchers who study protein and membrane biophysics. This work will also be of interest to researchers who study membrane molecular biology.

      Reviewer Expertise: quantitative fluorescence spectroscopy and microscopy; membrane biophysics; protein-lipid interactions

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In this paper, Dresselhaus et al (2023) investigate the possibility that known cargoes of extracellular vesicles (EVs) released at the Drosophila neuromuscular junction have cell-autonomous functions rather than functions specifically conferred as a condition of their release in EVs, in vivo. To do so, authors focus their studies on use of Tsg101-KD, a mutant of the ESCRT-I machinery, of the ESCRT EV biogenesis pathway, and are able to show that for some endogenously-expressed, fluorescently-tagged cargoes, fluorescence intensity in the pre-synaptic compartment is significantly elevated (Syt4 and Evi) and the postsynaptic intensity in the muscle is significantly decreased (Syt4, Evi, APP, and Nrg).

      We note that throughout our study, we detected endogenous Nrg with a well-characterized monoclonal antibody, not a fluorescent tag. We and others previously demonstrated that endogenous Nrg detected by this antibody is trafficked from neurons into EVs, using the same pathways as other EV cargoes such as Syt4, APP and Evi (Blanchette et al., 2022; Enneking et al., 2013; Walsh et al., 2021). Thus, the EV trafficking phenotypes in our study are consistent across fluorescently tagged cargo (endogenous knockin for Syt4 and GAL4/UAS-driven for APP and Evi), as well as for untagged, endogenous Nrg, thus controlling for effects of either overexpression or tagging.

      These findings suggest that these cargoes become trapped in the endosomal system (colocalizing with early, late, and recycling endosomal compartments), rather than undergoing secretion in EVs targeting post-synaptic muscle and glia as usual. This phenotype is recapitulated for select cargoes using mutants of both early and late components of ESCRT pathway machinery. They further characterize the Tsg101 mutant, demonstrating co-occurrence of an autophagic flux defect, but as the cargo phenotype is present without induction of the autophagic flux defect for their Hrs mutants, authors suggest the overlapping role of Tsg101 in autophagy is independent of its role in the ESCRT pathway/ EV secretion. Subsequently, they use previously defined functional phenotypes of the Evi (number of active zones, number of boutons, number of developmentally-arrested ghost boutons) and Syt-4 (number of transient ghost boutons and mEJPs) cargoes to show a minimal dependence on cargo delivery via ESCRT-derived EVs for these cargoes to carry out their synaptic growth and plasticity functions in vivo. However, it should be noted that for Evi/ Wg cargo, there is a slight increase in developmentally-arrested ghost boutons suggesting the cargo may not be entirely independent of EV-mediated cargo delivery. Finally, authors express an anti-GFP proteasome-directed nanobody using motor neuron or muscle-specific drivers and find that Syt4-GFP cargo doesn't enter muscle cytoplasm as fluorescence is maintained and cargo is not degraded by the muscle proteasome. While authors suggest this as evidence of EV-mediated transfer for cargo proteostasis, it is not explicitly shown that Syt4 cargo is, in fact, trafficked and degraded by the lysosome or hypothesized how Syt4 function or post-synaptic localization may be carried out independently of EVs.

      We have added new data showing that Syt4 is taken up by glial and muscle phagocytosis (Fig. 7), and included in the discussion several possible interpretations for how Syt4 activity is carried out independently of its traffic into EVs. Indeed we believe it is more likely to function in the presynaptic neuron rather than the postsynaptic muscle.

      Major comments:

      R1.1 It is difficult to evaluate the findings of this study without knowing the extent of ESCRT pathway impairment. Please provide data quantifying the degree of knockdown/ mutant expression for each ESCRT component (i.e., western blot)

      To address the reviewer’s request to specifically measure the degree of knockdown in the RNAi lines, we tested all available reagents. Unfortunately no Drosophila Tsg101 antibody exists and we did not receive a reply to our requests for a Shrub antibody. An Hrs antibody exists, but we found that none of three available Hrs RNAi lines depleted Hrs signal, or caused a phenotype similar to the HrsD28 point mutant, suggesting that they are not effective at knocking down the protein. Therefore, we were unable to specifically measure the level of depletion in motor neurons for RNAi of Tsg101, Shrub, or Hrs.

      However, we can make a strong argument that our knockdowns were sufficiently effective to answer the questions in our study. We used RNAi as only one of several complementary tools to manipulate ESCRT function (i.e. we also used loss-of-function mutants (HrsD28/Deficiency) and dominant negative mutants (Vps4DN)). These mutants caused a comparable and severe loss of EVs to RNAi (Fig 2): therefore the extent of depletion in the RNAi experiments was sufficient to cause a similarly severe phenotype as genomic or DN mutations, meeting the definition of a bona fide loss-of-function. We also know, since we used these complementary strategies, that the phenotypes we observe are very unlikely to be due to off-target effects of the RNAi.

      More importantly, what is directly relevant for our subsequent functional experiments is to know the extent of EV depletion, which we have explicitly measured throughout the paper. It is unclear what additional insights would be gained by knowing whether the strong Tsg101 and Shrub RNAi phenotypes are due to incomplete versus complete knockdown, given that we do measure the extent of EV depletion under these conditions. Further, we note that tsg101 null mutants die as first instar larvae (Moberg et al., 2005), raising the possibility that a more complete knockdown in neurons would be lethal early in development and make our study impossible. Indeed HrsD28 is an early stop that preserves the VHS and FYVE domains but truncates the C-terminal ⅔ of the protein. Its (occasional) survival to third instar indicates that it may be a severe hypomorph rather than a null.

      We have added a sentence in the text (p12 line 21-25) to clarify that we do not know the exact extent of knockdown for our RNAi experiments, but that by genetic definitions, they meet the criteria of a loss-of-function manipulation.

      R1.2 Loss of ESCRT machinery likely disrupts the release of small EVs to a significant extent; however, the authors do not show that EV release is entirely lost, only that 1) cargoes are backed up in the endosomal system due to endosomal dysfunction and 2) fluorescence of cargoes in the postsynaptic compartment is diminished. To claim that ESCRT-derived EVs with the relevant cargoes are lost, the authors should perform immunogold labelling with TEM. This would provide direct evidence that the cargoes examined here are packaged in ILVs, and that the ILVs are of a size (~50-150nm) consistent with exosomes (which should really be referred to as small extracellular vesicles (sEVs) per the minimal information for studies of extracellular vesicles (MISEV 2018 [https://doi.org/10.1080/20013078.2018.1535750])). Additionally, EM would show the loss of cargo packaging and provide information about where these cargoes localize in the presence of ESCRT mutants/loss-of-function.

      EM (including some limited immunoEM) studies requested by Reviewer 1 have previously been performed in this system by us and by the Budnik and Verstreken labs (Koles et al., 2012; Korkut et al., 2009; Korkut et al., 2013; Lauwers et al., 2018; Walsh et al., 2021). MVBs at the NMJ contain ~50-100 nm ILVs, and can often be seen proximal to or fusing with the plasma membrane. Mutants such as Hsp90 that block this fusion also block EV release, arguing that these MVBs are the source of EV (Lauwers et al., 2018). By immunoEM, the EV cargo Evi localizes to MVBs (Koles et al., 2012). ~50-200 nm structures containing immunogold against Evi were also observed in the subsynaptic reticulum between the neuron and the muscle, as well as in membrane compartments in the muscle cytoplasm (Koles et al., 2012; Korkut et al., 2009). Thus, the criteria requested by the reviewer have previously been established in this system.

      In response to the reviewer’s request to show that these structures are altered in ESCRT mutants, we attempted immunoEM experiments in the Tsg101KD condition. However, similar to the previously published results (Koles et al., 2012; Korkut et al., 2009), immunoEM in thick tissue such as Drosophila larval fillets is quite challenging, and we found it very difficult to retain immunogenicity together with excellent fixation and preservation of membrane structures, such that we could rigorously measure compartment morphology and size. Even if we did achieve good structural preservation, exosomes are ambiguous in complex membrane-rich tissues, since cross-sections through the extensively infolded muscle membrane (e.g. see Fig 3B) are very similar in size to EVs.

      As an alternative and more robust approach, we used STED microscopy, with a resolution of ~50nm, where we could conduct a rigorous and properly powered study of directly labeled EV cargoes (New data in Fig. S1). We show that postsynaptic Nrg and APP-GFP are found in structures with a mean diameter of ~125 nm, consistent with small EVs or exosomes, and these are strongly depleted in the Tsg101KD animals (to similar levels as antibody background far from the site of EV accumulation), as expected. Note that we are able to detect particles significantly smaller than 125 nm in the distribution, suggesting that the resolution of our system is sufficient to measure EV width.

      We also note that several of these cargoes are detected via an intracellular tag (Syt4, APP, Evi) or antibody against an intracellular domain (Nrg), so by topology they must be membrane-bound in the EVs rather than cleaved from the cell surface. We and others have previously shown that this postsynaptic signal is entirely derived from the presynaptic neuron, by using neuronal UAS-expression of a tagged protein, by neuronal RNAi of the endogenous gene, or by the tissue-specific tagging approach in the current manuscript (Fig. S4). We have also previously shown that these puncta contain the tetraspanin Sunglasses (CG12143/Tsp42Ej), which is an EV marker (Walsh et al., 2021). We have added new data to our manuscript (Fig. S1A) to show that neuronally-derived tetraspanin EVs are depleted upon Tsg101KD. Therefore, the reviewer’s point “2) fluorescence of cargoes in the postsynaptic compartment is diminished” is the most direct and sensitive test of trans-synaptic cargo transfer, and is the precise parameter that we are trying to manipulate to test the functions of this transfer.

      We believe that light microscopy showing loss of presynaptically-derived cargoes in the postsynaptic region is the best and most direct argument for loss of EV secretion, compared to the ambiguity of EM. It is also exactly the method that led to the proposal for the signaling function of EVs in previous work, which our current manuscript is revisiting. We are now using improved tests of that original hypothesis by examining it in light of additional membrane trafficking mutants (and finding that it no longer holds up). Overall, given the preponderance of evidence from the preceding literature and our studies indicating that (1) these cargoes are indeed in EVs and (2) we see a strong enough depletion of transsynaptic transfer to challenge the hypothesis that EVs serve signaling functions (see R1.3 response below), we are reluctant to spend more time attempting immunoEM which is not likely to resolve membrane structures.

      To address the point of EV terminology used in our manuscript, we think it is very unlikely that the postsynaptic structures are not exosomes. The criterion defined by MISEV for exosomes is that they are endosomally-derived from MVBs, ideally with the EV “caught in the act of release” upon fusion with the plasma membrane. As noted above, cargoes such as Syt4 and Evi are observed by immunoEM in MVBs, and these can be found in the process of fusing with the plasma membrane (i.e. caught in the act of release) (Koles et al., 2012; Korkut et al., 2009; Korkut et al., 2013; Lauwers et al., 2018). Mutants that block MVB fusion also block EV release at the NMJ (Lauwers et al., 2018). These EVs require ESCRT for their formation and are trapped in endosomes rather than the plasma membrane upon ESCRT depletion (this study). They depend on multiple components of the endosomal system (Rab GTPases, retromer) for their formation (Koles et al., 2012; Walsh et al., 2021). Taken together, it seems to us that there is sufficient data to argue that these are exosomes. However, as the reviewers requested, we have called them EVs in the revised paper (and only suggest they are exosomes in the discussion).

      R1.3 Other biogenesis pathways utilize multivesicular bodies to generate EVs, most prominently the nSMase2/ceramide synthesis pathway (which operates in an ESCRT-independent manner). It is possible that this pathway compensates when there are defects in the canonical ESCRT pathway. Thus, it is imperative for the authors to show that the cargo secretion no longer occurs in the presence of ESCRT mutations/loss-of-function. The authors should also use nSMase2 pathway mutants to see if the phenotypes in cargo trafficking (i.e., pre/ post-synaptic protein levels) are recapitulated.

      The reviewer asked us to show that cargo secretion does not occur in the ESCRT mutants. We reiterate that at the limits of detection of our assay, we see a very strong depletion of secretion, and that EV cargo levels are not distinguishable from background (Figure S1). Perhaps Reviewer 1’s concern is that since it would never be possible to show that we have depleted EVs completely (i.e. below the level of detection of our assays), that it is not possible to challenge the hypothesis that EV traffic is required for the proposed signaling functions of EVs. Indeed, they mention in their overall assessment “as it is unknown if minor sources of cargo+ EVs are sufficient in maintaining functional phenotype”. We do have some information on this, as described in the manuscript (p3 lines 41-43; p7 lines 25-31; p11 lines 27-30) and as follows: The critical argument against this concern is that other trafficking mutants with residual levels of EVs (rab11 or nwk) do show loss of signaling function (Blanchette et al., 2022; Korkut et al., 2013). Therefore residual EVs, even at the lower level of detection of our assay, are not enough to support signaling. The main difference is that in nwk and rab11 mutants the levels of the cargo in the donor presynaptic neuron are also strongly depleted, unlike in the ESCRT mutants. This strongly suggests that the cargoes are signaling from the presynaptic compartment, rather than in EVs. We have added the nwk mutant to show this baseline in Figure 2A,D. Similarly, our new results showing that hrs mutants retain Wg signaling while Tsg101 mutants do not, despite a similar degree of EV depletion (new data with more cargoes in Figure 2A-F), argues that residual EVs do not account for the lack of disruption of signaling. Finally, we have been transparent in our discussion that trace amounts of EVs could still exist, including by alternative pathways, but are unlikely to provide function (p11 lines 25-33).

      We agree that it might be an interesting future mechanistic direction to ask if the SMase pathway works with or in parallel to the ESCRT pathway (both have been suggested in the literature). However, we do not believe that this is essential for the current work: The SMase pathway is unlikely to be “compensating”, since EVs are already very strongly depleted with ESCRT disruption alone. We also note that SMase depletion may affect other trafficking pathways (Back et al., 2018; Choezom and Gross, 2022; Niekamp et al., 2022), and therefore might not provide any clarifying information if it did disrupt signaling. In summary, we believe the depletion we see in single ESCRT mutants is sufficient to (1) establish the role of ESCRT in EV traffic in this system, and (2) test the role of transsynaptic transfer in signaling functions of cargoes.

      R1.4 The authors' findings support that cargo trafficking is affected by widespread endosomal dysfunction but doesn't cleanly prove that 1) synaptic sEV release is lost and 2) that cargo-specific sEVs are lost. As previously mentioned, loss of cargo+ ILVs in MVEs by TEM could demonstrate this, but another useful approach would be to include in vitro Drosophila primary neuronal culture/ EV isolation and mass spec/proteomic characterization studies as proof of concept. According to widely agreed upon guidelines in the EV field, the authors should directly characterize their EV population to show 1) the appropriate size distribution associated with exosomes/sEVs, 2) the presence of traditional EV markers (i.e., tetraspanins), 3) changes in overall EV count by ESCRT mutants, and 4) decreased levels of cargo(es) of interest in the presence of ESCRT mutants/loss-of-function. In vitro experiments would be particularly helpful for quantifying the degree of loss of cargo-specific EVs with each ESCRT mutant. These experiments could also investigate the possibility that cargoes are secreted in nSMase2/ Ceramide-derived EVs, by showing that EV cargo levels are unaffected in nSMase mutants.

      Our data already show loss of cargo-specific EVs, defined by puncta of several independent specific cargoes in the extraneuronal space and postsynaptic muscle. To further substantiate this, we have directly characterized our EV population and shown a distribution of ~125 nm extraneuronal structures containing the transmembrane cargoes Nrg and APP (by STED) as well as Evi, Syt4 and the EV marker tetraspanin (by confocal microscopy). This addresses the (1) size distribution, (2) EV marker and (3) count criteria. All these markers (cargoes and tetraspanins) are severely depleted from the postsynaptic area in the ESCRT mutants, satisfying the (4) decreased levels criteria. As noted above, we and others have repeatedly demonstrated that these postsynaptic puncta are derived from neurons, and since we are detecting the intracellular domain in all cases, must be membrane-bound. Others have previously shown by EM that several of these markers are surrounded by membrane and derived from neuronal MVBs (see R1.2). Note that we do not believe that ESCRT mutants must necessarily cleanly show enlarged endosomes without ILVs or a class E vps compartment - instead stalled endosomes appear to be targeted for autophagy in heterogeneous intermediates (Fig 3).

      We do not believe that turning to a heterologous system (e.g. cultured primary Drosophila neurons, which do not even form functional synapses) is usefully translatable to results in neurons in vivo. Data from our lab and many other systems has shown that EV biogenesis and release pathways are highly cell-type specific (p9 lines 8-12), and also differ in different regions of neurons (e.g. synapses vs soma) (Blanchette and Rodal, 2020). Further, keeping the experimental setup of the original EV signaling hypothesis is a prerequisite for our improved tests of this hypothesis. We do note that APP, Evi and Syt4 have been demonstrated by us and others to be released from Drosophila S2 cells in EVs defined by differential centrifugation, sucrose gradient buoyancy, electron microscopy and mass spectrometry (Koles et al., 2012; Korkut et al., 2009; Korkut et al., 2013; Walsh et al., 2021). However, even if we did measure the precise change in EV number and cargoes upon ESCRT manipulation in these heterologous cells, it would not allow us to conclude that the same quantitative change was happening in the motor neurons of interest in vivo, which is the information we need to conduct our tests of cargo signaling function. All we would learn is whether ESCRT was required in that cell type, which would not be informative for our study.

      We appreciate that EV researchers working in cell culture systems often use a set of approaches including bulk isolation, EM, and mass spectrometry. Our system does not allow for these approaches, but provides complementary strengths of single EV characterization, in vivo relevance with functional assays, and a wealth of genetic tools. MISEV itself states that it does not provide a set of agreed-upon rules that can be applied generically to any experiment. We agree with the MISEV statement that we should use the best available assays for the system under investigation.

      R1.5 During functional tests of Evi+ motor neurons lacking generation of Evi+ EVs, there is a slight defect observed, namely the increased formation of developmentally arrested ghost boutons when Evi secretion in sEVs is lost. As mentioned, Evi is a transporter of Wg and it is possible for Wg to be transmitted between cells via normal diffusion. Thus, some basal levels of Wg may be reaching the muscle when its transfer via sEVs is abolished, and these basal levels may be sufficient to phenocopy the WT in the number of active zones and boutons. Is it possible that this element of Evi/ Wg function is dose-dependent and thus reliant on the extra Evi/ Wg transferred via sEVs? If possible, the authors should use a Wnt-signaling pathway reporter (i.e., fluorescently tagged Beta-Catenin) to measure the levels of Wnt signaling activity in the muscle when Evi/Wg+ EVs are present vs. abolished. If the degree of Wnt signaling (readout would be intensity of fluorescent reporter) is decreased without Evi+ sEVs, there may be a dose-dependent response. Otherwise, please more clearly disclose the partial loss of Evi function without Evi+ sEVs or state the intact function of Evi without sEVs as speculative.

      We agree that Wg is likely to be reaching the muscle in the absence of Evi exosomes via conventional secretory mechanisms, and have conducted new experiments to test this hypothesis (Fig. 5). In Drosophila muscles, Wg does not signal via a conventional β-catenin pathway. Instead, neuronally-derived Wg activates cleavage of its receptor Fz2, resulting in translocation of a Fz2 C-terminal fragment into the nucleus (Mathew et al., 2005; Mosca and Schwarz, 2010). We did attempt to directly measure Wg (using antibodies or knockins) and though we were able to detect a specific presynaptic signal, the background noise throughout the postsynaptic muscle was too high for a sensible quantification. In response to the reviewer’s question (and also R2.6), we collaborated with the laboratory of Timothy Mosca to test Fz2 nuclear import in Tsg101 and Hrs mutants (new Figure 5F-G). Strikingly, we found that Hrs mutants, despite being extremely sickly, have normal nuclear import of Frizzled. We also confirmed that Hrs mutants have dramatically depleted levels of all EV cargoes examined, including Evi (Figure 2A-F). On the other hand, we found that Tsg101 knockdowns have dramatically reduced Wg signaling (and a concomitant defect in postsynaptic development). We do not rule out (but think it is unlikely) that very small amounts of EVs could be present in hrs but not tsg101 mutants. A more parsimonious interpretation is that additional membrane trafficking defects in the Tsg101 mutants (which are beyond the scope of this study to explore in detail) block an alternative mode of Wg release, perhaps conventional secretion. The fact that Hrs mutants, despite showing similar depletion of Evi EVs, do not have a signaling defect strongly argues that EV release per se is not required for Wg signaling.

      R1.6 To support the authors' hypothesis that Syt4 transmission via EVs is a proteostatic mechanism, the authors should determine whether Syt4 cargo localizes to lysosomal compartments in muscle, glia, or both. Otherwise, the proteostatic degradation of Syt4 via EVs is speculative.

      Our data suggest that EVs serve as one of several parallel proteostatic mechanisms for presynaptic cargoes. We have added new data to the manuscript to emphasize the advance our work makes in our understanding of these mechanisms, and have emphasized this in the discussion (p11-12, lines 46-5).


      1. Degradation of neuronally derived EVs in glia and muscles. Previous work has shown that EV cargoes such as Evi can be found in compartments in the muscle cytoplasm, and that α-HRP-positive puncta are taken up and degraded by glial and muscle phagocytosis (Fuentes-Medel et al., 2009). These α-HRP-positive structures, despite colocalizing with EV cargoes Syt4, Nrg and APP (Walsh et al., 2021), were not previously connected to EVs. We have added new data showing that muscle or glial-specific RNAi of the phagocytic receptor Draper leads to the accumulation of EVs containing Syt4 (new Figure 7G-H). Together with our finding (Figure 7A-F) that Syt4 is not significantly detected in the muscle cytoplasm, these results indicate that the main destination for transsynaptic transfer is phagocytosis by the recipient cell. We have not been able to convincingly detect EV cargoes in the endolysosomal system of muscles, even in mutants disrupting lysosomal traffic, likely because the small number of EVs released by neurons (even over days of development) are drastically diluted in the much larger muscle cell.
      2. Compensatory endosomophagy in the neuron. When EV release is blocked in Hrs or Tsg101 mutants, we observe an induction of autophagy in the neuron (Figure 3B, E-G). However, in the absence of ESCRT manipulation, autophagy mutants do not accumulate EVs (Figure 3C,D; S2H-I). This suggests that autophagy is a compensatory mechanism that is induced in the absence of EV release.
      3. Retrograde transport to cell bodies. We previously found that disruption of neuronal dynactin leads to accumulation of EV cargoes in presynaptic terminals (Blanchette et al., 2022), suggesting that retrograde transport is a mechanism for removal of these cargoes from synapses. Interestingly, EV release is not increased in these conditions, indicating that the retrogradely transported compartment represents a late endosome without ILVs, or an MVB that cannot fuse with the plasma membrane.

      R1.7 Please discuss alternate modes of cargo transfer from the presynaptic compartment to the postsynaptic compartment that may be utilized when EV-mediated transfer is abolished (i.e., cytonemes or tunneling nanotubules).

      We have added these possibilities to the discussion (p11 line 31), though we note that we do not observe any such structures, or indeed any Syt4 in the muscle cytoplasm, and there is no current evidence for such transsynaptic structures in this system. Conventional secretion of Wg into the extracellular space and signaling through its transmembrane receptor Frizzled2 can account for Wg signaling in the absence of exosomes.

      R1.8 OPTIONAL: Investigate the mechanism of Syt4+ sEV fusion with the postsynaptic compartment (direct fusion with the plasma membrane, receptor-mediated fusion, endocytosis and unpacking, or endocytosis and degradation).

      We note that the Budnik lab has already shown that HRP-positive EVs released by NMJs are taken up by glia and muscles (Fuentes-Medel et al., 2009), and we have added data showing that this also applies for Syt4 (Fig. 7). Our data are not consistent with Syt4 fusing with recipient cell membranes or entering the muscle cytoplasm. Further investigation of this mechanism is beyond the scope of this project.

      Given that several fundamental questions have yet to be answered regarding the biogenesis pathways and machinery utilized for EV-mediated cargo secretion, and the necessity for further TEM studies and/or work with primary cultures to characterize ILVs and EVs, >6 months is estimated to perform the necessary experiments that may require learning/ optimizing new systems.

      Minor comments:

      R1.9 Please clarify the choice of using Tsg101 KD in place of mutants of other ESCRT machinery (i.e., Hrs). Especially as when the Tsg101 mutant was characterized, you found major defects in autophagic flux that were not present for HrsD28/Df.

      Tsg101 RNAi was selected since it provides a neuron-autonomous knockdown, eliminating the complications of mutant effects in other tissues. These animals are also relatively healthy as third instar larvae compared to genomic mutants tsg1012 (L1 lethal) and HrsD28 or motor-neuron driven Vps4DN (where L3 larvae are rare). This made it easier to recover enough larvae to properly power experiments, and alleviated concerns that general sickness is contributing to the phenotype (though note that neuronal Tsg101KD does result in pupal lethality). Finally, we were unable to effectively knock down Hrs by RNAi (see R1.1). To extend our studies beyond Tsg101, we have included additional experiments in the revised manuscript showing that HrsD28 animals, despite being quite unhealthy, still retain Syt4-dependent functional plasticity (See R2.5 and R3.4) and Wg signaling.

      R1.10 Please clarify why the specific method in the experiment in Fig. 4E-J was chosen. As Syt4 is a transmembrane protein, it likely undergoes degradation via the lysosome, like other membrane-bound proteins. Is it known whether the proteasome-directed nanobody is sufficient to pull Syt4 from membrane-bound compartments to undergo degradation in the proteasome? Would it make more sense to use a lysosome-directed nanobody?

      The GFP tag on Syt4 is cytosolic rather than lumenal. Our data show that when we express the proteasome-directed nanobody presynaptically, it efficiently degrades membrane-associated Syt4-GFP (Fig. 7B). Therefore we expect that this tool should be similarly effective on membrane-associated Syt4-GFP if it were exposed to the muscle cytoplasm. We have confirmed that it is effective in the muscle against DLG-GFP (Fig. S5A).

      R1.11 Please provide further methodological information regarding the sample preparation for live imaging of axons to generate kymographs found in Fig. S3.

      Additional details have been provided on p14 lines 10-24 and p15 lines 31-37.

      R1.12 In Figure 1I and 1J, include representative image and quantification of Syt4-GFP pre- and post-synaptic intensity for HrsD28/Df for consistency with ShrubKD and Vps4DN in Figure 1K-P.

      We generated and tested HrsD28; Syt4-GFP (Fig 2A,D), and HrsD28; Evi-GFP strains (Fig 2B-E). All EV cargoes exhibited a dramatic post-synaptic depletion in Hrs mutants, similar to the other ESCRT manipulations.

      R1.13 In Figure 2H, please provide a cell type marker or HRP mask with a merged image for image clarity.

      This image shows neuronal cell bodies in the ventral ganglion, which are densely packed relative to each other. The cell type specificity is provided by the motor neuron driver. We did not use a cell type marker or individually mask cells for analysis, but instead quantified intensity over the whole field of view. We can manually trace cell bodies in this image if requested, but it would not represent our ROI for analysis.

      R1.14 In Figure 4B, please provide quantification for the differences between 1) WT Mock and Tsg101 MOCK and 2) WT Stim and Tsg101KD Stim to show that upon stimulation, WT and Tsg101 undergo the same increase in the number of ghost boutons/ NMJ in Muscle 4.

      We have added these statistical comparisons to the graph (Fig. 6B).

      R1.15 In Figure 3 G and H, use consistent scale bars to compare between temperatures.

      We have removed the Shrub data at 20º as it did not provide additional insight to the manuscript.

      Reviewer #1 (Significance (Required)):

      General assessment (Strengths):

      -Use of Drosophila NMJ model system consistent with others in the field and exceptional harnessing of genetic tools for mutations across the ESCRT pathway (-0, -I, -III, etc.)

      -Identification of ESCRT pathway mutants that do not deplete pre-synaptic cargo levels but generate endosomal dysfunction, indicative of a possible decrease in secretion of cargoes via EVs

      -Implementing functional characterization of Evi/ Wg and Syt4 cargoes, consistent with previous work in the field; highly reproducible

      -Sufficiently thorough investigation of the cross-regulation of autophagy and EV biogenesis by Tsg101

      General assessment (Weaknesses):

      -Lack of investigation of known ESCRT-independent pathways/ genes involved in the generation of sEVs (i.e., nSMase2/ Ceramide) especially as it is unknown if minor sources of cargo+ EVs are sufficient in maintaining functional phenotype

      See R1.3 for comments on this point

      -Lack of sEV characterization and validation of EVs derived from mutant

      We have added STED data to measure EV size, and described the challenges in EV membrane measurements by EM in the in vivo system.

      -Does not show the loss of cargoes of interest on EVs from mutants other than through back-up of cargoes in the presynaptic endocytic pathway (Rab7, Rab5, Rab11)

      We strongly disagree with this comment. We have explicitly measured the loss of numerous cargoes in postsynaptic structures that have been rigorously established to be EVs in this and previous publications. Our findings are not limited to back-up of presynaptic structures.

      -Rigorous investigation of the claim that Evi and Syt4 are released via EVs for proteostatic means is missing. Authors should demonstrate the degradation of EV cargoes by recipient cells (either muscle OR glia)

      We have added new data and discussion on multiple and compensatory proteostatic pathways.

      -If EV-mediated cargo transfer is not required, authors should investigate alternate modes of cargo transfer more rigorously (i.e., diffusion of Wg, suggest/ test hypotheses for mechanism of Syt4 function or transfer).

      We have included discussion of alternate modes of transfer for Wg (i.e. conventional secretion). By contrast, for Syt4 we believe it is acting in the donor cell without transfer, and have included alternate interpretations of the previous literature that had suggested its function in muscles.

      Advance: -Compared with other recent in vivo studies of EVs where donor EVs are loaded with a cargo, such as Cre, which uniquely identifies recipient cells through Cre recombination-mediated expression of a fluorescent reporter (Zomer et al 2015, Cell), this study relies on the readout of fluorescently tagged cargo in the recipient cells to represent transfer via EVs. While numerous studies in the Drosophila field focus on the same small set of known EV cargoes at the NMJ (Koles et al., 2012; Gross et al., 2012; Korkut et al., 2013; Korkut et al., 2009; Walsh et al., 2021), there is a noticeable lack of EV characterization based on MISEV (i.e. TEM of EVs, size distribution, enrichment of well-known EV markers [https://doi.org/10.1080/20013078.2018.1535750]) that would significantly strengthen the work and make it more widely accepted in the EV field.

      As mentioned above, many of these criteria (including EV size and enrichment of known EV markers) are already established in the previous literature for this system. As requested, we have also added similar data to our revised manuscript.

      -In this study, the use of ESCRT machinery mutants is proven as a new technical method in delineating the role of EV cargoes in cell-autonomous versus EV-dependent functions. This is the first study, to my knowledge, that has leveraged mutants from both early and late ESCRT complexes for the study of EVs in Drosophila. Additionally, the finding that some cargoes may be able to carry out their signaling functions, independent of transfer via EVs, provides key mechanistic insight into one possible role of EVs as proteostatic shuttles for cargo. This work also begins to address a fundamental question in the field, which is to delineate roles that EVs actually carry out in physiological conditions, compared to the many roles that have been shown possible in vitro.

      We appreciate the reviewer’s insight into the impact of our work.

      Audience: -Basic research (endosomal biology, ESCRT pathway, cell signaling, neurodevelopment)

      -Specialized (Drosophila, Neurobiology; Extracellular Vesicles)

      -This article will be of interest to basic scientists in the field of endosomal trafficking and extracellular vesicle biology as well as those studying the nervous system in Drosophila melanogaster. As the field of extracellular vesicle biology has broad implications in the spread of pathogenic cargoes in cancer and neurodegenerative disease, the basic biology associated with EVs has some translational relevance.

      Expertise (Keywords):

      -ESCRT and nSMase2 EV biogenesis pathways

      -EV characterization in vitro/ live imaging studies

      -EV release and uptake

      -Neuronal and glial cell biology

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This manuscript addresses the role of exosome secretion in neuromuscular junction development in Drosophila, a system that has been proposed to depend on exosomes. In particular, delivery of Wingless via exosomes has been proposed to promote structural organization of the synapse. Previously, however, the studies that proposed this model targeted the cargoes themselves, rather than targeting exosome biogenesis or secretion. In this new study, exosome biogenesis is targeted via knockdown of the ESCRT components Hrs, TSG101, and Chmp4. The authors find that some previously ascribed functions are not inhibited by these knockdowns. In particular, formation of active zones, as defined by BRP-positive puncta (total and per micrometer), and total bouton numbers. It does look like there is a partial defect in BRP-positive puncta per micrometer, but it is not significant. For ghost bouton formation, there is a similar increase in evi-mutant and ESCRT-KD NMJs (with some subtle differences depending on abdominal segment and temperature). They also examine the role of Syt4, which has been proposed to be transferred from nerve to muscle cells at the junction and to regulate mEJP frequency after stimulation. They found no difference in mEJP frequency after stimulation between WT and TSG101-KD animals, although they did not have a positive control with inhibition of Syt4. They did do an elegant experiment to demonstrate that most of extracellularly transferred Syt4 does not reach the muscle cytoplasm. Overall, it is an interesting paper, mostly well controlled and rigorous, and well-written. It is an important contribution to the EV and NMJ fields. The data should provoke reconsideration of some of the functions that were previously ascribed to exosome transfer at the NMJ. However, I do think that there are some overly strong statements and the functions of the exosomes at the synapse were quite narrowly examined. 
      For example, the title of the paper is pretty strong and the abstract does not say which functions were or were not affected by TSG101 KD. There are also a couple of experiments that would enhance the manuscript. Some specific suggestions are below:

      R2.1 Title: "ESCRT disruption provides evidence against signaling functions for synaptic exosomes" seems a bit broad -- only evi/Wg and Syt4 functions were examined at NMJ synapses, not all signaling functions of all exosomes at all synapses. Something like, "ESCRT disruption provides evidence against signaling functions for exosome-carried evi/Wg and Syt4 at the neuromuscular junction" seems a bit more reasonable.

      We are open to changing the title to: “ESCRT disruption provides evidence against transsynaptic signaling functions for some extracellular vesicle cargoes” though we prefer to leave it as is since “provides evidence against” is already fairly understated.

      R2.2 Abstract: the description of the actual data is very little, just one sentence saying that "many" of the signaling functions are retained with ESCRT depletion. I think a bit more focus on the actual data is warranted.

      We have edited the abstract to include more detail on the signaling phenotypes.

      R2.3 Results section:

      Fig 3: What does A2 and A3 mean for the graphs in c,d,e, g, h? Please specify in figure legend.

      We have described in the figure legends that A2 and A3 refer to specific abdominal segments in the larvae.

      R2.4 The sentence "Further, active zones in Tsg101KD appeared morphologically normal by TEM (Fig.2B)." is confusing to me. What do you mean by that? Are you referring to the following two sentences about feathery DLG and SSR? But the feathery DLG I presume is in Fig 3, where that staining is. And I also don't know what feathery DLG means -- it should be pointed out in the appropriate image.

      Presynaptic active zones are defined by an electron-dense T-shaped pedestal at sites of synaptic vesicle release, and can be seen in the TEM in what is now Figure 3B, marked as AZ. We have also labeled AZ by immunofluorescence (Fig. 5A) and they appear normal.

      By contrast, Dlg primarily labels the postsynaptic apparatus associated with the infoldings of the muscle membrane. In control animals, Dlg immunostaining is relatively tightly and smoothly clustered within ~1µm of the presynaptic neuron. By contrast, in Evi mutants, there are wisps of Dlg-positive structures extending from the bouton periphery. We have added arrows in what is now Fig. 5C to indicate the feathery structures.

      R2.5 Fig 4 addresses Syt4 function. However, there is no positive control inhibiting Syt4 to see if there is a change. Just comparison of WT and TSG101. It seems like this positive control is in order.

      We have added the positive control (Fig. 6E-F) reproducing the previously reported result that Syt4 mutants lack the high-frequency stimulation-induced increase in mEPSP frequency (HFMR). We have also added new data on HrsD28 genomic mutants. Despite the fact that few of these larvae survive and they are quite unhealthy, they still exhibit robust HFMR, similar to the Tsg101KD larvae, strongly supporting our hypothesis.

      R2.6 Discussion: I think some discussion of what ghost boutons are, and of the possible significance of the evi and ESCRT mutant phenotype of enhanced ghost bouton formation, would be helpful.

      We have added more discussion on the ghost bouton phenotype (p11 lines 5-14), especially in light of our new findings that Hrs and Tsg101 mutants may distinguish alternative modes of Wg secretion (see R1.5).

      R2.7 Also, in the Discussion, it is mentioned that Wg probably gets secreted in the ESCRT mutants -- presumably this accounts for the discrepancy between evi mutants and the ESCRT mutants. An experiment to actually test this would greatly enhance the manuscript.

      We have added this experiment, as addressed in R1.5.

      Reviewer #2 (Significance (Required)):

      Overall, it is an interesting paper, mostly well controlled and rigorous, and well-written. It is an important contribution to the EV and NMJ fields. The data should provoke reconsideration of some of the functions that were previously ascribed to exosome transfer at the NMJ. However, I do think that there are some overly strong statements and the functions of the exosomes at the synapse were quite narrowly examined. For example, the title of the paper is pretty strong and the abstract does not say which functions were or were not affected by TSG101 KD.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Dresselhaus et al. investigates signaling functions for synaptic exosomes at the Drosophila NMJ. Exosomes are widely seen in vivo and in vitro. They are clearly sufficient to induce signaling responses in vitro, but whether they normally fulfill signaling functions in vivo has not been rigorously addressed. The authors make use of several mutants that block exosome release to test whether exosome release is important for two distinct signaling pathways: the Evi/Wg pathway and the Syt4 signaling pathway. Both pathways have been implicated in neuron to muscle signaling. Surprisingly, the authors find scant evidence that exosome release is required for either pathway. They convincingly show that knockdown of Tsg101 (an ESCRT-I component) does not phenocopy many synaptic phenotypes of either wg or syt4. Instead, they propose that in vivo, exosomes may serve as a proteostatic mechanism, as a mechanism for the neuron to dispose of unwanted/damaged proteins.

      Specific comments are below:

      R3.1 Loss of Tsg101 has been linked to upregulated MAPK stress signaling pathways and autophagy. Thus, it's possible that activating such compensatory mechanisms in Tsg101 knockdown animals could mask phenotypes associated with specific loss of EV cargoes such as Wg or Syt4. Indeed, the authors demonstrate that loss of Tsg101 and Hrs have very different effects on synaptic autophagy. To provide additional evidence that Wg or Syt4 signaling is independent of EV release, it would be good to check for wg/syt4 phenocopy in additional ESCRT complex mutants. I understand they did a bit with Shrub knockdown at low temperature in Figure 3, but the temperature-dependence of the ghost bouton phenotype clouds the interpretation. Could the authors try a motorneuron driver with a more restricted phenotype to overcome the lethality issues, or alternatively use one of their other ESCRT component mutants? This is obviously the central claim of the manuscript, and it would be strengthened by carrying out phenotypic analysis in mutants other than the Tsg101 RNAi line.

      As noted for R2.5, we have added HFMR experiments for the HrsD28 genomic mutant, and found that despite being very unhealthy, they exhibit robust HFMR similar to Tsg101KD. We also confirmed dramatic depletion of Syt4 EVs in the HrsD28 mutant. Thus, the preserved Syt4 signaling function in ESCRT mutants with depleted EV Syt4 is not restricted to Tsg101, and does not depend on the co-occurring autophagy phenotype.

      R3.2 In Figure 1, the authors show that neuronal Tsg101 RNAi dramatically reduces "postsynaptic" levels of exosome cargoes at the L3 stage to argue that exosome release is blocked in this mutant. While this seems very likely at the L3 stage, it is unclear when Tsg101 levels are reduced and thus when exosome release is impaired in this background. This is important because we don't know when these signaling pathways act. For example, it is possible that the critical period for Wg and Syt4 signaling is during the L1 stage, and that Tsg101 knockdown is incomplete at that stage. It is important to assay exosome release at earlier larval stage, particularly when RNAi is the method used to reduce gene function.

      We have conducted this experiment. We noted accumulation of cargoes in Tsg101KD L1 larvae, indicating that the RNAi is effective early in development. However, we do not find many EVs in either wild-type or Tsg101KD first instar larvae (red is α-HRP, green is Syt4-GFP). This argues that it is unlikely that EV-mediated signaling has a critical period earlier in development. It is likely that the accumulation of EVs that we observe trapped in the muscle membrane reticulum in third instar larvae was laid down over hours or days of development. We do not propose to include these data in the manuscript unless the editors and reviewers prefer that we do so.

      R3.3 If the Syt4 and Evi exosomes do not serve major signaling roles and are in fact neuronal waste, it seems likely they are phagocytosed by glia. Are non-neuronal Syt4/Evi levels increased when glial phagocytosis is blocked (e.g., in draper mutants)?

      As mentioned above, the Budnik lab previously showed that uptake and degradation of postsynaptic α-HRP-positive structures depends on glial and muscle phagocytosis. α-HRP recognizes a number of neuronally-derived glycoproteins (Snow et al., 1987). Though the Budnik lab had not previously linked these structures to EVs, we do know that they very strongly colocalize with known EV cargoes and depend on the exact same membrane traffic machinery for release, arguing that some α-HRP antigen proteins are also EV cargoes (Blanchette et al., 2022). To close this loop, we have added data showing that Syt4-positive EVs also depend on Draper for their clearance (Fig 7).

      R3.4 For the HFMR experiment, it would be good to see the syt4-dependent phenotype as a positive control.

      As mentioned for R2.5, we have added the Syt4 positive control (Figure 6E,F), which fails to show HFMR as expected.

      R3.5 In the abstract, the authors state that, "the cargoes are likely to function cell autonomously in the motorneuron". Isn't it alternatively possible that these proteins (wg in particular) could signal to the muscle in a non-exosome dependent pathway?

      Yes, we believe that Wg is likely released by another mechanism (perhaps conventional secretion). As noted for R1.5 and R2.6, we have added new data in Fig. 5 showing that Frizzled nuclear import IS NOT disrupted in Hrs mutants, despite dramatic loss of Evi EVs. Interestingly Frizzled nuclear import (and postsynaptic development) IS altered in neuronal Tsg101KD larvae, which disrupt additional membrane trafficking pathways beyond EV release (see Fig. 3). This is particularly interesting in light of the normal Syt4 signaling in Tsg101KD larvae, and supports the hypothesis that Syt4 can function without leaving the neuron, while Wg must be released, albeit not via Hrs-dependent EV formation. Another (less parsimonious) interpretation is that very small amounts of Wg release in the Hrs mutant are sufficient to promote Frizzled nuclear import.

      Reviewer #3 (Significance (Required)):

      This is an important paper that is well-organized and logically presented. It makes a clear and largely compelling case against major signaling roles for exosomes at this synapse. The authors should be commended for publishing this work, which demands a re-evaluation of proposed key roles for exosomes at the fly NMJ. Given the intense interest in exosomes in neurobiology, this paper will be of great interest to neuronal cell biologists working across systems.

      We thank the reviewer for their appreciation of the impact of our work on the field.

      Back, M.J., H.C. Ha, Z. Fu, J.M. Choi, Y. Piao, J.H. Won, J.M. Jang, I.C. Shin, and D.K. Kim. 2018. Activation of neutral sphingomyelinase 2 by starvation induces cell-protective autophagy via an increase in Golgi-localized ceramide. Cell Death Dis. 9:670.

      Blanchette, C.R., and A.A. Rodal. 2020. Mechanisms for biogenesis and release of neuronal extracellular vesicles. Curr Opin Neurobiol. 63:104-110.

      Blanchette, C.R., A.L. Scalera, K.P. Harris, Z. Zhao, E.C. Dresselhaus, K. Koles, A. Yeh, J.K. Apiki, B.A. Stewart, and A.A. Rodal. 2022. Local regulation of extracellular vesicle traffic by the synaptic endocytic machinery. J. Cell Biol. 10.1083/jcb.202112094.

      Choezom, D., and J.C. Gross. 2022. Neutral sphingomyelinase 2 controls exosome secretion by counteracting V-ATPase-mediated endosome acidification. J Cell Sci. 135.

      Enneking, E.M., S.R. Kudumala, E. Moreno, R. Stephan, J. Boerner, T.A. Godenschwege, and J. Pielage. 2013. Transsynaptic coordination of synaptic growth, function, and stability by the L1-type CAM Neuroglian. PLoS Biol. 11:e1001537.

      Fuentes-Medel, Y., M.A. Logan, J. Ashley, B. Ataman, V. Budnik, and M.R. Freeman. 2009. Glia and muscle sculpt neuromuscular arbors by engulfing destabilized synaptic boutons and shed presynaptic debris. PLoS Biol. 7:e1000184.

      Koles, K., J. Nunnari, C. Korkut, R. Barria, C. Brewer, Y. Li, J. Leszyk, B. Zhang, and V. Budnik. 2012. Mechanism of evenness interrupted (Evi)-exosome release at synaptic boutons. J Biol Chem. 287:16820-16834.

      Korkut, C., B. Ataman, P. Ramachandran, J. Ashley, R. Barria, N. Gherbesi, and V. Budnik. 2009. Trans-synaptic transmission of vesicular Wnt signals through Evi/Wntless. Cell. 139:393-404.

      Korkut, C., Y. Li, K. Koles, C. Brewer, J. Ashley, M. Yoshihara, and V. Budnik. 2013. Regulation of postsynaptic retrograde signaling by presynaptic exosome release. Neuron. 77:1039-1046.

      Lauwers, E., Y.C. Wang, R. Gallardo, R. Van der Kant, E. Michiels, J. Swerts, P. Baatsen, S.S. Zaiter, S.R. McAlpine, N.V. Gounko, F. Rousseau, J. Schymkowitz, and P. Verstreken. 2018. Hsp90 Mediates Membrane Deformation and Exosome Release. Mol Cell. 71:689-702 e689.

      Mathew, D., B. Ataman, J. Chen, Y. Zhang, S. Cumberledge, and V. Budnik. 2005. Wingless signaling at synapses is through cleavage and nuclear import of receptor DFrizzled2. Science. 310:1344-1347.

      Moberg, K.H., S. Schelble, S.K. Burdick, and I.K. Hariharan. 2005. Mutations in erupted, the Drosophila ortholog of mammalian tumor susceptibility gene 101, elicit non-cell-autonomous overgrowth. Dev Cell. 9:699-710.

      Mosca, T.J., and T.L. Schwarz. 2010. The nuclear import of Frizzled2-C by Importins-beta11 and alpha2 promotes postsynaptic development. Nat Neurosci. 13:935-943.

      Niekamp, P., F. Scharte, T. Sokoya, L. Vittadello, Y. Kim, Y. Deng, E. Sudhoff, A. Hilderink, M. Imlau, C.J. Clarke, M. Hensel, C.G. Burd, and J.C.M. Holthuis. 2022. Ca(2+)-activated sphingomyelin scrambling and turnover mediate ESCRT-independent lysosomal repair. Nat Commun. 13:1875.

      Snow, P.M., N.H. Patel, A.L. Harrelson, and C.S. Goodman. 1987. Neural-specific carbohydrate moiety shared by many surface glycoproteins in Drosophila and grasshopper embryos. J Neurosci. 7:4137-4144.

      Trajkovic, K., C. Hsu, S. Chiantia, L. Rajendran, D. Wenzel, F. Wieland, P. Schwille, B. Brugger, and M. Simons. 2008. Ceramide triggers budding of exosome vesicles into multivesicular endosomes. Science. 319:1244-1247.

      Walsh, R.B., E.C. Dresselhaus, A.N. Becalska, M.J. Zunitch, C.R. Blanchette, A.L. Scalera, T. Lemos, S.M. Lee, J. Apiki, S. Wang, B. Isaac, A. Yeh, K. Koles, and A.A. Rodal. 2021. Opposing functions for retromer and Rab11 in extracellular vesicle traffic at presynaptic terminals. J Cell Biol. 220:e202012034.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach as mentioned by the editor. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Reviewer #1 (Public Review):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain cognition') as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      REVISED VERSION: while the authors have partially addressed my concerns, I do not feel they have addressed them all. I do not feel they have addressed the weight instability and concerns about the stacked regression models satisfactorily.

      Please see our responses to #3 below.

      I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. This suffers from the same problem the authors raise with brain age and would indeed disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain cognition. I have indicated the main considerations about these points in the recommendations section below.

      Thank you so much for raising this point. We now have the following statement in the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
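      For readers unfamiliar with commonality analysis, the partitioning described above can be sketched as follows. This is an illustrative sketch only, not the authors' code: it assumes ordinary least-squares regression with exactly three predictors, uses the standard three-predictor commonality formulas, and runs on simulated data. The function names (`r_squared`, `commonality_three`), the variable names and the simulated effect sizes are all hypothetical.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def commonality_three(predictors, y):
    """Unique and common effects for exactly three named predictors.

    `predictors` maps a name to a 1-D array. Returns (effects, full_r2).
    """
    a, b, c = predictors  # the three predictor names, in insertion order
    r2 = {}
    for size in (1, 2, 3):
        for subset in combinations((a, b, c), size):
            X = np.column_stack([predictors[name] for name in subset])
            r2[frozenset(subset)] = r_squared(X, y)

    def R(*names):
        return r2[frozenset(names)]

    full = R(a, b, c)
    effects = {
        # Unique effect: drop in R^2 when the predictor is removed from the full model.
        f"unique({a})": full - R(b, c),
        f"unique({b})": full - R(a, c),
        f"unique({c})": full - R(a, b),
        # Common effects: variance shared by two or by all three predictors.
        f"common({a},{b})": R(a, c) + R(b, c) - R(c) - full,
        f"common({a},{c})": R(a, b) + R(b, c) - R(b) - full,
        f"common({b},{c})": R(a, b) + R(a, c) - R(a) - full,
        f"common({a},{b},{c})": R(a) + R(b) + R(c) - R(a, b) - R(a, c) - R(b, c) + full,
    }
    return effects, full


# Simulated data (assumption): fluid cognition depends on age and on a
# brain-related signal that an age-trained index cannot see.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
brain_signal = rng.normal(size=n)
fluid = -0.6 * age + 0.5 * brain_signal + 0.5 * rng.normal(size=n)
predictors = {
    "chron_age": age,
    "brain_age": age + 0.5 * rng.normal(size=n),
    "brain_cognition": -0.6 * age + 0.5 * brain_signal + 0.3 * rng.normal(size=n),
}

effects, full_r2 = commonality_three(predictors, fluid)
for name, value in effects.items():
    print(f"{name:45s} {value: .3f}")
print(f"{'total R^2':45s} {full_r2: .3f}")
```

      The seven effects sum to the R² of the full model, so each one can be read as a share of the explained variation. In this toy setting the unique effect of the cognition-trained predictor plays the role of the co-variation "missed by Brain Age". Common effects can be negative when predictors act as suppressors.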

      This is a reasonably good paper and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address, which mostly relate to clarity and interpretation.

      Reviewer #1 Public Review #1

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain age models more generally.

      Thank you for your comments on this issue.

      We now discuss the following broader considerations in detail:

      (1) the consistency between our findings on fluid cognition and other recent works on brain disorders,

      (2) the difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021), and

      (3) the solutions we and others have suggested to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      From Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities into the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but does not likely explain the highest variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). By contrast, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of Brain Age in the case-control setting.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age-prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder.

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age-prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., a DBN with a moderate number of epochs) should be used for explaining a phenotype of interest. Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p7), “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 Public Review #2

      Second, from a methods perspective, there is not a sufficient explanation of the methodological procedures in the current manuscript to fully understand how the stacked regression models were constructed. I would request that the authors provide more information to enable the reader to better understand the stacked regression models used to ensure that these models are not overfit.

      Thank you for giving us an opportunity to clarify our stacked model. We have added clarifications to the Methods (see below). We want to confirm that we did not use test sets to build the stacked models at either the lower or the higher level of the Elastic Net models. Test sets were used only for testing the performance of the models.

      From Methods: “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we, then, chose the prediction models that led to the highest performance, reflected by coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. As in the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied the tuned models created in the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We then re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. As in the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum of squares definition (i.e., R2 = 1 – (sum of squares residuals/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
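
      The three performance metrics named above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the toy values are assumptions made for the example. It uses the sum-of-squares definition of R2 stated in the text, which can differ markedly from the squared Pearson's r when predictions are compressed towards the mean.

```python
import numpy as np


def r2_sum_of_squares(y_true, y_pred):
    """R2 = 1 - (sum of squares residuals / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def pearson_r(y_true, y_pred):
    """Pearson's correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical observed ages and predictions that are compressed towards the mean.
y_obs = np.array([60.0, 70.0, 80.0, 90.0])
y_hat = np.array([70.0, 72.0, 78.0, 80.0])

r = pearson_r(y_obs, y_hat)
r2 = r2_sum_of_squares(y_obs, y_hat)
mae = mean_absolute_error(y_obs, y_hat)
print(f"r = {r:.3f}, R2 = {r2:.3f}, MAE = {mae:.1f}")
```

      In this toy case the predictions keep the rank order of the observed values, so Pearson's r stays high, while the sum-of-squares R2 is penalised for the compressed range.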

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both the first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice, with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
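
      The three steps described above can be sketched with scikit-learn as follows. This is a simplified sketch under stated assumptions, not the authors' pipeline: the data are simulated, three hypothetical feature sets (`set_a`, `set_b`, `set_c`) stand in for the 18 sets, only one stacked model is built, and the hyperparameter grid is much smaller than the one reported. The helper `tune` is a name introduced for this example.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)

# Hypothetical feature sets: each is a noisy view of the same latent signal.
feature_sets = {
    name: latent[:, None] * rng.normal(size=(1, p)) + rng.normal(size=(n, p))
    for name, p in [("set_a", 10), ("set_b", 20), ("set_c", 5)]
}
y = latent + 0.5 * rng.normal(size=n)

# Reduced grid for illustration only.
grid = {"alpha": np.logspace(-1, 1, 5), "l1_ratio": [0.1, 0.5, 0.9]}


def tune(X, y, seed):
    """Inner five-fold CV grid search; keeps the model with the highest mean R2."""
    inner = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10000), grid, cv=inner, scoring="r2")
    return search.fit(X, y).best_estimator_


outer = KFold(n_splits=5, shuffle=True, random_state=0)
stacked_pred = np.empty(n)

for train, test in outer.split(np.arange(n)):
    # Step 1: tune one model per feature set, using the outer-fold training set only.
    base = {k: tune(X[train], y[train], seed=1) for k, X in feature_sets.items()}

    # Step 2: predict the same training set with the tuned models, then tune the
    # stacked model on those predicted values with newly drawn inner folds.
    z_train = np.column_stack(
        [base[k].predict(feature_sets[k][train]) for k in feature_sets]
    )
    stack = tune(z_train, y[train], seed=2)

    # Step 3: apply the already tuned models to the outer-fold test set.
    z_test = np.column_stack(
        [base[k].predict(feature_sets[k][test]) for k in feature_sets]
    )
    stacked_pred[test] = stack.predict(z_test)

print("Out-of-sample r:", np.corrcoef(y, stacked_pred)[0, 1])
```

      The outer-fold test indices are touched only in step 3, which is the property the response emphasises: test sets are used for evaluation and never for fitting or tuning.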

      Reviewer #1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits?

      The focus of this article is on the predictions. Still, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.
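
      The rank-stability check described above can be sketched as follows. This is an illustration under stated assumptions: the coefficients are simulated rather than taken from fitted Elastic Net models, and the 19 features simply mirror the subcortical-volume example. Five outer folds give 10 unique fold pairs, hence 10 values of Spearman’s ρ per prediction model.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_folds, n_features = 5, 19

# Hypothetical coefficients: a shared pattern plus fold-specific noise.
shared_pattern = rng.normal(size=n_features)
coefs = shared_pattern + 0.5 * rng.normal(size=(n_folds, n_features))

# Correlate the feature importance of every pair of outer folds.
rhos = []
for i, j in combinations(range(n_folds), 2):
    rho, _ = spearmanr(coefs[i], coefs[j])
    rhos.append(rho)

print(f"{len(rhos)} pairs, range = {min(rhos):.2f} to {max(rhos):.2f}")
```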

      Reviewer #1 Public Review #4

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods and bias correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.

      Thank you for the opportunity to provide more methodological details.

      First, for the task design, we included the following statements:

      From Methods:

      “HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009).

      First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button to all [Go] but not to two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.”

      Second, for MRI processing procedures, we included the following statements.

      From Methods: “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 18 sets of features.”

      “Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with a suffix, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned from potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high-pass cutoff at 200s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Gordon et al., 2016) for cortical surface regions and FreeSurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, whose number was, in turn, the number of features for each Task Contrast set of features.”

      “Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii.” as the task contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors in FSL and regressed them out in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a high-pass filter at .008 Hz. For parcellation, we used the same atlases as Task Contrast (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations of each pair of 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied r-to-z transformation and principal component analysis (PCA) of 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.

      Set of Features 14: Resting-state functional MRI functional connectivity (Rest FC)

      Similar to Task FC, Rest FC reflects functional connectivity (FC) among the brain regions, except that Rest FC was measured during the resting (as opposed to task-performing) period. HCP-A collected Rest FC from four 6.42-min (488 frames) runs across two days, leading to 26-min long data (Harms et al., 2018). On each day, the study scanned two runs of Rest FC, starting with anterior-to-posterior (AP) and then with posterior-to-anterior (PA) phase encoding polarity. We used the “rfMRI_REST_Atlas_MSMAll_hp0_clean.dscalar.nii” file that was pre-processed and concatenated across the four runs. We applied the same computations (i.e., high-pass filter, parcellation, Pearson’s correlations, r-to-z transformation and PCA) as for Task FC.

      Sets of Features 15-18: Structural MRI (sMRI)

      sMRI reflects individual differences in brain anatomy. The HCP-A used an established pre-processing pipeline for sMRI (Glasser et al., 2013). We focused on four sets of features: cortical thickness, cortical surface area, subcortical volume and total brain volume. For cortical thickness and cortical surface area, we used Destrieux’s atlas (Destrieux et al., 2010; Fischl, 2012) from FreeSurfer’s “aparc.stats” file, resulting in 148 regions for each set of features. For subcortical volume, we used the aseg atlas (Fischl et al., 2002) from FreeSurfer’s “aseg.stats” file, resulting in 19 regions. For total brain volume, we had five FreeSurfer-based features: “FS_IntraCranial_Vol” or estimated intra-cranial volume, “FS_TotCort_GM_Vol” or total cortical grey matter volume, “FS_Tot_WM_Vol” or total cortical white matter volume, “FS_SubCort_GM_Vol” or total subcortical grey matter volume and “FS_BrainSegVol_eTIV_Ratio” or ratio of brain segmentation volume to estimated total intracranial volume.”
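
      The functional-connectivity features described above for Task FC and Rest FC can be sketched as follows. This is a minimal illustration on simulated time series, not the authors' code: the helper names `fc_features` and `simulate` and the participant counts are assumptions for the example, and the filtering and parcellation steps are omitted. It shows why 379 regions yield 71,631 non-overlapping indices and how fitting the PCA on the training set only avoids data leakage.

```python
import numpy as np
from sklearn.decomposition import PCA

N_REGIONS = 379
N_PAIRS = N_REGIONS * (N_REGIONS - 1) // 2  # unique region pairs


def fc_features(timeseries):
    """Fisher z-transformed FC vector for one participant.

    timeseries: array of shape (n_timepoints, n_regions).
    """
    r = np.corrcoef(timeseries, rowvar=False)  # region-by-region Pearson's r
    upper = np.triu_indices(r.shape[0], k=1)   # non-overlapping pairs only
    return np.arctanh(r[upper])                # r-to-z transformation


rng = np.random.default_rng(0)


def simulate(n_participants, n_timepoints=200):
    """Hypothetical parcellated time series, one FC vector per participant."""
    return np.stack(
        [
            fc_features(rng.normal(size=(n_timepoints, N_REGIONS)))
            for _ in range(n_participants)
        ]
    )


fc_train = simulate(80)
fc_test = simulate(10)

# Fit the PCA on the training set only, then apply its definition to the test set.
pca = PCA(n_components=75).fit(fc_train)
pcs_train = pca.transform(fc_train)
pcs_test = pca.transform(fc_test)

print(N_PAIRS, pcs_train.shape, pcs_test.shape)
```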

      Third, for regression methods and bias correction methods used, we included the following statements:

      From Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regressions (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par with, if not better than, many non-linear and more-complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net simultaneously minimises the weighted sum of the features’ coefficients. The degree of penalty to the sum of the features’ coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘l1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; l1 ratio=0) or absolute (known as ‘Lasso’; l1 ratio=1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

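      $$\min_{\beta} \; \frac{1}{2 n_{\text{samples}}} \lVert y - X\beta \rVert_2^2 + \alpha \cdot l_1\text{-ratio} \cdot \lVert \beta \rVert_1 + \frac{1}{2} \, \alpha \, (1 - l_1\text{-ratio}) \, \lVert \beta \rVert_2^2$$

      where X is the features, y is the target, β is the coefficient and n_samples is the number of observations. In our grid search, we tuned two Elastic Net hyperparameters: α using 70 numbers in log space, ranging from .1 to 100, and l_1-ratio using 25 numbers in linear space, ranging from 0 to 1.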

      To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult for us to interpret the magnitude itself directly), we could still indicate that a brain feature with a higher magnitude weighs relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross-validation, different outer folds may have different values of ‘α’ and ‘l1 ratio’, making the final coefficients differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images using the BrainSpace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note that, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we first multiplied the absolute PCA scores (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with the Elastic Net coefficients and then summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.”
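
      The hyperparameter grid, the Elastic Net objective and the projection of coefficients back to ROI pairs described above can be sketched as follows. This is a hedged illustration, not the authors' code: the data are simulated, the dimensions (300 ROI pairs, 10 components) are scaled down from 71,631 and 75, a single pair of hyperparameter values is fitted instead of the full grid, and `elastic_net_objective` is a helper name introduced for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet

# Grid as described in the text: 70 values of alpha in log space from .1 to 100
# and 25 values of l1_ratio in linear space from 0 to 1.
alphas = np.logspace(-1, 2, 70)
l1_ratios = np.linspace(0, 1, 25)


def elastic_net_objective(X, y, beta, intercept, alpha, l1_ratio):
    """Value of scikit-learn's Elastic Net objective for given coefficients."""
    resid = y - (X @ beta + intercept)
    n_samples = X.shape[0]
    return (
        resid @ resid / (2 * n_samples)
        + alpha * l1_ratio * np.abs(beta).sum()
        + 0.5 * alpha * (1 - l1_ratio) * (beta @ beta)
    )


# Hypothetical data: 120 participants and 300 ROI pairs reduced to 10 components.
rng = np.random.default_rng(0)
n_participants, n_pairs, n_components = 120, 300, 10
fc = rng.normal(size=(n_participants, n_pairs))
y = fc[:, :5].sum(axis=1) + rng.normal(size=n_participants)

pca = PCA(n_components=n_components).fit(fc)
pcs = pca.transform(fc)
model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000).fit(pcs, y)

# The fitted coefficients should not do worse than an intercept-only model.
fitted = elastic_net_objective(pcs, y, model.coef_, model.intercept_, 0.1, 0.5)
null = elastic_net_objective(pcs, y, np.zeros(n_components), y.mean(), 0.1, 0.5)

# Multiply the absolute PCA loadings by the Elastic Net coefficients and sum
# across components, leaving one importance value per ROI pair.
roi_pair_importance = np.abs(pca.components_).T @ model.coef_

print(fitted <= null + 1e-8, roi_pair_importance.shape)
```

      Evaluating the objective directly makes the role of the two hyperparameters concrete: `alpha` scales the whole penalty, while `l1_ratio` divides it between the absolute (Lasso) and squared (Ridge) terms.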

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. https://doi.org/10.1016/j.neuroimage.2010.06.010

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774–781. https://doi.org/10.1016/j.neuroimage.2012.01.021

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x


      The following is the authors’ response to the previous reviews.

      eLife assessment

      This useful manuscript challenges the utility of current paradigms for estimating brain-age with magnetic resonance imaging measures, but presents inadequate evidence to support the suggestion that an alternative approach focused on predicting cognition is more useful. The paper would benefit from a clearer explication of the methods and a more critical evaluation of the conceptual basis of the different models. This work will be of interest to researchers working on brain-age and related models.

      Thank you so much for providing high-quality reviews on our manuscript. We revised the manuscript to address all of the reviewers’ comments and provided full responses to each of the comments below. Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. And such quantification is the third aim of this study.

      Public Reviews:

      Reviewer 1 (Public Review):

      In this paper, the authors evaluate the utility of brain-age-derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain-age-derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ("brain-cognition") as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.

      (1) I thank the authors for addressing many of my concerns with this revision. However, I do not feel they have addressed them all. In particular I think the authors could do more to address the concern I raised about the instability of the regression coefficients and about providing enough detail to determine that the stacked regression models do not overfit.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #1 and #2 (see below).

      (2) In considering my responses to the authors' revision, I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular, the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. To be fair, these conceptual problems are more widespread than this paper alone, so I do not believe the authors should be penalised for that. However, I would recommend making these concerns more explicit in the manuscript.

      Thank you Reviewer 1 for the comment. We addressed them in our response to Reviewer 1 Recommendations For The Authors #3 (see below).

      Reviewer 2 (Public Review):

      In this study, the authors aimed to evaluate the contribution of brain-age indices in capturing variance in cognitive decline and proposed an alternative index, brain-cognition, for consideration.

      The study employs suitable methods and data to address the research questions, and the methods and results sections are generally clear and easy to follow.

      I appreciate the authors' efforts in significantly improving the paper, including some considerable changes, from the original submission. While not all reviewer points were tackled, the majority of them were adequately addressed. These include additional analyses, more clarity in the methods and a much richer and nuanced discussion. While recognising the merits of the revised paper, I have a few additional comments.

      (1) Perhaps it would help the reader to note that it might be expected for brain-cognition to account for a significantly larger variance (11%) in fluid cognition, in contrast to brain-age. This stems from the fact that the authors specifically trained brain-cognition to predict fluid cognition, the very variable under consideration. In line with this, the authors later recommend that researchers considering the use of brain-age should evaluate its utility using a regression approach. The latter involves including a brain index (e.g. brain-cognition) previously trained to predict the regression's target variable (e.g. fluid cognition) alongside a brain-age index (e.g., corrected brain-age gap). If the target-trained brain index outperforms the brain-age metric, it suggests that relying solely on brain-age might not be the optimal choice. Although not necessarily the case, is it surprising for the target-trained brain index to demonstrate better performance than brain-age? This harks back to the broader point raised in the initial review: while brain-age may prove useful (though sometimes with modest effect sizes) across diverse outcomes as a generally applicable metric, a brain index tailored for predicting a specific outcome, such as brain-cognition in this case, might capture a considerably larger share of variance in that specific context but could lack broader applicability. The latter aspect needs to be empirically assessed.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (please see our responses to Reviewer 1 Recommendations For The Authors #3 below).

      Briefly, as in our 2nd revision, we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. Such quantification is the third aim of this study.

      (2) Furthermore, the discussion pertaining to training brain-age models on healthy populations for subsequent testing on individuals with neurological or psychological disorders seems somewhat one-sided within the broader debate. This one-sidedness might potentially confuse readers. It is worth noting that the choice to employ healthy participants in the training model is likely deliberate, serving as a norm against which atypical populations are compared. To provide a more comprehensive understanding, referencing Tim Hahn's counterargument to Bashyam's perspective could offer a more complete view (https://academic.oup.com/brain/article/144/3/e31/6214475?login=false).

      Thank you Reviewer 2 for bringing up this issue. We have now revised the paragraph in question and added nuances on the usage of Brain Age for normative vs. case-control studies. We also cited Tim Hahn’s article that explained the conceptual foundation of the use of Brain Age in case-control studies. Please see below. Additionally, we also made a statement about our study not being able to address issues about the case-control studies directly in the newly written conclusion (see Reviewer 3 Recommendations for the Authors #3).

      Discussion:

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Those case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore under-fitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). On the contrary, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being under-fitted. This unfortunately might limit the generalisability of our study to just the normative type of study. Future work is still needed to test the utility of brain age in the case-control case.”

      (3) Overall, this paper makes a significant contribution to the field of brain-age and related brain indices and their utility.

      Thank you for the encouragement.

      Reviewer 3 (Public Review):

      The main question of this article is as follows: "To what extent does having information on brain-age improve our ability to capture declines in fluid cognition beyond knowing a person's chronological age?" This question is worthwhile, considering that there is considerable confusion in the field about the nature of brain-age.

      (1) Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain-age metrics.

      Thank you Reviewer 3 for the comment. We addressed them in our response to Reviewer 3 Recommendations For The Authors #1-3 (see below).

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) I do not feel the authors have fully addressed the concern I raised about the stacked regression models. Despite the new figure, it is still not entirely clear what the authors are using as the training set in the final step. To be clear, the problem occurs because of the parameters, not the hyperparameters (which the authors now state that they are optimising via nested grid search). In other words, given a regression model y = X*beta, if the X are taken to be predictions from a lower level regression model, then they contain information that is derived from both the training set and the test set for the model that this was trained on. If the split is the same (i.e. the predictions are derived on the same test set as is being used at the second level), then this can lead to overfitting. It is not clear to me whether the authors have done this or not. Please provide additional detail to clarify this point.

      Thank you for allowing us an opportunity to clarify our stacked model. We wanted to confirm that we did not use test sets to build a stacked model in both lower and higher levels of the Elastic Net models. Test sets were there just for testing the performance of the models. We made additional clarification to make this clearer (see below). Let us explain what we did and provide the rationales below.

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features), “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we then chose the prediction models that led to the highest performance, reflected by the coefficient of determination (R²), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We then re-divided this outer-fold training set into five new inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R² on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the sets of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values.

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, coefficient of determination (R²) and mean absolute error (MAE). Note that for R², we used the sum of squares definition (i.e., R² = 1 – (residual sum of squares/total sum of squares)) per a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition, as Brain Age and Brain Cognition, respectively.”
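      The three performance measures named in the excerpt above can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names and the toy numbers are assumptions made for the example. It also shows why the sum-of-squares definition matters, since predictions that correlate perfectly with the target can still have a negative R² when they are offset or compressed.

```python
import numpy as np


def pearson_r(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1])


def r2_sum_of_squares(observed, predicted):
    """R2 = 1 - (residual sum of squares / total sum of squares)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mean_absolute_error(observed, predicted):
    """Mean absolute difference between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(observed - predicted)))


# Toy data (made up for illustration): the predictions are a linear
# transform of the target, so r is 1, yet they sit far from the target.
observed = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = 0.5 * observed + 3.0

r = pearson_r(observed, predicted)           # 1.0
r2 = r2_sum_of_squares(observed, predicted)  # -0.375
mae = mean_absolute_error(observed, predicted)  # 1.5
```

      In this toy case the squared correlation would be 1, while the sum-of-squares R² is negative, so the two should not be used interchangeably when reporting out-of-sample performance.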

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models.

      Note some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
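      The three steps described in the Methods excerpt above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: it uses simulated data, three hypothetical feature sets (`set_a`, `set_b`, `set_c`) instead of 18, a single stacked model instead of eight, and a made-up hyperparameter grid. It also relies on `GridSearchCV` refitting the best model on the whole outer-fold training set after tuning, a detail the excerpt does not spell out.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
n = 200

# Simulated stand-ins for separate sets of brain features (hypothetical).
feature_sets = {name: rng.normal(size=(n, 10)) for name in ("set_a", "set_b", "set_c")}
target = (
    feature_sets["set_a"][:, 0]
    + 0.5 * feature_sets["set_b"][:, 1]
    + rng.normal(scale=0.5, size=n)
)

# Hypothetical grid; the real grid is not given in the excerpt.
param_grid = {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]}


def tune(X, y, seed):
    """Tune an Elastic Net with five inner folds, selecting on mean R2."""
    inner_cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(ElasticNet(max_iter=10000), param_grid, cv=inner_cv, scoring="r2")
    return search.fit(X, y)


stacked_pred = np.empty(n)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)

for train_idx, test_idx in outer_cv.split(target):
    y_train = target[train_idx]

    # Step 1: tune one model per feature set, outer-fold training set only.
    base_models = {
        name: tune(X[train_idx], y_train, seed=1) for name, X in feature_sets.items()
    }

    # Step 2: apply the tuned models to the same training set, then tune the
    # stacked model on those predicted values using new inner folds.
    train_preds = np.column_stack(
        [base_models[name].predict(feature_sets[name][train_idx]) for name in feature_sets]
    )
    stacked_model = tune(train_preds, y_train, seed=2)

    # Step 3: only now touch the outer-fold test set, with everything frozen.
    test_preds = np.column_stack(
        [base_models[name].predict(feature_sets[name][test_idx]) for name in feature_sets]
    )
    stacked_pred[test_idx] = stacked_model.predict(test_preds)

# Out-of-sample predictions pooled across the five outer-fold test sets.
out_of_sample_r = np.corrcoef(target, stacked_pred)[0, 1]
```

      The point of the layout is that `test_idx` never appears in steps 1 or 2, so neither the parameters nor the hyperparameters of either level are estimated from test observations.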

      (2) I also do not feel the authors have fully addressed the concern I raised about stability of the regression coefficients over splits of the data. I wanted to see the regression coefficients, not the predictions. The predictions can be stable when the coefficients are not.

      The focus of this article is on the predictions. Still, as pointed out by Reviewer 1, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found Spearman’s ρ to vary dramatically in both age-prediction (range=.31-.94) and fluid cognition-prediction (range=.16-.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients either with ‘Ridge’ or ‘Lasso’ methods, which resulted in reduction as a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net Coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model. The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.
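      The rank-stability computation described above (five outer folds giving 10 pairwise Spearman’s ρ values per prediction model) can be sketched as follows. This is an illustrative sketch on simulated coefficients, not the authors' code: the array `coefs`, the noise level and the helper name `rank_stability` are assumptions made for the example.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_folds, n_features = 5, 19

# Simulated coefficients: one row per outer fold, one column per feature.
# A shared signal plus fold-specific noise stands in for fitted Elastic Net
# coefficients (hypothetical values).
shared = rng.normal(size=n_features)
coefs = shared + rng.normal(scale=0.5, size=(n_folds, n_features))


def rank_stability(coefs):
    """Spearman's rho for every pair of folds' coefficient vectors."""
    rhos = []
    for i, j in combinations(range(coefs.shape[0]), 2):
        rho, _ = spearmanr(coefs[i], coefs[j])
        rhos.append(rho)
    return np.array(rhos)


rhos = rank_stability(coefs)  # 5 folds -> 10 pairwise values
mean_rho = rhos.mean()
rho_range = (rhos.min(), rhos.max())
```

      With five folds there are 5 × 4 / 2 = 10 unique pairs, which matches the 10 dots per prediction model described in the caption.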

      (3) I also must say that I agree with Reviewer 3 about the limitations of the brain-age and brain-cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain-age model that is trained to predict age. This suffers from the same problem the authors raise with brain-age and I agree that this would probably disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain-age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain-cognition.

      Thank you so much for raising this point. Reviewer 2 (Public Review #1) and Reviewer 3 (Recommendations for the Authors #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (see below).

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models, Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.”
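      The "around one-third" figure in the excerpt above appears to follow from the two percentages quoted there. The following is a worked reading of that arithmetic, assuming "one-third" refers to the ratio of the largest unique effect of Brain Cognition (about 11%) to the variation already explained by chronological age and a Brain Age index together (about 32%):

```latex
\frac{\text{unique effect of Brain Cognition}}{R^2_{\text{chronological age, Brain Age index}}}
\approx \frac{0.11}{0.32} \approx 0.34 \approx \frac{1}{3}
```

      Under this reading, adding Brain Cognition would raise the total variation explained from about 32% to at most about 43%.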

      Reviewer #3 (Recommendations For The Authors):

      Thank you to the authors for addressing so many of my concerns with this revision. There are a few points that I feel still need addressing/clarifying related to: 1) calculating brain cognition, 2) the inevitability of their results, and 3) their continued recommendation to use brain age metrics.

      (1) I understand your point here. I think the distinction is that it is fine to build predictive models, but then there is no need to go through this intermediate step of "brain-cognition". Just say that brain features can predict cognition XX well, and brain-age (or some related metric) can predict cognition YY well. It creates a confusing framework for the reader that can lead them to believe that "brain-cognition" is not just a predicted value of fluid cognition from a model using brain features to predict cognition. While you clearly state that that is in fact what it is in the text, which is a huge improvement, I do not see what is added by going through brain-cognition instead of simply just obtaining a change in R² where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa, depending on the question. Please do this analysis, and either compare and contrast it with going through "brain-cognition" in your paper, or switch to this analysis, as it more directly addresses the question of the incremental predictive utility of brain-age above and beyond brain features.

      Thank you so much for raising this point. Reviewer 1 (Public Review #2/Recommendations For The Authors #3) and Reviewer 2 (Public Review #1) made a similar observation. We now made changes to the introduction and discussion to address this concern (see our responses to Reviewer 1 Recommendations For The Authors #3 above).

      Briefly, as in our 2nd revision, we made it explicitly clear that we did not intend to compare Brain Age with Brain Cognition since, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      We have thought about changing the name Brain Cognition into something along the lines of “predicted values of prediction models predicting fluid cognition based on brain MRI.” However, this made the manuscript hard to follow, especially with the commonality analyses. For instance, the sentence, “Here, we tested Brain Cognition’s unique effects in multiple regression models with a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition” would become “Here, we tested predicted values of prediction models predicting fluid cognition based on brain MRI unique effects in multiple regression models with a Brain Age index, chronological age and predicted values of prediction models predicting fluid cognition based on brain MRI as regressors to explain fluid cognition.” We believe, given our additional explanation (see our responses to Reviewer 1 Recommendations For The Authors #3 above), readers should understand what Brain Cognition is, and that we did not intend to compare Brain Age and Brain Cognition directly.

      As for the suggested analysis, “obtaining a change in R² where the first model uses brain features alone to predict cognition, and the second adds on brain-age (or related metrics), or vice versa,” we have already done this in the form of commonality analysis (Nimon et al., 2008) (see Figure 7 below). That is, to obtain unique and common effects of the regressors, we need to look at all of the possible changes in R² when all possible subsets of regressors were excluded or included (see equations 12 and 13 below).

      From Methods:

      “Similar to the above multiple regression model, we had chronological age, each Brain Age index and Brain Cognition as the regressors for fluid cognition:

      $$
      \text{Fluid Cognition}_i = \beta_0 + \beta_1\,\text{Chronological Age}_i + \beta_2\,\text{Brain Age Index}_{i,j} + \beta_3\,\text{Brain Cognition}_i + \varepsilon_i \tag{12}
      $$

      Applying the commonality analysis here allowed us, first, to investigate the additive, unique effects of Brain Cognition, over and above chronological age and Brain Age indices. More importantly, the commonality analysis also enabled us to test the common, shared effects that Brain Cognition had with chronological age and Brain Age indices in explaining fluid cognition. We calculated the commonality analysis as follows (Nimon et al., 2017):

      $$
      \begin{aligned}
      \text{Unique Effect}_{\text{chronological age}} &= \Delta R^2_{\text{chronological age}} \\
      &= R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} \\[4pt]
      \text{Unique Effect}_{\text{Brain Age index}} &= \Delta R^2_{\text{Brain Age index}} \\
      &= R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Cognition}} \\[4pt]
      \text{Unique Effect}_{\text{Brain Cognition}} &= \Delta R^2_{\text{Brain Cognition}} \\
      &= R^2_{\text{chronological age, Brain Age index, Brain Cognition}} - R^2_{\text{chronological age, Brain Age index}} \\[4pt]
      \text{Common Effect}_{\text{chronological age, Brain Age index}} &= R^2_{\text{chronological age, Brain Cognition}} + R^2_{\text{Brain Age index, Brain Cognition}} \\
      &\quad - R^2_{\text{Brain Cognition}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\[4pt]
      \text{Common Effect}_{\text{chronological age, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{Brain Age index, Brain Cognition}} \\
      &\quad - R^2_{\text{Brain Age index}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\[4pt]
      \text{Common Effect}_{\text{Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age, Brain Age index}} + R^2_{\text{chronological age, Brain Cognition}} \\
      &\quad - R^2_{\text{chronological age}} - R^2_{\text{chronological age, Brain Age index, Brain Cognition}} \\[4pt]
      \text{Common Effect}_{\text{chronological age, Brain Age index, Brain Cognition}} &= R^2_{\text{chronological age}} + R^2_{\text{Brain Age index}} + R^2_{\text{Brain Cognition}} \\
      &\quad - R^2_{\text{chronological age, Brain Age index}} - R^2_{\text{chronological age, Brain Cognition}} - R^2_{\text{Brain Age index, Brain Cognition}} \\
      &\quad + R^2_{\text{chronological age, Brain Age index, Brain Cognition}}
      \end{aligned}
      \tag{13}
      $$
      ”
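      The commonality decomposition for three regressors can be sketched numerically as follows. This is an illustrative sketch on simulated data, not the authors' analysis: the helper names (`r_squared`, `commonality_three`) and the simulated variables are assumptions. The three-way common effect is written so that the seven components sum to the R² of the full model, which means the full-model R² enters that last term with a plus sign.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R2 of an ordinary least squares fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2))


def commonality_three(x1, x2, x3, y):
    """Unique and common effects for three regressors via all-subsets R2."""
    cols = {"1": x1, "2": x2, "3": x3}
    r2 = {}
    for k in (1, 2, 3):
        for subset in combinations("123", k):
            X = np.column_stack([cols[c] for c in subset])
            r2["".join(subset)] = r_squared(X, y)
    full = r2["123"]
    return {
        "unique_1": full - r2["23"],
        "unique_2": full - r2["13"],
        "unique_3": full - r2["12"],
        "common_12": r2["13"] + r2["23"] - r2["3"] - full,
        "common_13": r2["12"] + r2["23"] - r2["2"] - full,
        "common_23": r2["12"] + r2["13"] - r2["1"] - full,
        "common_123": r2["1"] + r2["2"] + r2["3"]
        - r2["12"] - r2["13"] - r2["23"] + full,
        "total": full,
    }


# Simulated stand-ins (hypothetical): 1 = chronological age,
# 2 = a Brain Age index, 3 = Brain Cognition.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(size=n)
brain_age = age + rng.normal(scale=0.5, size=n)
brain_cog = -0.6 * age + rng.normal(scale=0.8, size=n)
fluid = -0.5 * age + 0.4 * brain_cog + rng.normal(scale=0.8, size=n)

effects = commonality_three(age, brain_age, brain_cog, fluid)
components_sum = sum(v for k, v in effects.items() if k != "total")
```

      Checking that `components_sum` equals `effects["total"]` is a quick sanity test for any implementation of this decomposition. Common effects, unlike unique effects, can be negative (for example, under suppression).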

      (2) I agree that the solution is not to exclude age as a covariate, and that there is a big difference between inevitable and obvious. I simply think a further discussion of the inevitability of the results would be clarifying for the readers. There is a big opportunity in the brain-age literature to be as direct as possible about why you are finding what you are finding. People need to know not only what you found, but why you found what you found.

      Thank you. We agree that we need to make this point more explicit and direct. In the revised manuscript, we added statements in both the Introduction and Discussion (see below) about the tight relationship between Brain Age and chronological age by design, making the small unique effects of Brain Age inevitable.

      Introduction:

      “Accordingly, by design, Brain Age is closely tied to chronological age. Because chronological age usually has a strong relationship with fluid cognition, to begin with, it is unclear how much Brain Age adds to what is already captured by chronological age.”

      Discussion:

      “First, Brain Age itself did not add much more information to help us capture fluid cognition than what we had already known from a person’s chronological age. This can clearly be seen from the small unique effects of Brain Age indices in the multiple regression models having Brain Age and chronological age as the regressors. While the unique effects of some Brain Age indices from certain age-prediction models were statistically significant, they were all relatively small. Without Brain Age indices, chronological age by itself already explained around 32% of the variation in fluid cognition. Including Brain Age indices only added around 1.6% at best. We believe the small unique effects of Brain Age were inevitable because, by design, Brain Age is closely tied to chronological age. Therefore, chronological age and Brain Age captured mostly a similar variation in fluid cognition.

      Investigating the simple regression models and the commonality analysis between each Brain Age index and chronological age provided additional insights….”
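
      The unique and common effects discussed in this excerpt can be illustrated with a minimal sketch of a two-predictor commonality analysis. Everything in it is an assumption made for illustration: the simulated data, the effect sizes, and the helper names `r_squared` and `commonality_two` are hypothetical and are not taken from the authors' analysis. The sketch only shows why a predictor that closely tracks chronological age tends to have a small unique effect and a large common effect.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()


def commonality_two(x1, x2, y):
    """Partition the R^2 of y ~ x1 + x2 into unique and common effects."""
    r2_1 = r_squared(x1, y)
    r2_2 = r_squared(x2, y)
    r2_12 = r_squared(np.column_stack([x1, x2]), y)
    return {
        "unique_1": r2_12 - r2_2,       # what x1 adds over x2
        "unique_2": r2_12 - r2_1,       # what x2 adds over x1
        "common": r2_1 + r2_2 - r2_12,  # variance explained jointly
        "total": r2_12,
    }


# Hypothetical simulated data: "brain_age" tracks "age" by construction,
# and "cognition" depends on age only.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(size=n)
brain_age = age + 0.3 * rng.normal(size=n)
cognition = -0.6 * age + rng.normal(size=n)

effects = commonality_two(age, brain_age, cognition)
for name, value in effects.items():
    print(f"{name}: {value:.4f}")
```

      Under these assumptions, the unique effect of the simulated Brain Age stays close to zero while most of its explained variance falls into the common effect, which mirrors the pattern described in the quoted Discussion.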

      (3) I believe it is very important to critically examine the use of brain-age and related metrics. As part of this process, I think we should be asking ourselves the following questions (among others): Why go through age prediction? Wouldn't the predictions of cognition (or another variable) using the same set of brain features always be as good or better? You still have not justified the use of brain-age. As I said before, if you are going to continue to recommend the use of brain-age, you need a very strong argument for why you are recommending this. What does it truly add? Otherwise, temper your statements to indicate possible better paths forward.

      Thank you, Reviewer 3, for making an argument against the use of Brain Age. We largely agree with you. However, our work focuses on only one phenotype, fluid cognition, and on the normative situation (i.e., not having a case vs control group). As Reviewer 2 pointed out, Brain Age might still have utility in other cases not studied here. Still, future studies that focus on other phenotypes may consider using our approach as a template to test the utility of Brain Age in other situations. We added a concluding statement to reflect this.

      From Discussion:

      “Altogether, we examined the utility of Brain Age as a biomarker for fluid cognition. Here are the three conclusions. First, Brain Age failed to add substantially more information over and above chronological age. Second, a higher ability to predict chronological age did not correspond to a higher utility to capture fluid cognition. Third, Brain Age missed up to around one-third of the variation in fluid cognition that could have been explained by brain MRI. Yet, given our focus on fluid cognition, future empirical research is needed to test the utility of Brain Age on other phenotypes, especially when Brain Age is used for anomaly detection in case-control studies (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We hope that future studies may consider applying our approach (i.e., using the commonality analysis that includes predicted values from a model that directly predicts the phenotype of interest) to test the utility of Brain Age as a biomarker for other phenotypes.”

      References

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain‐based predictive models mediate the relationships between childhood cognition and socio‐demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain‐cognition relationship: Integrating task‐based fMRI across tasks markedly boosts prediction and test‐retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper provides useful information about how the ionome of Arabidopsis thaliana adapts to very high CO2-levels, backed up by solid evidence and carefully designed studies. However, the broader claims of the paper about climate change and food security - heavily emphasized in the abstract, introduction, and discussion - are inappropriate, as there is no direct link to the presented work.

      We sincerely thank you for the work you have done in reviewing our manuscript. We very much appreciate your overall positive assessment of the experimental work as a whole, its value and robustness.

      In this revised version, we took on board the majority of your suggestions and your comments. In particular, we understood your critical point about overstating our objectives, which might in turn seem uncorrelated with our results. We fully agree with the comments that have been made on this point. Consequently, we have made substantial modifications and corrections in order to clarify our objectives and their implications: exploring in depth the natural variation of the shoot ionome response to elevated CO2, and generating a valuable resource allowing a better understanding of the genetic and molecular mechanisms involved in the regulation of plant mineral nutrition by the elevation of atmospheric CO2.

      We also made modifications in response to the other suggestions, including a clarification of the functional experiments carried out on the function of TIP2;2 in response to elevated CO2. Figure 7 now includes the comparison between ambient and elevated CO2 conditions, which is much more informative than what appeared in the previous version.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study's abstract, introduction, and conclusions are not supported by the methods and results conducted. In fact, the results presented suggest that Arabidopsis could easily adapt to an extremely high CO2 environment.

      We understand the reviewer’s comment. Although our work is considered useful, robust and well designed, we agree with the reviewer's point. We have certainly overemphasized the significance of our work to address the issue of food security in response to rising atmospheric CO2, at the expense of the factual description of the results of our fundamental study of the mechanisms at the interface between CO2 and mineral nutrition. We have clarified this focus by modifying the text of the introduction, objectives and discussion. We hope that these modifications will enable readers to better appreciate the core of this work.

      Regarding the last part of the comment, our results do suggest that genetic variation could allow adaptation to rising atmospheric CO2, and our study does indeed aim to identify the extent and basis of this genetic variation.

      This study offers good evidence pointing to a genetic basis for Arabidopsis thaliana's response to elevated CO2 (eCO2) levels and its subsequent impact on the leaf ionome. The natural variation analyses in the study support the hypothesis that genetic factors, rather than local adaptation, guide the influence of eCO2 on the ionome of rosette leaves in Arabidopsis. However, the manuscript's claim regarding its role in "the development of biofortified crops adapted to a high-CO2 world" (line 23) is overstated, especially given the absence of any analysis on the influence of eCO2 on the seed ionome and given that Arabidopsis is a poor model for the harvest index of any crop. The manuscript, in its current form, necessitates massive revisions, particularly in clarifying its broader implications and in providing more substantial evidence for some of its assertions.

      We thank the reviewer for this comment and for the positive assessment of our identification of a genetic basis for Arabidopsis thaliana's response to elevated CO2 and its subsequent impact on the leaf ionome. Nevertheless, it is true that the study of the leaf ionome is far from being able to lead to the development of biofortified plants. Some papers have described the nutrient harvest index in Arabidopsis as a potential indicator of nutrient use efficiency (for instance, Masclaux-Daubresse and Chardon, Journal of Experimental Botany 2011 or Aranjuelo et al., Journal of Experimental Botany 2013). However, as we did not include any seed ionome data in the paper, we added clear mentions that our analyses were made on leaves (lines 56/57/250/319) and a comment in the discussion section to address this limitation (lines 325-328).

      Major Drawbacks and Questions:

      (1) Evidence for the Central Premise:

      The foundational premise of the study is the assertion that rising atmospheric CO2 levels result in a decline in plant mineral content. This phenomenon is primarily observed in C3 plants, with C4 plants seemingly less affected. The evidence provided on this topic is scant and, in some instances, contradicts the authors' own references. The potential reduction of certain minerals, especially in grains, can be debated. For instance, reduced nitrogen (N) and phosphorus (P) content in grains might not necessarily be detrimental for human and animal consumption. In fact, it could potentially mitigate issues like nitrogen emissions and phosphorus leaching. Labeling this as a "major threat to food security" (line 30) is exaggerated. While the case for microelements might be more compelling, the introduction fails to articulate this adequately. Furthermore, the introduction lacks any discussion on how eCO2 might influence nutrient allocation to grains, which would be crucial in substantiating the claim that eCO2 poses a threat to food security. A more comprehensive introduction that clearly delineates the adverse effects of eCO2 and its implications for food security would greatly enhance the manuscript.

      We partially agree with this comment. The decline in mineral status of C3 plants under conditions of elevated atmospheric CO2 has been widely described in the literature, and specifically documented for the cereal grains. While there are variations in this effect (depending on species, ecotype, cultivar), there is no debate about its acceptance. Here are just a few of the many works describing this effect, both on a global scale and at the level of the individual plant (Cotrufo MF (1998) Elevated CO2 reduces the nitrogen concentration of plant tissues. Global Change Biology 4: 43-54; Loladze I (2014) Hidden shift of the ionome of plants exposed to elevated CO(2) depletes minerals at the base of human nutrition. eLife 3: e02245; Myers SS (2014) Increasing CO2 threatens human nutrition. Nature 510: 139-142; Poorter H (1997) The effect of elevated CO2 on the chemical composition and construction costs of leaves of 27 C3 species. Plant, Cell & Environment 20: 472-482; Soares JC (2019) Preserving the nutritional quality of crop plants under a changing climate: importance and strategies. Plant and Soil 443: 1-26; Stitt M (1999) The interaction between elevated carbon dioxide and nitrogen nutrition: the physiological and molecular background. Plant, Cell & Environment 22: 583-621; Uddling J (2018) Crop quality under rising atmospheric CO2. Curr Opin Plant Biol 45: 262-267).

      In addition to this, the threat to food security posed by this alteration in plant mineral status has also been well described in the literature by several modeling approaches (Beach RH (2019) Combining the effects of increased atmospheric carbon dioxide on protein, iron, and zinc availability and projected climate change on global diets: a modelling study. Lancet Planet Health 3: e307-e317; Ebi KL (2019) Elevated atmospheric CO(2) concentrations and climate change will affect our food's quality and quantity. Lancet Planet Health 3: e283-e284; Medek DE (2017) Estimated Effects of Future Atmospheric CO2 Concentrations on Protein Intake and the Risk of Protein Deficiency by Country and Region. Environ Health Perspect 125: 087002; Smith MR (2018) Impact of anthropogenic CO2 emissions on global human nutrition. Nature Climate Change 8: 834-839; Weyant C (2018) Anticipated burden and mitigation of carbon-dioxide-induced nutritional deficiencies and related diseases: A simulation modeling study. PLoS Med 15: e1002586; Zhu C (2018) Carbon dioxide (CO2) levels this century will alter the protein, micronutrients, and vitamin content of rice grains with potential health consequences for the poorest rice-dependent countries. Sci Adv 4: eaaq1012). To reinforce this point, we have added a sentence and references (lines 30-33). Nevertheless, we understand the reviewer's comment on the nuance to be given to the intensity of this potential threat. We have therefore modified the text, replacing "major threat" by "significant threat" (lines 3 and 29).

      We would also like to answer the reviewer’s comment on the potential environmental benefit associated with reduced N and P content in grains (mitigation of N emissions and P leaching). If this reduced N and P content results from a lower efficiency in the use of soil nutrients by plants, as suggested by several studies (Bloom 2010, Cassan 2023, Gojon 2023 and references therein), it may on the contrary favor N oxide emissions and P leaching from the soil.

      (2) Exaggerated Concerns:

      The paper begins with the concern that carbon fertilization will lead to carbon dilution in our foods. While we indeed face numerous genuine threats in the coming decades, this particular issue is manageable. The increase in CO2 alone offers many opportunities for boosting yield. However, the heightened heat and increased evapotranspiration will pose massive challenges in many environments.

      While there are indeed multiple threats that we are facing in the coming decades, we don't fully agree with this comment. At present, there is no evidence that the negative effect of CO2 on plant mineral content will be manageable. Furthermore, there is compelling evidence that altered mineral nutrition and mineral status of plants will be an important factor limiting the high CO2-induced increase in yield, as will heat or increased evapotranspiration (see for instance Coskun et al (2016) Nutrient constraints on terrestrial carbon fixation: The role of Nitrogen. J. Plant Physiol. 203: 95-109; Jiang M (2020) Low phosphorus supply constrains plant responses to elevated CO2: A meta-analysis. Glob Chang Biol 26: 5856-5873; Reich PB (2006) Nitrogen limitation constrains sustainability of ecosystem response to CO2. Nature 440: 922-925). Thus, although we do not deny the crucial importance of heat and water stress, we believe it is relevant to study the basic mechanisms responsible for the negative effect of CO2 on plant mineral composition.

      Figure 4 in fact suggests that 43% of the REGMAP panel (cluster 3) is already pre-adapted to very high CO2 levels. This suggests annual species could adapt very rapidly.

      We agree with the reviewer. However, this suggests that genetic variation exists in some ecotypes to support adaptation to elevated CO2. The purpose of this work is indeed to identify this genetic variation, in order to characterize the mechanisms underlying it.

      (3) Assumptions on CO2 Levels:

      The assumption of 900ppm seems to be based on a very extreme climate change scenario. Most people believe we will overshoot the 1.5°C scenario, however, it seems plausible that 2.5 to 3°C scenarios are more likely. This would correspond to around 500ppm of CO2. https://www.nature.com/articles/s41597-022-01196-7/tables/4

      We agree with the reviewer that the CO2 concentration we used corresponds to a high value in the IPCC projections. That said, this value is currently considered very plausible: the following figure (from Smith and Myers (2018) Nature Climate Change) shows that current CO2 emissions align with the IPCC's most extreme model (RCP 8.5), which would result in a CO2 concentration of around 900 ppm in 2100. Furthermore, nothing in the 6th IPCC report allows us to exclude the 4°C scenario.

      Author response image 1.

      (4) Focus on Real Challenges:

      We have numerous real challenges, such as extreme heat and inconsistent rainfall, to address in the context of climate change. However, testing under extreme CO2 conditions and then asserting that carbon dilution will negatively impact nutrition is exaggerated.

      While we fully agree that several threats linked to climate change exist, and all deserve to be studied, we find it questionable to consider that the potential effect of high CO2 on the mineral nutrition of plants is not a real challenge. The mineral nutrition of plants is already a major environmental challenge. This perspective seems to reflect the reviewer's personal opinion rather than an analysis of our work.

      In contrast, the FACE experiments are fundamental and are conducted at more realistic eCO2 levels. Understanding the interaction between a 20% increase in CO2 and new precipitation patterns is key for global carbon flux prediction.

      Again, we do not fully understand this comment, as the aim of our study was not to perform a global carbon flux prediction, but to unravel genes and mechanisms underlying the negative effect of elevated CO2 on the nutrient content of Arabidopsis rosettes. However, we agree with the reviewer that FACE facilities are useful for exploring the CO2 response in more natural environments, and we highlight the fact that the decrease in mineral status of C3 plants has been widely documented in FACE studies. FACE facilities do not, however, allow fully controlled experiments (temperature, rainfall, wind and light intensity cannot be controlled in FACE), which are needed to disentangle the mechanisms by which elevated CO2 regulates the signaling pathways associated with plant mineral composition. In the longer term, studying the mechanisms we have identified in a more global context of climate change could be highly relevant.

      As I look at the literature on commercial greenhouse tomato production, 1000ppm of eCO2 is common, but it also looks like the breeders and growers have already solved for flavor and nutrition under these conditions.

      Indeed, tomato is often cultivated in CO2-enriched greenhouses at 1000 ppm. According to the literature, this results in a 20-25% reduction in vitamin C or lycopene, and requires a significantly higher nitrogen and water intake to reach expected sugar levels (Doddrell H (2023) Horticulture Research). In addition, the negative effect of elevated CO2 on tomato nutrient content seems to have significant repercussions on nutrition-health properties (Boufeldja (2023), Molecules).

      Conclusion:

      While the study provides valuable insights into the genetic underpinnings of Arabidopsis thaliana's response to elevated CO2 levels, it requires an entirely revised writeup, especially in its abstract, broader claims and implications. The manuscript would benefit from a more thorough introduction, a clearer definition of its scope, and a clear focus on the limits of this study.

      We thank the reviewer for the comments made on our manuscript. In addition to the responses that we provide to these comments, we have modified the main text of the introduction, objectives and discussion to take these comments into consideration. We believe that this will significantly improve the manuscript.

      Reviewer #2 (Public Review):

      Strengths:

      The authors have conducted a large, well-designed experiment to test the response to eCO2. Overall, the experimental design is sound and appropriate for the questions about how a change in CO2 affects the ionome of Arabidopsis. Most of the conclusions in this area are well supported by the data that the authors present.

      We thank the reviewer for this positive appreciation.

      Weakness:

      While the authors have done good experiments, it is a big stretch from Arabidopsis grown in an arbitrary concentration of CO2 to relevance to human and animal nutrition in future climates. Arabidopsis is a great model plant, but its leaves are not generally eaten by humans or animals.

      We agree with the reviewer’s comment. We recognize that implying a direct contribution of our work to human nutrition in future climates is an overstatement, as Reviewer 1 also noted. This was not an intentional overstatement, as we have always been convinced that our work contributes to the understanding of the basic mechanisms involved in the negative regulation of plant mineral nutrition by high CO2. We have significantly modified the text to correct any misunderstanding of our work’s implications.

      The authors don't justify their choice of a CO2 concentration. Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed. And CO2 is just one of the variables that plants will have to contend with in future climates, other variables will also affect elemental concentrations.

      We agree with this comment. We added a justification of the high CO2 concentration used in this work in the Materials and Methods section (lines 343-344). The explanation of this choice is also given in our response to Reviewer 1’s point 3.

      Given these concerns, I think the emphasis on biofortification for future climates is unwarranted for this study.

      Again, we agree with this comment, and we have significantly modified the text to correct any misunderstanding of our work’s implications.

      Additionally, I have trouble with these conclusions:

      -Abstract "Finally, we demonstrate that manipulating the function of one of these genes can mitigate the negative effect of elevated CO2 on the plant mineral composition."

      -Discussion "Consistent with these results, we show that manipulating TIP2;2 expressions with a knock-out mutant can modulate the Zn loss observed under high CO2."

      The authors have not included the data to support this conclusion as stated. They have shown that this mutant increases the Zn content of the leaves when compared to WT but have not demonstrated that this response is different than in ambient CO2. This is an important distinction: one way to ameliorate the reduction of nutrients due to eCO2 is to try to identify genes that are involved in the mechanism of eCO2-induced reduction. Another way is to increase the concentration of nutrients so that the eCO2-induced reduction is not as important (i.e. a 10% reduction in Zn due to eCO2 is not as important if you have increased the baseline Zn concentration by 20%). The authors identified tip2 as a target from the GWAS on difference, but their validation experiment only looks at eCO2.
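
      The reviewer's numerical example can be made concrete with a short sketch. It assumes, purely for illustration, that the baseline increase and the eCO2-induced reduction combine multiplicatively; the function name `relative_zn` is hypothetical and the percentages are the reviewer's own illustrative figures, not measured values.

```python
def relative_zn(baseline_gain: float, eco2_loss: float) -> float:
    """Zn content under eCO2 relative to the original ambient baseline.

    Assumes the two effects are multiplicative: the baseline is first
    raised by `baseline_gain`, then reduced by `eco2_loss` under eCO2.
    """
    return (1.0 + baseline_gain) * (1.0 - eco2_loss)


# Reviewer's illustration: +20% baseline Zn, then a 10% eCO2-induced loss.
net = relative_zn(baseline_gain=0.20, eco2_loss=0.10)
print(f"Zn under eCO2 relative to the original baseline: {net:.2f}")
```

      Under this assumption the plant still loses 10% of its Zn in response to eCO2, yet ends up above the original ambient baseline. This is why a comparison under both ambient and elevated CO2 is needed to distinguish a raised baseline from a weakened eCO2 response.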

      We thank the reviewer for this comment, and we agree with it. It is much more interesting, especially in the context of this paper, to analyze the function of a candidate gene not only in elevated CO2, but in both ambient and elevated CO2. Therefore, we added to Figure 7 data for the expression of TIP2;2 in contrasting haplotypes under ambient CO2, alongside those already presented under elevated CO2 (now Fig. 7C and 7D). This showed that TIP2;2 expression is also lower in haplotype 0 under ambient CO2. We also added to Figure 7 (Fig. 7E) the Zn level in the WT and the tip2;2-1 mutant under ambient CO2, alongside those already presented under elevated CO2. This showed that the tip2;2-1 mutant line did not present any decrease in Zn shoot content in response to elevated CO2, in contrast to what is observed for the WT.

      We have added comments on these new results in the Results section (lines 233-242) and in the Discussion section (lines 310-314).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Reviewer Comments on the Article's Approach to Ionome Analysis

      (1) Omission of Phosphorus from the Ionome:

      It's surprising that phosphorus (P) was not measured in the ionome. After nitrogen (N), P is often the most limiting mineral for plant development and yield, making it a significant component of the ionome. Why did the authors omit this crucial element?

      We agree with the reviewer that P is an important mineral for plant growth. The absence of data on P content is due to feasibility constraints rather than oversight. The MP-AES instrument we used to analyze the ionome (except N and C, which we obtained from an Elementar Analyzer) would have required an extra step and an extra analysis to obtain data for macronutrients such as P or K. In the context of this large-scale experiment, we had to compromise and proceed without these data.

      (2) Relationship Between Leaf Ionome and Seed:

      The manuscript lacks evidence demonstrating the relationship between the leaf ionome and the seed. This connection is vital to establish the study's aims as outlined in lines 20-24. If the central argument is that eCO2 threatens food security, it's essential for the authors to either:

      • Provide evidence that eCO2 induces changes in the ionome profiles of seeds.

      • Show that changes in the rosette leaf ionome lead to alterations in seed ionome profiles.

      We agree with the reviewer. Although we know that the seed ionome composition of Arabidopsis model accessions such as Columbia is indeed negatively affected by eCO2, we do not provide the data that support some of the terms used in lines 20-24. The correspondence between leaf and seed ionome in natural populations under eCO2 is certainly a question that we will address next. Therefore, to align our stated objectives with our data, we have modified the sentence in lines 20-24. We also added a comment on this point in the discussion section (lines 324-328).

      (3) Analysis of Ionome in Rosette Leaves:

      Why did the authors choose to analyze the ionome specifically in rosette leaves? Is there a known correlation between the ionome profile in rosette leaves and seeds?

      See our answer to the above comment.

      (4) Experimental Design Comments:

      • The layout of the accession growouts, the methods of randomization, blocking, and controls/checks should be detailed.

      • Were BLUEs (Best Linear Unbiased Estimators) or BLUPs (Best Linear Unbiased Predictors) employed to account for experimental design conditions? If not, it's recommended that they be used.

      We thank the reviewer for this comment. A note on replicates has been added in the Method/Plant Material section. Concerning the BLUEs/BLUPs, although we are not familiar with their use, we do not think that these approaches are applicable to our experimental design. Indeed, we pooled 3 to 5 replicates for each accession to measure the ionome (as mentioned in the Method/Ionome analysis section; we realized this was perhaps not clear enough, and thus we reinforced this point in this section). Therefore, we do not have the variance data required to compute BLUEs/BLUPs.

      (5) Carbon Dilution Effect:

      The statement, "The first component of the PCA described a clear antagonistic trend between C content and the change of other mineral elements (Fig. 3B)..." suggests a well-understood carbon dilution effect. These results are anticipated and align with existing knowledge.

      We thank the reviewer for this comment. However, this sentence does not relate to the biomass dilution hypothesis referred to by the reviewer. Indeed, the content of each element (C and others) is expressed as a percentage of biomass, not as an absolute value. Therefore, this more likely reflects an effect of the increase in carbon compounds (notably soluble sugars), which could influence mineral composition.

      (6) Heritability Estimates:

      The authors should report both the broad-sense heritability and an estimate of heritability based on a GRM or Kinship matrix.

      We thank the reviewer for this suggestion. We are skeptical of using a kinship matrix to estimate heritability in our study. Estimating narrow-sense heritability using a kinship matrix is conceptually based on Fisher's infinitesimal model, which assumes that phenotypic variation is driven by hundreds to thousands of QTLs with small effects. If this is the case, GWAS conducted on several hundred (or even thousands) of genotypes will not be powerful enough to detect such QTLs. Accordingly, estimates of broad-sense heritability based on variance components can differ drastically from estimates of narrow-sense heritability based on a kinship matrix, as illustrated in the study of Bergelson et al. (2019 Scientific Reports).

      (7) Application of the Breeder's Equation:

      It would be beneficial if the authors applied the breeder's equation to estimate the species' potential rate of response. Based on the allele frequency of the adapted cluster 3 (69 ecotypes or 43% frequency of Figure 3B), it seems plausible that the populations could adapt within 23 generations.

      We thank the reviewer for this suggestion. Indeed, it would be really interesting to test whether sub-populations could adapt in comparison with others, and over what period of time. It is nevertheless not possible to do so using the Breeder’s equation in our case, as this requires fitness data under conditions of ambient or elevated CO2 (i.e. production of seeds) to be applied, and we do not have these data at the level of the whole population.

      (8) Overall Quality:

      In general, the authors have executed a high-quality ionome mapping experiment. However, the abstract, introduction, and discussion should be entirely rewritten and reframed.

      We thank the reviewer for the positive evaluation of our experiment. As previously mentioned, we are for the most part in agreement with the comments made about the need to align our stated objectives with our experimental data and conclusions. To do so, we have rewritten part of the abstract, introduction and discussion. The details of these modifications are described in the responses made to each comment.

      Here's a line-by-line list of suggestions on writing:

      Line 30 would read better with a comma after thus (or by replacing thus with therefore and then a comma at the start of the sentence).

      Line 33 nevertheless would read better in between commas.

      Lines 45 - 48 sentence is too long, could probably divide it into two.

      Lines 90 - 94 are hard to interpret, recommend rephrasing for clarity.

      Line 130 - keep verbs in the past tense for consistency (ran instead of run).

      Line 194 - what do the authors mean by crossed? I'm inferring they looked at the intersection of DEGs with the list of genes identified by GWA mapping, probably should use a more concise word.

      There's a concurrent use of the adjective strong (Lines 80, 142, 144, 197, 245). I would advise using a more concise adjective or avoiding its use to let the reader form their own opinion on the data.

      Lines 174-176 the cited reference (No. 15) is incorrect. The study by Katz et al. (2022) does not provide information on the role of ZIF1 in zinc sequestration mechanisms under elevated CO2 conditions.

      We thank the reviewer for these detailed recommendations. We have corrected or rephrased the text according to these suggestions.

      Reviewer #2 (Recommendations For The Authors):

      Technical points:

      900 ppm as elevated CO2: Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed.

      We acknowledge the reviewer's point and have addressed related aspects earlier in our response. In line with this, we have included a justification for this particular parameter in the Method section.

      The authors do not mention what genotype was used for their root/shoot RNAseq experiment.

      We thank the reviewer for this comment, and indeed, this information was not mentioned. This is now done, in the Method section.

      Line 125: Spelling error "REGMPA".

      This has been corrected.

      Line 338: Removal of outlier observations - "Prior to GWAS and multivariate analyses such as PCA or clustering, mineral composition measures were pre-processed to remove technical outliers". The authors should mention the exact number of outliers that were removed and what the explicit criteria were for removal.

      The number of outliers removed from each dataset is now indicated in Supplemental Table 7 (this is cited in the Method section). The explicit criterion used for this analysis is stated in the corresponding Method section: “the values positioned more than 5 median absolute deviations away from the median were removed from the dataset”.
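
      The outlier criterion quoted above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `remove_mad_outliers`, the use of NumPy, and the choice of an unscaled MAD (no normal-consistency factor) are assumptions; only the threshold of 5 median absolute deviations comes from the text.

```python
import numpy as np


def remove_mad_outliers(values, n_mads=5.0):
    """Keep values lying within n_mads median absolute deviations of the median.

    Hypothetical helper: the unscaled MAD is an assumption, not a detail
    confirmed by the source.
    """
    x = np.asarray(values, dtype=float)
    median = np.nanmedian(x)
    # Median absolute deviation around the median (NaNs ignored).
    mad = np.nanmedian(np.abs(x - median))
    # NaN comparisons evaluate to False, so missing values are dropped too.
    keep = np.abs(x - median) <= n_mads * mad
    return x[keep]


if __name__ == "__main__":
    measurements = [10, 11, 9, 10, 12, 10, 100]
    print(remove_mad_outliers(measurements))
```

      One caveat of this sketch: when more than half of the values are identical, the MAD is zero and every value that differs from the median is removed, so a guard for that case may be needed in practice.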

      Line 379: "Lowly expressed genes with an average value across conditions under 25 reads were excluded from the analysis". Providing information about the number of the lowly expressed genes that were removed from the analysis can help with the interpretation of the likelihood of the candidates selected being correct.

      This is a standard procedure in RNAseq analysis. It avoids many false positives in the differential analysis of gene expression based on ratios (where a very small number in the denominator can lead to a very high variation in expression, of no real significance). For information, this step led to the removal of 11607 and 10121 genes for the shoot and root datasets.
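
      The filtering step described above can be sketched in a few lines. The function name `filter_low_expression`, the genes-by-conditions matrix layout, and the use of NumPy are assumptions made for illustration; only the threshold of 25 reads averaged across conditions comes from the text.

```python
import numpy as np


def filter_low_expression(counts, min_mean_reads=25.0):
    """Drop genes whose mean read count across conditions is below a threshold.

    counts: 2-D array-like, one row per gene and one column per condition
    (assumed layout). Returns the filtered matrix and the boolean keep mask.
    """
    counts = np.asarray(counts, dtype=float)
    mean_per_gene = counts.mean(axis=1)
    # Genes averaging fewer than min_mean_reads are excluded.
    keep = mean_per_gene >= min_mean_reads
    return counts[keep], keep


if __name__ == "__main__":
    toy_counts = [
        [0, 10, 5],       # lowly expressed: small denominators inflate ratios
        [100, 200, 150],  # well expressed
        [25, 25, 25],     # exactly at the threshold, kept
    ]
    filtered, mask = filter_low_expression(toy_counts)
    print(mask, filtered.shape)
```

      Returning the mask alongside the filtered matrix makes it straightforward to report how many genes were removed per dataset.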

      Line 384: It's not clear how many biological replicates were used.

      This has been corrected.

      Additional comment: We have also become aware of a confusion concerning one of the candidate genes located close to GWA peaks: at line 180 of the first version, we mentioned CAX1 (AT1G16380) for its role in the nutrient deficiency response. There are actually two genes annotated as CAX1 in TAIR (both are cation exchangers), but the one involved in the nutrient deficiency response is AT2G38170. We therefore removed the sentence mentioning AT1G16380/CAX1 as a potential candidate gene.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments and suggestions. We have prepared a revised manuscript with updated quantification of theta cycle skipping, new statistical comparisons of the difference between the two behavioral tasks, and general improvements to the text and figures.

      Reviewer #1 (Public Review):

      Summary

      The authors provide very compelling evidence that the lateral septum (LS) engages in theta cycle skipping.

      Strengths

      The data and analysis are highly compelling regarding the existence of cycle skipping.

      Weaknesses

      The manuscript falls short in describing the behavioral or physiological importance of the witnessed theta cycle skipping, and there is a lack of attention to detail with some of the findings and figures:

      More/any description is needed in the article text to explain the switching task and the behavioral paradigm generally. This should be moved from only being in methods as it is essential for understanding the study.

      Following this suggestion, we have expanded the description of the behavioral tasks in the Results section.

      An explanation is needed as to how a cell can be theta skipping if it is not theta rhythmic.

      A cell that is purely theta skipping (i.e., always fires on alternating theta cycles and never on adjacent theta cycles) will only have enhanced power at half theta frequency and not at theta frequency. Such a cell will therefore not be considered theta rhythmic in our analysis. Note, however, that there is a large overlap between theta rhythmic and theta skipping cell populations in our data (Figure 3 - figure supplement 2), indicating that most cells are not purely theta skipping.

      The most interesting result, in my opinion, is the last paragraph of the entire results section, where there is more switching in the alternation task, but the reader is kind of left hanging as to how this relates to other findings. How does this relate to differences in decoding of relative arms (the correct or incorrect arm) during those theta cycles or to the animal's actual choice? Similarly, how does it relate to the animal's actual choice? Is this phenomenon actually behaviorally or physiologically meaningful at all? Does it contribute at all to any sort of planning or decision-making?

      We agree that the difference between the two behavioral tasks is very interesting. It may provide clues about the mechanisms that control the cycle-by-cycle expression of possible future paths and the potential impact of goal-directed planning and (recent) experience. In the revised manuscript, we have expanded the analysis of the differences in theta-cycle dynamics between the two behavioral tasks. First, we confirm the difference through a new quantification and statistical comparison. Second, we performed additional analyses to explore the idea that the alternation of non-local representations reflects the number of relevant paths available to the animal (Figure 11 – figure supplements 2 and 3), but this did not appear to be the case. However, these results provide a starting point for future studies to clarify the task dependence of the theta-cycle dynamics of spatial representations and to address the important question of behavioral/physiological relevance.

      The authors state that there is more cycle skipping in the alternation task than in the switching task, and that this switching occurs in the lead-up to the choice point. Then they say there is a higher peak at ~125 in the alternation task, which is consistent. However, in the final sentence, the authors note that "This result indicates that the representations of the goal arms alternate more strongly ahead of the choice point when animals performed a task in which either goal arm potentially leads to reward." Doesn't either arm potentially lead to a reward (but different amounts) in the switching task, not the alternation task? Yet switching is stronger in the alternation task, which is not constant and contradicts this last sentence.

      The reviewer is correct that both choices lead to (different amounts of) reward in the switching task. As written, the sentence that the reviewer refers to is indeed not accurate and we have rephrased it to: “This result indicates that the representations of the goal arms alternate more strongly ahead of the choice point when animals performed a task in which either goal arm potentially leads to a desirable high-value reward.”.

      Additionally, regarding the same sentence - "representations of the goal arms alternate more strongly ahead of the choice point when the animals performed a task in which either goal arm potentially leads to reward." - is this actually what is going on? Is there any reason at all to think this has anything to do with reward versus just a navigational choice?

      We appreciate the reviewer’s feedback and acknowledge that our statement needs clarification. At the choice point in the Y-maze there are two physical future paths available to the animal (disregarding the path that the animal took to reach the choice point) – we assume this is what the reviewer refers to as “a navigational choice”. One hypothesis could be that alternation of goal arm representations is present whenever there are multiple future paths available, irrespective of the animal’s (learned) preference to visit one or the other goal arm. However, the reduced alternation of goal arm representations in the switching task that we report, suggests that the animal’s recent history of goal arm visits and reward expectations likely do influence the theta-cycle representations ahead of the choice point. We have expanded our analysis to test if theta cycle dynamics differ for trials before and after a switch in reward contingency in the switching task, but there was no statistical difference in our data. We have rewritten and expanded this part of the results to make our point more clearly.

      Similarly, the authors mention several times that the LS links the HPC to 'reward' regions in the brain, and it has been found that the LS represents rewarded locations comparatively more than the hippocampus. How does this relate to their finding?

      Indeed, Wirtshafter and Wilson (2020) reported that lateral septum cells are more likely to have a place field close to a reward site than elsewhere in their double-sided T-maze. It is possible that this indicates a shift towards reward or value representations in the lateral septum. In our study we did not look at reward-biased cells and whether they are more or less likely to engage in theta cycle skipping. This could be a topic for future analyses. It should be noted that the study by Wirtshafter and Wilson (2020) reports that a reward bias was predominantly present for place fields in the direction of travel away from the reward site. These reward-proximate LS cells may thus contribute to theta-cycle skipping in the inbound direction, but it is not clear if these cells would be active during theta sweeps when approaching the choice point in the outbound direction.

      Reviewer #2 (Public Review)

      Summary

      Recent evidence indicates that cells of the navigation system representing different directions and whole spatial routes fire in a rhythmic alternation during 5-10 Hz (theta) network oscillation (Brandon et al., 2013, Kay et al., 2020). This phenomenon of theta cycle skipping was also reported in broader circuitry connecting the navigation system with the cognitive control regions (Jankowski et al., 2014, Tang et al., 2021). Yet nothing was known about the translation of these temporally separate representations to midbrain regions involved in reward processing as well as the hypothalamic regions, which integrate metabolic, visceral, and sensory signals with the descending signals from the forebrain to ensure adaptive control of innate behaviors (Carus-Cadavieco et al., 2017). The present work aimed to investigate theta cycle skipping and alternating representations of trajectories in the lateral septum, neurons of which receive inputs from a large number of CA1 and nearly all CA3 pyramidal cells (Risold and Swanson, 1995). While spatial firing has been reported in the lateral septum before (Leutgeb and Mizumori, 2002, Wirtshafter and Wilson, 2019), its dynamic aspects have remained elusive. The present study replicates the previous findings of theta-rhythmic neuronal activity in the lateral septum and reports a temporal alternation of spatial representations in this region, thus filling an important knowledge gap and significantly extending the understanding of the processing of spatial information in the brain. The lateral septum thus propagates the representations of alternative spatial behaviors to its efferent regions. The results can instruct further research of neural mechanisms supporting learning during goal-oriented navigation and decision-making in the behaviourally crucial circuits entailing the lateral septum.

      Strengths

      To this end, cutting-edge approaches for high-density monitoring of neuronal activity in freely behaving rodents and neural decoding were applied. Strengths of this work include comparisons of different anatomically and probably functionally distinct compartments of the lateral septum, innervated by different hippocampal domains and projecting to different parts of the hypothalamus; large neuronal datasets including many sessions with simultaneously recorded neurons; consequently, the rhythmic aspects of the spatial code could be directly revealed from the analysis of multiple spike trains, which were also used for decoding of spatial trajectories; and comparisons of the spatial coding between the two differently reinforced tasks.

      Weaknesses

      Although possible in principle with the present data across sessions, a longitudinal analysis of the spatial coding during learning of the task was not performed. Without using perturbation techniques, the present approach could not identify the aspects of the spatial code actually influencing the generation of behaviors by downstream regions.

      Reviewer #3 (Public Review)

      Summary

      Bzymek and Kloosterman carried out a complex experiment to determine the temporal spike dynamics of cells in the dorsal and intermediate lateral septum during the performance of a Y-maze spatial task. In this descriptive study, the authors aim to determine if inputting spatial and temporal dynamics of hippocampal cells carry over to the lateral septum, thereby presenting the possibility that this information could then be conveyed to other interconnected subcortical circuits. The authors are successful in these aims, demonstrating that the phenomenon of theta cycle skipping is present in cells of the lateral septum. This finding is a significant contribution to the field as it indicates the phenomenon is present in neocortex, hippocampus, and the subcortical hub of the lateral septal circuit. In effect, this discovery closes the circuit loop on theta cycle skipping between the interconnected regions of the entorhinal cortex, hippocampus, and lateral septum. Moreover, the authors make 2 additional findings: 1) There are differences in the degree of theta modulation and theta cycle skipping as a function of depth, between the dorsal and intermediate lateral septum; and 2) The significant proportion of lateral septum cells that exhibit theta cycle skipping predominantly do so during 'non-local' spatial processing.

      Strengths

      The major strength of the study lies in its design, with 2 behavioral tasks within the Y-maze and a battery of established analyses drawn from prior studies that have established spatial and temporal firing patterns of entorhinal and hippocampal cells during these tasks. Primary among these analyses, is the ability to decode the animal's position relative to locations of increased spatial cognitive demand, such as the choice point before the goal arms. The presence of theta cycle skipping cells in the lateral septum is robust and has significant implications for the ability to dissect the generation and transfer of spatial routes to goals within and between the neocortex and subcortical neural circuits.

      Weaknesses

      There are no major discernable weaknesses in the study, yet the scope and mechanism of the theta cycle phenomenon remain to be placed in the context of other phenomena indicative of spatial processing independent of the animal's current position. An example of this would be the ensemble-level 'scan ahead' activity of hippocampal place cells (Gupta et al., 2012; Johnson & Redish, 2007). Given the extensive analytical demands of the study, it is understandable that the authors chose to limit the analyses to the spatial and burst firing dynamics of the septal cells rather than the phasic firing of septal action potentials relative to local theta oscillations or CA1 theta oscillations. Yet, one would ideally be able to link, rather than parse, the phenomena of temporal dynamics. For example, Tingley et al. recently showed that there was significant phase coding of action potentials in lateral septum cells relative to spatial location (Tingley & Buzsaki, 2018). This begs the question as to whether the non-uniform distribution of septal cell activity within the Y-maze may have a phasic firing component, as well as a theta cycle skipping component. If so, these phenomena could represent another means of information transfer within the spatial circuit during cognitive demands. Alternatively, these phenomena could be part of the same process, ultimately representing the coherent input of information from one region to another. Future experiments will therefore have to sort out whether theta cycle skipping is a feature of either rate or phase coding, or perhaps both, depending on circuit and cognitive demands.

      The authors have achieved their aims of describing the temporal dynamics of the lateral septum, at both the dorsal extreme and the intermediate region. All conclusions are warranted.

      Reviewer #1 (Recommendations For The Authors)

      The text states: "We found that 39.7% of cells in the LSD and 32.4% of cells in LSI had significantly higher CSI values than expected by chance on at least one of the trajectories." The text in the supplemental figure indicates a p-value of 0.05 was used to determine significance. However, four trajectory categories are being examined so a Bonferroni correction should be used (significance at p<0.0125).

      Indeed, a p-value correction for multiple tests should be performed when determining theta cycle skipping behavior for each of the four trajectories. We thank the reviewer for pointing out this oversight. We have implemented a Holm-Sidak p-value correction for the number of tested trajectories per cell (excluding trajectories with insufficient spikes). As a consequence, the number of cells with significant cycle-skipping activity decreased, but overall the results have not changed.
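
      For readers unfamiliar with the Holm-Sidak procedure, the step-down adjustment can be sketched as below. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `holm_sidak`, the default `alpha` of 0.05, and the example p-values are hypothetical. Only the p-values of trajectories that were actually tested for a given cell would be passed in, so the number of tests can be smaller than four.

```python
import numpy as np


def holm_sidak(p_values, alpha=0.05):
    """Holm-Sidak step-down adjustment for a family of m tests.

    Returns the adjusted p-values (in the input order) and a boolean array
    marking which hypotheses are rejected at level alpha.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The smallest p-value is corrected for m tests, the next for m - 1, ...
    exponents = m - np.arange(m)
    adj_sorted = 1.0 - (1.0 - p[order]) ** exponents
    # Enforce monotonicity so adjusted p-values never decrease with rank.
    adj_sorted = np.maximum.accumulate(adj_sorted)
    adjusted = np.empty(m)
    adjusted[order] = adj_sorted
    return adjusted, adjusted <= alpha


if __name__ == "__main__":
    # Hypothetical p-values for the four trajectories of one cell.
    adjusted, significant = holm_sidak([0.01, 0.04, 0.03, 0.20])
    print(adjusted, significant)
```

      Compared with a plain Bonferroni threshold, the step-down scheme is never less powerful while still controlling the family-wise error rate, which is consistent with the reported outcome that fewer cells pass the test but the overall results hold.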

      Figure 4 is very confusing as raster plots are displayed for multiple animals but it is unclear which animal the LFP refers to. The bottom of the plot is also referenced twice in the figure caption.

      We apologize for the confusion. We have removed this figure in the revised manuscript, as it was not necessary to make the point about the spatial distribution of theta cycle skipping. Instead, we show examples of spatially-resolved cycle skipping in Figure 4 (formerly Figure 5 - supplementary figures 1 and 2) and we have added a plot with the spatially-resolved cycle skipping index for all analyzed cells in Figure 5A.

      Figure 6 has, I think, an incorrect caption or figure. Only A and B are marked in the figure but A-G are mentioned in the caption but do not appear to correspond to anything in the figure.

      Indeed, the caption was outdated. This has now been corrected.

      Figure 8 is also confusing for several reasons: how is the probability scale on the right related to multiple semi-separate (top and middle) figures? In the top and bottom figures, it is not clear what the right and left sides refer to. It is also unclear why a probability of 0.25 is used for position (seems potentially low). The caption also mentions Figure A but there are no lettered "sub" figures in Figure 8.

      The color bar on the right applies to both the top plot (directional decoding) and the middle plot (positional decoding). However, the maximum probability that is represented by black differs between the top and middle plots. We acknowledge that a shared color bar may lead to confusion and we have given each of the plots a separate color bar.

      As for the maximum probability of 0.25 for position: this was a typo in the legend. The correct maximum value is 0.5. In general, the posterior probability will be distributed over multiple (often neighboring) spatial bins, and the distribution of maximum probabilities will depend on the number of spatial bins, the level of spatial smoothing in the decoding algorithm, and the amount of decodable information in the data. It would be more appropriate to consider the integrated probability over a small section of the maze, rather than the peak probability that is assigned to a single 5 cm bin. Also, note that a posterior probability of 0.5 is many times higher than the probability associated with a uniform distribution, which is in our case.

      The left and right sides of the plots represent two different journeys that the animal ran. On the left an outbound journey is shown, and on the right an inbound journey. We have improved the figure and the description in the legend to make this clearer.

      The reviewer is correct that there are no panels in Figure 8 and we have corrected the legend.

      Some minor concerns

      The introduction states that "a few studies have reported place cell-like activity in the lateral septum (Tingley and Buzsaki, 2018; Wirtshafter and Wilson, 2020, 2019)." However, notably and controversially, the Tingley study is one of the few studies to find NO place cell activity in the lateral septum. This is sort of mentioned later but the citation in this location should be removed.

      The reviewer is correct, Tingley and Buzsaki reported a spatial phase code but no spatial rate code. We have removed the citation.

      Stronger position/direction coding in the dLS is consistent with prior studies, and these should be cited in the text (not a novel finding).

      Thank you for pointing out this omission. Indeed, a stronger spatial coding in the dorsal lateral septum has been reported before, for example by Van der Veldt et al. (2021). We now cite this paper when discussing these findings.

      Why is the alternation task administered for 30m but the switching task for 45m?

      The reason is that rats received a larger reward in the switching task (in the high-reward goal arm) and took longer to complete trials on average. To obtain a more-or-less similar number of trials per session in both tasks, we extended the duration of switching task sessions to 45 minutes. We have added this explanation to the text.

      Regarding the percentage of spatially modulated cells in the discussion, it is also worth pointing out that bits/sec information is consistent with previous studies.

      Thank you for the suggestion. We now point out that the spatial information in our data is consistent with previous studies.

      Reviewer #2 (Recommendations For The Authors)

      While the results of the study are robust and timely, further details of behavioural training, additional quantitative comparisons, and improvements in the data presentation would make the study more comprehensible and complete.

      Major comments

      (1) I could not fully comprehend the behavioural protocols. They require a clearer explanation of both the specific rationale of the two tasks as well as a more detailed presentation of the protocols. Specifically:

      (1.1) In the alternation task, were the arms baited in a random succession? How many trials were applied per session? Fig 1D: how could animals reach high choice accuracy if the baiting was random?

      We used a continuous version of the alternation task, in which the animals were rewarded for left→home→right and right→home→left visit sequences. In addition, animals were always rewarded on inbound journeys. There was no random baiting of goal arms. Perhaps the confusion stems from our use of the word “trial” to refer to a completed lap (i.e., a pair of outbound/inbound journeys). On average, animals performed 54 of such trials per 30-minute session in the alternation task. We have expanded the description of the behavioral tasks in the Results and further clarified these points in the Methods section.
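
      The reward rule of the continuous alternation task can be expressed as a short sketch. The function name `score_alternation` and the `first_rewarded` parameter are hypothetical: the response does not say how the very first outbound visit of a session was treated, so that choice is an explicit assumption here. Inbound journeys are not modelled because they were always rewarded.

```python
def score_alternation(goal_visits, first_rewarded=True):
    """Mark each outbound goal-arm visit as rewarded or not.

    goal_visits: sequence of "left"/"right" labels, one per trial (lap).
    A visit is rewarded when it differs from the previous goal arm, i.e.
    left -> home -> right or right -> home -> left.
    first_rewarded: assumed outcome for the first visit, which has no
    predecessor (not specified in the source).
    """
    rewarded = []
    previous = None
    for arm in goal_visits:
        if previous is None:
            rewarded.append(first_rewarded)
        else:
            rewarded.append(arm != previous)
        previous = arm
    return rewarded


if __name__ == "__main__":
    print(score_alternation(["left", "right", "right", "left"]))
```

      Under this rule the reward on each lap is fully determined by the animal's previous choice, which is why high choice accuracy is achievable without any random baiting.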

      (1.2) Were they rewarded for correct inbound trials? If there was no reward, why were they considered correct?

      Yes, rats received a reward at the home platform for correct inbound trials. We have now explicitly stated this in the text.

      (1.3) In the switch alternation protocol, for how many trials was one arm kept more rewarding than the other, and how many trials followed after the rewarding value switch?

      A switch was triggered when rats (of their own volition) visited the high-reward goal arm eight times in a row. Following a switch, the animals could complete as many trials as necessary until they visited the new high-reward goal arm in eight consecutive trials, which triggered another switch. As can be seen in Figure 1D, at the population level, animals needed ~13 trials to fully commit to the high-reward goal arm following a switch. We have further clarified the switching task protocol in the Results and Methods sections.
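
      The switching rule described above amounts to a small state machine, sketched here for illustration. The function name `simulate_switching`, the `initial_high` argument, and the decision to reset the streak counter after a switch are assumptions; only the criterion of eight consecutive visits to the high-reward arm comes from the text.

```python
def simulate_switching(visits, initial_high="left", streak_to_switch=8):
    """Track reward-contingency switches in a self-paced switching task.

    visits: sequence of "left"/"right" goal-arm choices, one per trial.
    Returns the trial indices at which a switch was triggered and the arm
    that carries the high reward after the last trial.
    """
    high = initial_high
    streak = 0
    switch_trials = []
    for trial, arm in enumerate(visits):
        # Count consecutive visits to the currently high-reward arm.
        streak = streak + 1 if arm == high else 0
        if streak == streak_to_switch:
            # The high reward moves to the opposite arm; counting restarts.
            high = "right" if high == "left" else "left"
            streak = 0
            switch_trials.append(trial)
    return switch_trials, high


if __name__ == "__main__":
    choices = ["left"] * 8 + ["left", "right"] + ["right"] * 8
    print(simulate_switching(choices))
```

      Because the criterion depends only on the animal's own choices, the number of switches per session is not fixed in advance, in line with the 1 to 3 switches per session reported later in the response.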

      (1.4) What does the phrase "the opposite arm (as 8 consecutive visits)" exactly mean? Sounds like 8 consecutive visits signalled that the arm was rewarded (as if it were not predefined in the protocol).

      The task is self-paced and the animals initially visit both goal arms, before developing a bias for the high-reward goal arm. A switch of reward size was triggered as soon as the animal visited the high-reward goal arm for eight consecutive trials. We have rewritten the description of the switching task protocol, including this sentence, which hopefully clarifies the procedure.

      (1.5) P. 15, 1st paragraph, Theta cycle skipping and alternation of spatial representations is more prominent in the alternation task. Why in the switching task, did rats visit the left and right arms approximately equally often if one was more rewarding than the other? How many switches were applied per recording session, and how many trials were there in total?

      Both the left and right goal arms were sampled more or less equally by the animals because both goal arms at various times were associated with a large reward following switches in reward values during sessions. The number of switches per session varied from 1 to 3. Sampling of both goal arms was also evident at the beginning of each session and following each reward value switch, before animals switched their behavior to the (new) highly rewarded goal arm. In Table 1, we have now listed the number of trials and the number of reward-value switches for all sessions.

      (1.6) Is the goal arm in figures the rewarded/highly rewarded arm only or are non-baited arms also considered here?

      Both left and right arms are considered goal arms and were included in the analyses, irrespective of the reward that was received (or not received).

      (2) The spatial navigation-centred behavioural study design and the interpretation of results highlight the importance of the dorsal hippocampal input to the LS. Yet, the recorded LSI cells are innervated by intermediate and ventral aspects of the hippocampus, and LS receives inputs from the amygdala and the prefrontal cortex, which together may bring about reward- and reward-prediction-related aspects in the firing of LS cells during spatial navigation; these aspects are crucial for the adaptive behaviours regulated by the LS. Does success or failure to acquire reward in a trial modify spatial coding and cycle skipping of LSD vs. LSI cells in ensuing inbound and outbound trials?

      This is an excellent question and given the length of the current manuscript, we think that exploration of this question is best left for a future extension of our study.

      A related question: in Figure 10, it is interesting that cycle skipping is prominent in the goal arm for outbound switching trials and inbound trials of both tasks. Could it be analytically explained by task contingencies and behaviour (e.g. correct/incorrect trial, learning dynamics, running speed, or acceleration)?

      Our observation of cycle skipping at the single-cell level in the goal arms is somewhat surprising and, we agree with the reviewer, potentially interesting. However, it was not accompanied by alternation of representations at the population level. Given the current focus and length of the manuscript, we think further investigation of cycle skipping in the goal arm is better left for future analyses.

      (3) Regarding possible cellular and circuit mechanisms of cycle skipping and their relation to the alternating representations in the LS. Recent history of spiking influences the discharge probability; e.g. complex spike bursts in the hippocampus are associated with a post-burst delay of spiking. In LS, cycle skipping was characteristic for LS cells with high firing rates and was not uniformly present in all trajectories and arms. The authors propose that cycle skipping can be more pronounced in epochs of reduced firing, yet the opposite seems also possible - this phenomenon can be due to an intermittently increased drive onto some LS cells. Was there a systematic relationship between cycle skipping in a given cell and the concurrent firing rate or a recent discharge with short interspike intervals?

      In our discussion, we tried to explain the presence of theta cycle skipping in the goal arms at the single-cell level without corresponding alternation dynamics at the population level. We mentioned the possibility of a decrease in excitatory drive. As the reviewer suggests, an increase in excitatory drive combined with post-burst suppression or delay of spiking is an alternative explanation. We analyzed the spatial tuning of cells with theta cycle skipping and found that, on average, these cells have a higher firing rate in the goal arm than the stem of the maze in both outbound and inbound run directions (Figure 5 – figure supplement 1). In contrast, cells that do not display theta cycle skipping do not show increased firing in the goal arm. These results are more consistent with the reviewer’s suggested mechanism and we have updated the discussion accordingly.

      (4) Were the differences between the theta modulation (cycle skipping) of local vs. non-local representations (P.14, line 10-12, "In contrast...", Figure 9A) and between alternation vs. switching tasks (Figure 10 C,D) significantly different?

      We have added quantification and statistical comparisons for the auto- and cross-correlations of the local/non-local representations. The results indeed show significantly stronger theta cycle skipping of the non-local representations as compared to the local representations (Figure 10 - figure supplement 1A), a stronger alternation of non-local representations in the outbound direction (Figure 10 - figure supplement 1B), and significant differences between the two tasks (Figure 11E,F).

      (5) Regarding the possibility of prospective coding in LS, is the accurate coding of run direction not consistent with prospective coding? Can the direction be decoded from the neural activity in the start arm? Are the cycling representations of the upcoming arms near the choice point equally likely or preferential for the then-selected arm?

      The coding of run direction (outbound or inbound) is distinct from the prospective/retrospective coding of the goal arm. As implemented, the directional decoding model does not differentiate between the two goal arms, and accurate decoding of direction with this model cannot inform us whether or not there is prospective (or retrospective) coding. To address the reviewer’s comments, we performed two additional analyses. First, we analyzed the directional (outbound/inbound) decoding performance as a function of location in the maze (Figure 6 - figure supplement 3E). The results show that directional decoding performance is high in both stem and goal arms. Second, we analyzed how well we can predict the trajectory type (i.e., to/from the left or right goal arm) as a function of location in the maze, and separately for outbound and inbound trajectories (Figure 6 - figure supplement 3C,D). The results show that on outbound journeys, decoding the future goal arm is close to chance when the animals are running along the stem. The decoding performance goes up around the choice point and reaches the highest level when animals are in the goal arm.

      (6) Figure 10 seems to show the same or similar data as Figures 5 (A,B) and 9 (C,D).

      Figure 10 (Figure 11 in the revised manuscript) re-analyzes the same data as presented in Figures 5 and 9, but separates the experimental sessions according to the behavioral task. We now explicitly state this.

      Minor comments

      (1) If cycle skipping in the periodicity of non-local representations was more prominent in alternation than in the switching task, one might expect them to be also prominent in early trials of the switching task, when the preference of a more rewarding arm is not yet established. Was this the case?

      The reviewer makes an interesting suggestion. Indeed, if theta cycle skipping and the alternation of non-local representations reflect that there are multiple paths that the animal is considering, one may predict that the theta skipping dynamics are similar between the two tasks in early trials (as the reviewer suggests). Similarly, one may predict that in the switching task, the alternation of non-local representations is weaker immediately before a reward contingency switch (when the animal has developed a bias towards the goal arm with a large reward) as compared to after the switch.

      We have now quantified the theta cycle dynamics of spatial representations in the early trials in each session of both tasks (Figure 11 - figure supplement 2) and in the trials before and after each switch in the switching task (Figure 11 - figure supplement 3).

      The results of the early trial analysis indicate stronger alternation of non-local representations in the alternation task than in the switching task (consistent with the whole session analysis), which is contrary to the prediction.

      The pre-/post-switch analysis did not reveal a significant difference between the trials before and after a reward contingency switch. If anything, there was a trend towards stronger theta cycle skipping/alternation in the trials before a switch, which would be opposite to the prediction.

      These results do not appear to support the idea that the alternation of non-local representations reflects the number of relevant paths available to the animal. We have updated the text to incorporate these new data and discuss the implications.

      (2) Summary: sounds like the encoding of spatial information and its readout in the efferent regions are equally well established.

      Thank you for pointing this out.

      (3) Summary: "motivation and reward processing centers such as the ventral tegmental area." How about also mentioning here the hypothalamus, which is a more prominent output of the lateral septum than the VTA?

      We have now also mentioned the hypothalamus.

      (4) "lateral septum may contribute to the hippocampal theta" - readers not familiar with details of the medial vs. lateral septum research may misinterpret the modest role of LS in theta compared to MS.

      We have added “in addition to the strong theta drive originating from the medial septum” to make clear that the lateral septum has a modest role in hippocampal theta generation.

      (5) "(Tingley and Buzsáki, 2018) found a lack of spatial rate coding in the lateral septum and instead reported a place coding by specific phases of the hippocampal theta rhythm (Rizzi-Wise and Wang, 2021)" needs rephrasing.

      Thank you, we have rephrased the sentence.

      (6) Figure 4 is a bit hard to generalize. The authors may additionally consider a sorted raster presentation of the dataset in this main figure.

      We have removed this figure in the revised manuscript, as it was not necessary to make the point about the location of theta cycle skipping. Instead, we show examples of spatially-resolved cycle skipping in Figure 4 (formerly Figure 5 - supplementary figures 1 and 2), and, following the reviewer’s suggestion, we have added a plot with the spatially-resolved cycle skipping index for all analyzed cells (Figure 5A).

      (7) It would help if legends of Figure 5 (and related supplementary figures) state in which of the two tasks the data was acquired, as it is done for Figure 10.

      Thank you for the suggestion. The legends of Figure 4A,B (formerly Figure 5 – supplemental figures 1 and 2) and Figure 5 now include in which behavioral task the data was acquired.

      (8) Page 10, "Spatial coding...", 1st Citing the initial report by Leutgeb and Mizumori would be appropriate here too.

      The reviewer is correct. We have added the citation.

      (9) The legend in Figure 6 (panels A-G) does not match the figure (only panels A,B). What is shown in Fig. 6B, the legend does not seem to fully match.

      Indeed, the legend was outdated. This has now been corrected.

      (10) Fig. 7 suppl., if extended to enable comparisons, could be a main figure. Presently, Figure 7C does not account for the confounding effect of population size and is therefore difficult to interpret without complex comparisons with the Supplementary Figure, which is revealing per se.

      We thank the reviewer for their suggestion. We have changed Figure 7 such that it only shows the analysis of decoding performed with all LSD and LSI cells. Figure 7 – supplemental figure 1 has been transformed into main Figure 8, with the addition of a panel to show a statistical comparison between decoding performance in LSD and LSI with a fixed number of cells.
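
      The idea of comparing decoding performance with a fixed number of cells can be illustrated with a minimal sketch. It assumes, hypothetically, that cells are identified by integer IDs and that populations are matched by repeated random subsampling to a common size. The function name `subsample_cells` and its parameters are illustrative only and are not taken from the authors' analysis code.

```python
import random


def subsample_cells(cell_ids, n_cells, n_repeats, seed=0):
    """Draw `n_repeats` random subsets of `n_cells` cells each.

    Hypothetical helper: decoding each subset and averaging the scores
    removes the advantage of the region with more recorded cells.
    """
    cell_ids = list(cell_ids)
    if n_cells > len(cell_ids):
        raise ValueError("cannot draw more cells than were recorded")
    rng = random.Random(seed)  # seeded so the draws are reproducible
    # Sampling is without replacement within a subset.
    return [sorted(rng.sample(cell_ids, n_cells)) for _ in range(n_repeats)]


# Example: two regions with unequal (made-up) cell counts, matched to 20 cells.
lsd_subsets = subsample_cells(range(60), n_cells=20, n_repeats=100)
lsi_subsets = subsample_cells(range(25), n_cells=20, n_repeats=100, seed=1)
```

      Each subset would then be passed to the same decoder, so that any remaining difference between regions cannot be attributed to population size alone.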

      (11) P. 14, line 10: there is no Figure 8A.

      This has been corrected.

      (12) P. 15, paragraph 1: is the model discussed here the one from Kay et al?

      From Kay et al. (2020) and also Wang et al. (2020). We have added the citations.

      (13) Figure 5 - Figure Supplement 1 presents a nice analysis that, in my view, can merit a main figure. I could not find the description of the colour code in CSI panels, does grey/red refer to non/significant points?

      Indeed, grey/red refers to non-significant points and significant points respectively. We have clarified the color code in the figure legend. Following the reviewer’s suggestion, we have made Figure 5 Supplement 1 and 2 a main figure (Figure 4).

      (14) Figure 5 - Figure Supplement 2. Half of the cells (255 and 549) seem not to be representative of the typically high CSI in the goal arm in left and right inbound trials combined (Figure 5A). Were the changes in CSI in the right and left inbound trials similar enough to be combined in Fig 5A? Otherwise, considering left and right inbound runs separately and trying to explain where the differences come from would seem to make sense.

      Figure 5 – figure supplement 2 is now part of the new main Figure 4. Originally, the examples were from a single session and the same cells as shown in the old Figure 4. However, since the old Figure 4 has been removed, we have selected examples from different sessions and both left/right trajectories that are more representative of the overall distribution. We have further added a plot with the spatially-resolved cycle skipping for all analyzed cells in Figure 5A.

      (15) In the second paragraph of the Discussion, dorso-ventral topography of hippocampal projections to the LS (Risold and Swanson, Science, 90s) could be more explicitly stated here.

      Thank you for the suggestion. We have now explicitly mentioned the dorsal-ventral topography of hippocampal-lateral septum projections and cite Risold & Swanson (1997).

      (16) Discussion point: why do the differences in spatial information of cells in the ventral/intermediate vs. dorsal hippocampus not translate into similarly prominent differences in LSI vs. LSD?

      In our data, we do observe clear differences in spatial coding between LSD and LSI. Specifically, cell activity in the LSD is more directional, has higher goal arm selectivity, and higher spatial information (we have now added statistical comparisons to Figure 6 – figure supplement 1). As a result, spatial decoding performance is much better for LSD cell populations than LSI cell populations (see updated Figure 8, with statistical comparison of decoding performance). Spatial coding in the LS is not as strong as in the hippocampus, likely because of the convergence of hippocampal inputs, which may give the impression of a less prominent difference between the two subregions.

      (17) Discussion, last paragraph: citation of the few original anatomical and neurophysiological studies would be fitting here, in addition to the recent review article.

      Thank you for the suggestion. We have added selected citations of the original literature.

      (18) Methods, what was the reference electrode?

      We used an external reference electrode that was soldered to a skull screw, which was positioned above the cerebellum. We have added this to the Methods section.

      (19) Methods, Theta cycle skipping: bandwidth = Gaussian kernel parameter?

      The bandwidth is indeed a parameter of the Gaussian smoothing kernel and is equal to the standard deviation.
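
      To make this relationship concrete, here is a minimal sketch of a Gaussian smoothing kernel whose bandwidth is its standard deviation. The bin width, the truncation at four standard deviations, and the function names are assumptions made for illustration; the authors' actual implementation is not shown in this response.

```python
import math


def gaussian_kernel(bandwidth, bin_width, n_sigmas=4.0):
    """Discrete Gaussian kernel; `bandwidth` is the standard deviation.

    `bandwidth` and `bin_width` share the same unit (e.g. seconds).
    The kernel is truncated at `n_sigmas` standard deviations (assumed).
    """
    half = int(math.ceil(n_sigmas * bandwidth / bin_width))
    weights = [
        math.exp(-0.5 * ((i * bin_width) / bandwidth) ** 2)
        for i in range(-half, half + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]  # normalize to unit sum


def smooth(values, kernel):
    """Smooth a binned signal, renormalizing the kernel at the edges."""
    half = len(kernel) // 2
    smoothed = []
    for i in range(len(values)):
        acc, norm = 0.0, 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(values):
                acc += w * values[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


# Example with assumed values: 10 ms bandwidth, 5 ms bins.
kernel = gaussian_kernel(bandwidth=0.010, bin_width=0.005)
```

      With these assumed values, the kernel weight two bins (one bandwidth) from the centre is exp(-0.5) times the central weight, which is the defining property of one standard deviation.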

      Reviewer #3 (Recommendations For The Authors)

      Below I offer a short list of minor comments and suggestions that may benefit the manuscript.

      (A) I was not able to access the Open Science Framework Repository. Can this be rectified?

      Thank you for checking the OSF repository. The data and analysis code are now publicly available.

      (B) In the discussion the authors should attempt to flesh out whether they can place theta cycle skipping into context with left/right sweeps or scan ahead phenomena, as shown in the Redish lab.

      Thank you for the excellent suggestion. We have now added a discussion of the possible link between theta cycle skipping and the previously reported scan-ahead theta sweeps.

      (C) What is the mechanism of cycle skipping? This could be relevant to intrinsic vs network oscillator models. Reference should also be made to the Deshmukh model of interference between theta and delta (Deshmukh, Yoganarasimha, Voicu, & Knierim, 2010).

      We had discussed a potential mechanism in the discussion (2nd to last paragraph in the revised manuscript), which now includes a citation of a recent computational study (Chu et al., 2023). We have now also added a reference to the interference model in Deshmukh et al, 2010.

      (D) Little background was given for the motivation and expectation for potential differences between the comparison of the dorsal and intermediate lateral septum. I don't believe that this is the same as the dorsal/ventral axis of the hippocampus, but if there's a physiological justification, the authors need to make it.

      We have added a paragraph to the introduction to explain the anatomical and physiological differences across the lateral septum subregions that provide our rationale for comparing dorsal and intermediate lateral septum (we excluded the ventral lateral septum because the number of cells recorded in this region was too low).

      (E) It would help to label "outbound" and "inbound" on several of the figures. All axes need to be labeled, with appropriate units indicated.

      We have carefully checked the figures and added inbound/outbound labels and axes labels where appropriate.

      (F) In Figure 6, the legend doesn't match the figure.

      Indeed, the legend was outdated. This has now been corrected.

      (G) The firing rate was non-uniform across the Y-maze. Does this mean that the cells tended to fire more in specific positions of the maze? If so, how would this affect the result? Would increased theta cycle skipping at the choice point translate to a lower firing rate at the choice point? Perhaps less overdispersion of the firing rate (Fenton et al., 2010)?

      Individual cells indeed show a non-uniform firing rate across the maze. To address the reviewer’s comment and test if theta cycle skipping cells were active preferentially near the choice point or other locations, we computed the mean-corrected spatial tuning curves for cell-trajectory pairs with and without significant theta cycle skipping. This additional analysis indicates that, on average, the population of theta cycle skipping cells showed a higher firing rate in the goal arms than in the stem of the maze as compared to non-skipping cells for outbound and inbound directions (shown in Figure 5 - figure supplement 1).

      (H) As mentioned above, it could be helpful to look at phase preference. Was there an increased phase preference at the choice point? Would half-cycle firing correlate with an increased or decreased phase preference? Based on prior work, one would expect increased phase preference, at least in CA1, at the choice point (Schomburg et al., 2014). In contrast, other work might predict phasic preference according to spatial location (Tingley & Buzsaki, 2018). Including phase analyses is a suggestion, of course. The manuscript is already sufficiently novel and informative. Yet, the authors should state why phase was not analyzed and that these questions remain for follow-up analyses. If the authors did analyze this and found negative results, it should be included in this manuscript.

      We thank the reviewer for their suggestion. We have not yet analyzed the theta phase preference of lateral septum cells or other relations to the theta phase. We agree that this would be a valuable extension of our work, but prefer to leave it for future analyses.

      (I) One of the most important aspects of the manuscript, is that there is now evidence of theta cycle skipping in the circuit loop between the EC, CA1, and LS. This now creates a foundation for circuit-based studies that could dissect the origin of route planning. Perhaps the authors should state this? In the same line of thinking, how would one determine whether theta cycle skipping is necessary for route planning as opposed to a byproduct of route planning? While this question is extremely complex, other studies have shown that spatial navigation and memory are still possible during the optogenetic manipulation of septal oscillations (Mouchati, Kloc, Holmes, White, & Barry, 2020; Quirk et al., 2021). However, pharmacological perturbation or lesioning of septal activity can have a more profound effect on spatial navigation (Bolding, Ferbinteanu, Fox, & Muller, 2019; Winson, 1978). As a descriptive study, I think it would be helpful to remind the readers of these basic concepts.

      We thank the reviewer for their comment and for pointing out possible future directions for linking theta cycle skipping to route planning. Experimental manipulations to directly test this link would be very challenging, but worthwhile to pursue. We now mention how circuit-based studies may help to test if theta cycle skipping in the broader subcortical-cortical network is necessary for route planning. Given that the discussion is already quite long, we decided to omit a more detailed discussion of the possible role of the medial septum (which is the focus of the papers cited by the reviewer).

      Very minor points

      (A) In the introduction, "one study" begins the sentence but there is a second reference.

      Thank you, we have rephrased the sentence.

      (B) Also in the introduction, it could be helpful to have an operational definition of theta cycle skipping (i.e., 'enhanced rhythmicity at half theta frequency').

      We followed the reviewer’s suggestion.

      (C) The authors should be more explicit in the introduction about their main question. Theta cycle skipping exists in CA1; some of the explanations mentioned in the discussion (i.e., attractor states of multiple routes) could then be imported into the introduction. The main question is then whether this phenomenon, and others from CA1, translate to the output in LS.

      We have edited the introduction to more clearly state the main question of our study, following the suggestion from the reviewer.

      (D) There are a few instances of extra closing parentheses.

      We checked the text but did not find instances of erroneous extra closing parentheses. There are instances of nested parentheses, which may have given the impression that closing parentheses were duplicated.

      (E) The first paragraph of the Discussion lacks sufficient references.

      We have now added references to the first paragraph of the discussion.

      (F) At the end of the 2nd paragraph in the Discussion, the comparison is missing. More than what? It's not until the next reference that one can assume that the authors are referring to a dorsal/ventral axis. However, the physiological motivation for this comparison is lacking. Why would one expect a dorsal/intermediate continuum for theta modulation as there is along the dorsal/ventral axis of the hippocampus?

      Thank you for spotting this omission. We have rewritten the paragraph to more clearly make the parallel between dorsal-ventral gradients in the lateral septum and hippocampus and how this relates to the topographical connections between the two structures.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Recommendations For The Authors):

      In this revision the authors address some of the key concerns, including clarification of the balanced nature of the RL driven pitch changes and conducting analyses to control for the possible effects of singing quantity on their results. The paper is much improved but still has some sources of confusion, especially around Fig. 4, that should be fixed. The authors also start the paper with a statistically underpowered minor claim that seems unnecessary in the context of the major finding. I recommend that the authors restructure their results section to focus on the major points backed by sufficient n and stats.

      Major issues.

      (1) The results section begins very weak - a negative result based on n=2 birds and then a technical mistake of tube clogging re-spun as an opportunity to peek at intermittent song in the otherwise muted birds. The logic may be sound but these issues detract from the main experiment, result, analysis, and interpretation. I recommend re-writing this section to home in on, from the outset, the well-powered results. How much is really gained from the n=2 birds that were muted before ANY experience? These negative results may not provide enough data to make a claim. Nor is this claim necessary to motivate what was done in the next 6 birds. I recommend dropping the claim.

      We thank the reviewer for the recommendation. We moved the information to the Methods.

      (2) Fig. 4 is very important yet remains very confusing, as detailed below.

      Fig. 4a. Can the authors clarify if the cohort of WNd birds that give rise to the positive result in Fig 4 ever experienced the mismatch in the absence of ongoing DAF reinforcement pre-deafening? Neither Fig. 4a nor the text clearly specifies this. This is important because we know that there are day timescale delays in LMAN-dependent bias away from DAF and consolidation into the HVC-RA pathway (Andalman and Fee, 2009). Thus, if birds experienced mismatch pre-deafening in the absence of DAF, then an early learning phase in Area X could be set in place. Then deafening occurs, but these weight changes in X could result in LMAN bias that expresses only days later, independent of auditory feedback. Such a process would not require an internal model as the authors are arguing for here. It would simply arise from delays in implementing reinforcement-driven feedback. If the birds in Fig 4 always had DAF on before deafening, then this is not an issue. But if the birds had hours of singing with DAF off before deafening, and therefore had the opportunity to associate DA error signals with the targeted time in the song (e.g. pauses on the far-from-target renditions (Duffy et al, 2022)), then the return-to-baseline would be expected to be set in place independent of auditory feedback. Please clarify exactly if the pitch-contingent DAF was on or off in the WNd cohort in the hours before deafening. In Fig. 3b it looks like the answer is yes but I cannot find this clearly stated in the text.

      We did not provide DAF-free singing experience to the birds in Fig. 4 before deafening. Thus, according to the reviewer, the concern does not apply.

      Note that we disagree with the reviewer’s premise that there is ‘day timescale delay in LMAN-dependent bias away from DAF and consolidation into the HVC-RA pathway’. More recent data reveals immediate consolidation of the anterior forebrain bias without a night-time effect (Kollmorgen, Hahnloser, Mante 2020; Tachibana, Lee, Kai, Kojima 2022). Thus, the single bird in (Andalman and Fee 2009) seems to be somewhat of an outlier.

      Hearing birds can experience the mismatch regardless of whether they experience DAF-free singing (provided their song was sufficiently shifted): even the renditions followed by white noise can be assessed with regards to their pitch mismatch, so that DAF imposes no limitation on mismatch assessment.

      We disagree with their claim that no internal model would be needed in case consolidation was delayed in Area X. If indeed, Area X stores the needed change and it takes time to implement this change in LMAN, then we would interpret the change in Area X as the plan that birds would be able to implement without auditory feedback. Because pitch can either revert (after DAF stops) or shift further away (when DAF is still present), there is no rigid delay that is involved in recovering the target, but a flexible decision making of implementing the plan, which in our view amounts to using a model.

      Fig 4b. Early and Late colored dots in legend are both red; late should be yellow? Perhaps use colors that are more distinct - this may be an issue of my screen but the two colors are difficult to discern.

      We used colors yellow to red to distinguish different birds and not early and late. We modified the markers to improve visual clarity: Early is indicated with round markers and late with crosses.

      Fig 4b. R, E, and L phases are only plotted for 4c; not in 4b. But the figure legend says that R, E and L are on both panels.

      In Fig. 4b E and L are marked with markers because they are different for different birds. In Fig. 4c the phases are the same for all birds and thus we labeled them on top. We additionally marked R in Fig. 4b as in Fig. 4c.

      Fig 4e. Did the color code switch? In the rest of Fig 4, DLO is red and WND is blue. Then in 4e it swaps. Is this a typo in the caption? Or are the colors switched? Please fix this; it's very confusing.

      Thank you for pointing out the typo in the caption. We corrected it.

      The y axes in Fig 4d-e are both in std of pitch change, yet they have different ylim, which makes it difficult to compare them by eye. Is there a reason for this? Can the authors make the ylim the same for Fig 4d-e?

      We added dashed lines to clarify the difference in ylim.

      Fig 4d-e is really the main positive finding of the paper. Can the authors show an example bird that showcases this positive result, plotted as in Fig 3b? This will help the audience clearly visualize the raw data that go into the d' analyses and get a more intuitive sense of the magnitude of the positive result.

      We added example birds to figure 4, one for WNd and one for dLO.

      Please define 'late' in Fig.4 legend.

      Done

      Minor

      Define NRP in the text with an example. Is an NRP of 100 where the bird was before the withdrawal of reinforcement?

      We added the sentence to the results:

      "We quantified recovery in terms of NRP to discount for differences in the amount of initial pitch shift, where NRP = 0% corresponds to complete recovery and NRP = 100% corresponds to pitch values before withdrawal of reinforcement (R) and thus no recovery."

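      The two anchor points in the quoted sentence can be illustrated with a minimal sketch. It assumes a linear normalization between the baseline (target) pitch and the pitch at withdrawal of reinforcement (R); this formula, the function name `nrp_percent`, and the example pitch values are assumptions for illustration and are not taken from the manuscript.

```python
def nrp_percent(pitch, baseline_pitch, pitch_at_r):
    """Residual pitch shift as a percentage of the initial shift.

    Assumed linear form: 0% means the pitch is back at baseline
    (complete recovery); 100% means it is still at the value it had
    before withdrawal of reinforcement (no recovery).
    """
    initial_shift = pitch_at_r - baseline_pitch
    if initial_shift == 0:
        raise ValueError("no initial pitch shift; the measure is undefined")
    return 100.0 * (pitch - baseline_pitch) / initial_shift


# Made-up example: baseline 700 Hz, shifted to 720 Hz by reinforcement.
halfway = nrp_percent(710.0, baseline_pitch=700.0, pitch_at_r=720.0)
```

      Because the residual shift is divided by the initial shift, the measure is comparable across birds with different shift sizes and directions, which is the stated purpose of the normalization.
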
      Reviewer #3 (Recommendations For The Authors):

      The use of "hierarchically lower" to refer to the flexible process is confusing to me, and possibly to many readers. Some people think of flexible, top-down processes as being _higher_ in a hierarchy. Regardless, it doesn't seem important, in this paper, to label the processes in a hierarchy, so perhaps avoid using that terminology.

      We reformulated the paragraph using ‘nested processes’ instead of hierarchical processes.

      In the statement "a seeming analogous task to re-pitching of zebra finch song, in humans, is to modify developmentally learned speech patterns", a few suggestions: it is not clear whether "re-pitching" refers to planning or feedback-dependent learning (I didn't see it introduced anywhere else). And if this means planning, then it is not clear why this would be analogous to "humans modifying developmentally learned speech patterns". As you mentioned, humans are more flexible at planning, so it seems re-pitching would _not_ be analogous (or is this referring to the less flexible modification of accents?).

      We changed the sentence to:

      "Thus, a seeming analogous task to feedback-dependent learning of zebra finch song, in humans, is to modify developmentally learned speech patterns."

    1. Reviewer #2 (Public Review):

      Summary:

      The physiology and behaviour of animals are regulated by a huge variety of neuropeptide signalling systems. In this paper, the authors focus on the neuropeptide ion transport peptide (ITP), which was first identified and named on account of its effects on the locust hindgut (Audsley et al. 1992). Using Drosophila as an experimental model, the authors have mapped the expression of three different isoforms of ITP (Figures 1, S1, and S2), all of which are encoded by the same gene.

      The authors then investigated candidate receptors for isoforms of ITP. Firstly, Drosophila orthologs of G-protein coupled receptors (GPCRs) that have been reported to act as receptors for ITPa or ITPL in the insect Bombyx mori were investigated. Importantly, the authors report that ITPa does not act as a ligand for the GPCRs TkR99D and PK2-R1 (Figure S3). Therefore, the authors investigated other putative receptors for ITPs. Informed by a previously reported finding that ITP-type peptides cause an increase in cGMP levels in cells/tissues (Dircksen, 2009, Nagai et al., 2014), the authors investigated guanylyl cyclases as candidate receptors for ITPs. In particular, the authors suggest that Gyc76C may act as an ITP receptor in Drosophila.

      Evidence that Gyc76C may be involved in mediating effects of ITP in Bombyx was first reported by Nagai et al. (2014) and here the authors present further evidence, based on a proposed concordance in the phylogenetic distribution of ITP-type neuropeptides and Gyc76C (Figure 2). Having performed detailed mapping of the expression of Gyc76C in Drosophila (Figures 3, S4, S5, S6), the authors then investigated if Gyc76C knockdown affects the bioactivity of ITPa in Drosophila. The inhibitory effect of ITPa on leucokinin- and diuretic hormone-31-stimulated fluid secretion from Malpighian tubules was found to be abolished when expression of Gyc76C was knocked down in stellate cells and principal cells, respectively (Figure 4). However, as discussed below, this does not provide proof that Gyc76C directly mediates the effect of ITPa by acting as its receptor. The effect of Gyc76C knockdown on the action of ITPa could be an indirect consequence of an alteration in cGMP signalling.

      Having investigated the proposed mechanism of ITPa in Drosophila, the authors then investigated its physiological roles at a systemic level. In Figure 5 the authors present evidence that ITPa is released during desiccation and accordingly, overexpression of ITPa increases survival when animals are subjected to desiccation. Furthermore, knockdown of Gyc76C in stellate or principal cells of Malpighian tubules decreases survival when animals are subject to desiccation. However, whilst this is correlative, it does not prove that Gyc76C mediates the effects of ITPa. The authors investigated the effects of knockdown of Gyc76C in stellate or principal cells of Malpighian tubules on (i) survival when animals are subject to salt stress and (ii) time taken to recover from chill coma. It is not clear, however, why animals over-expressing ITPa were not also tested for (i) survival when animals are subject to salt stress and (ii) time taken to recover from chill coma. In Figures 6 and S8, the authors show the effects of Gyc76C knockdown in the female fat body on metabolism, feeding-associated behaviours and locomotor activity, which are interesting. Furthermore, the relevance of the phenotypes observed to potential in vivo actions of ITPa is explored in Figure 7. The authors conclude that "increased ITPa signaling results in phenotypes that largely mirror those seen following Gyc76C knockdown in the fat body, providing further support that ITPa mediates its effects via Gyc76C." Use of the term "largely mirror" seems inappropriate here because there are opposing effects, e.g. decreased starvation resistance in Figure 6A versus increased starvation resistance in Figure 7A. Furthermore, as discussed above, the results of these experiments do not prove that the effects of ITPa are mediated by Gyc76C because the effects reported here could be correlative, rather than causative.

      Lastly, in Figures 8, S9, and S10 the authors analyse publicly available connectomic data and single-cell transcriptomic data to identify putative inputs and outputs of ITPa-expressing neurons. These data are a valuable addition to our knowledge of ITPa-expressing neurons, but they do not address the core hypothesis of this paper, namely that Gyc76C acts as an ITPa receptor.

      Strengths:

      (1) The main strengths of this paper are (i) the detailed analysis of the expression and actions of ITP and the phenotypic consequences of over-expression of ITPa in Drosophila, and (ii) the detailed analysis of the expression of Gyc76C and the phenotypic consequences of knockdown of Gyc76C expression in Drosophila.

      (2) Furthermore, the paper is generally well-written and the figures are of good quality.

      Weaknesses:

      (1) The main weakness of this paper is that the data obtained do not prove that Gyc76C acts as a receptor for ITPa. Therefore, the following statement in the abstract is premature: "Using a phylogenetic-driven approach and the ex vivo secretion assay, we identified and functionally characterized Gyc76C, a membrane guanylate cyclase, as an elusive Drosophila ITPa receptor." Further experimental studies are needed to determine if Gyc76C acts as a receptor for ITPa. In the section of the paper headed "Limitations of the study", the authors recognise this weakness. They state "While our phylogenetic analysis, anatomical mapping, and ex vivo and in vivo functional studies all indicate that Gyc76C functions as an ITPa receptor in Drosophila, we were unable to verify that ITPa directly binds to Gyc76C. This was largely due to the lack of a robust and sensitive reporter system to monitor mGC activation." It is not clear what the authors mean by "the lack of a robust and sensitive reporter system to monitor mGC activation". The discovery of mGCs as receptors for ANP in mammals was dependent on the use of assays that measure GC activity in cells (e.g. by measuring cGMP levels in cells). Furthermore, more recently cGMP reporters have been developed. The use of such assays is needed here to investigate directly whether Gyc76C acts as a receptor for ITPa. In summary, insufficient evidence has been obtained to conclude that Gyc76C acts as a receptor for ITPa. Therefore, I think there are two ways forward, either:
      (a) The authors obtain additional biochemical evidence that ITPa is a ligand for Gyc76C.
      or
      (b) The authors substantially revise the conclusions of the paper (in the title, abstract, and throughout the paper) to state that Gyc76C MAY act as a receptor for ITPa, but that additional experiments are needed to prove this.

      (2) The authors state in the abstract that a phylogenetic-driven approach led to their identification of Gyc76C as a candidate receptor for ITPa. However, there are weaknesses in this claim. Firstly, the hypothesis that Gyc76C may be involved in mediating effects of ITPa was first proposed ten years ago by Nagai et al. 2014, so this surely was the primary basis for investigating this protein. Nevertheless, investigating if there is correspondence in the phylogenetic distribution of ITP-type and Gyc76C-type genes/proteins is a valuable approach to addressing this issue. Unfortunately, the evidence presented is rather limited in scope. Essentially, the authors report that they found ITP-type and Gyc76C-type genes/proteins only in protostomes, and not in deuterostomes. What is needed is a more fine-grained analysis at the species level within the protostomes. Thus, are there protostome species in which both ITP-type and Gyc76C-type genes/proteins have been lost? Furthermore, are there any protostome species in which an ITP-type gene is present but a Gyc76C-type gene is absent, or vice versa? If there are such species, this would argue against Gyc76C being a receptor for ITPa. In this regard, it is noteworthy that in Figure 2A there are two ITP-type precursors in C. elegans, but there are no Gyc76C-type proteins shown in the tree in Figure 2B. Thus, what is needed is a more detailed analysis of protostomes to investigate if there really is correspondence in the phylogenetic distribution of Gyc76C-type and ITP-type genes at the species level.
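
      The species-level comparison requested in point (2) can be sketched as a simple presence/absence tabulation. The following is a minimal illustration only: the species names and presence calls are hypothetical placeholders, not data from the manuscript or from any database, and the function name `classify_cooccurrence` is invented for this sketch.

```python
def classify_cooccurrence(presence):
    """Group species by which of the two gene families they retain.

    `presence` maps a species name to a dict with boolean entries
    "ITP" and "Gyc76C" (True = at least one family member detected).
    """
    groups = {"both": [], "ITP_only": [], "Gyc76C_only": [], "neither": []}
    for species, genes in sorted(presence.items()):
        has_itp, has_gyc = genes["ITP"], genes["Gyc76C"]
        if has_itp and has_gyc:
            key = "both"
        elif has_itp:
            key = "ITP_only"
        elif has_gyc:
            key = "Gyc76C_only"
        else:
            key = "neither"
        groups[key].append(species)
    return groups


# Hypothetical placeholder calls; NOT results from the study.
presence = {
    "species_A": {"ITP": True, "Gyc76C": True},
    "species_B": {"ITP": True, "Gyc76C": False},
    "species_C": {"ITP": False, "Gyc76C": False},
    "species_D": {"ITP": False, "Gyc76C": True},
}

groups = classify_cooccurrence(presence)
# Species with one family but not the other are the informative cases.
discordant = groups["ITP_only"] + groups["Gyc76C_only"]
```

      Under this framing, a long `discordant` list would argue against a ligand-receptor pairing, whereas concordant presence or joint loss would be consistent with it, although it would still not demonstrate binding.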

      (3) The manuscript would benefit from a more comprehensive overview and discussion of published literature on Gyc76C in Drosophila, both as a basis for this study and for interpretation of the findings of this study.

    1. Author response:

      We thank eLife and the reviewers for the thoughtful summary and valuable review of our manuscript. We largely agree with the summary and review and have provided our responses to the comments below. We believe BADGERS is a significant new tool for identifying associated risk factors for complex diseases, and the associations we observed in the analysis provide insights into the genetic basis of Alzheimer's disease.

      Reviewer #1 (Public Review):

      The major aim of the paper was a method for determining genetic associations between two traits using common variants tested in genome-wide association studies. The work includes a software implementation and application of their approach. The results of the application of their method generally agree with what others have seen using similar AD and UKB data.

      The paper has several distinct portions. The first is a method for testing genetic associations between two or more traits using genome-wide association test statistics. The second is a Python implementation of the method. The last portion is the results of their method using GWAS from AD and UK Biobank.

      We thank the reviewer for the conclusion and positive comments.

      Regarding the method, it seems like it has similarities to LDSC, and it is not clear how it differs from LDSC or other similar methods. The implementation of the method used Python 2.7 (or at least was reportedly tested using that version), which was retired in 2020. The implementation was committed between Wed Oct 3 15:21:49 2018 and Mon Jan 28 09:18:09 2019 using data that existed at the time, so it was a bit surprising that it used Python 2.7, since that version was initially going to be set for end-of-life in 2015. Anyway, trying to run the package resulted in unmet dependency errors, which I think are related to an internal package not getting installed. I would expect that published software could be installed using standard tooling for the language, and, ideally, software should have automated testing of key portions.

      We thank the reviewer for their comments. To clarify, the primary difference between our proposed method, BADGERS, and LDSC lies in their respective objectives and applications. LDSC is designed to estimate heritability and genetic correlations between traits by utilizing GWAS summary statistics, thereby aiding in the elucidation of the genetic architecture of complex traits and diseases. Conversely, BADGERS is specifically developed to explore causal relationships between risk factors, such as biomarkers, and diseases of interest. It employs genetic variants as variables to deduce causality, thereby addressing the challenges of confounding and reverse causation that are common in observational studies. Although BADGERS utilizes the LD reference panel derived from LDSC, the LD reference panel is used to obtain the predicted trait expression. The ultimate goal is to focus on linking biobank traits with Alzheimer’s disease and building causal relationships instead of identifying genetic architecture.

      Regarding the technical aspects mentioned, we acknowledge the concerns about the use of Python 2.7 and the issues encountered during the package installation. We are in the process of updating the software to ensure compatibility with current versions of Python and to enhance the installation process with standard tooling and automated testing for a more user-friendly experience. We have provided tests for each portion of the software so the user can test if the software is working properly.

      Regarding the main results, they find what has largely been shown by others using the same data or similar data, which adds prima facie validity to the work. The portions of the work dealing with AD subgroups, pathology, biomarkers, and cognitive traits are of interest. I was puzzled why the authors suggested surprise regarding parental history and high cholesterol not being associated with MCI or cognitive composite scores, since this would seem like the likely fallout of selection of the WRAP cohort. The discussion paragraph that started "What's more, environmental factors may play a big role in the identified associations." confused me. I think what the authors are referring to is how selection, especially in a biobank dataset, can induce correlations, which is not what I think of as an environmental effect.

      We thank the reviewer very much for their comment. We're glad that our findings align with existing research using similar data, increasing the validity of our work and the proposed BADGERS algorithm. Your point about the lack of association between parental history, high cholesterol, and mild cognitive impairment (MCI) or cognitive composite scores in the WRAP cohort is well-taken. We agree that the selection criteria of the WRAP cohort may influence these findings, as it consists of individuals with a specific risk profile for Alzheimer's disease. This selection could indeed mitigate the observed association between these factors and cognitive outcomes, which we initially found surprising.

      Regarding the environmental factors, we appreciate your clarification and understand the confusion. Our intention was to discuss the potential for selection bias and confounding factors in biobank datasets for the identified associations, which might not necessarily be direct environmental effects.

      Overall, the work has merit, but I am left without a clear impression of the improvement in the approach over similar methods. Likewise, the results are interesting, but similar findings are described with the data that was used in the study, which are over 5 years old at the time of this review.

      We thank the reviewer very much for their endorsement of the BADGERS framework. We believe that our method, BADGERS, improves on existing approaches by effectively linking genetic data with the detailed phenotypic information in biobanks and large disease GWAS. This enhances our ability to detect associations without needing individual-level data, offering clearer insights while reducing issues like reverse causality and confounding factors.

      Even though the IGAP dataset is over five years old, it remains one of the largest publicly available datasets for Alzheimer’s disease. Likewise, the UK Biobank is one of the largest publicly available human traits datasets, which researchers continue to use. These datasets' continued utility demonstrates their value in the research community. Additionally, the versatility of the BADGERS framework makes it suitable for future research investigating the relationship between human traits and various diseases using different datasets.

      Reviewer #2 (Public Review):

      Summary:

      Yan, Hu, and colleagues introduce BADGERS, a new method for biobank-wide scanning to find associations between a phenotype of interest, and the genetic component of a battery of candidate phenotypes. Briefly, BADGERS capitalizes on publicly available weights of genetic variants for a myriad of traits to estimate polygenic risk scores for each trait, and then identify associations with the trait of interest. Of note, the method works using summary statistics for the trait of interest, which is especially beneficial for running in population-based cohorts that are not enriched for any particular phenotype (ie. with few actual cases of the phenotype of interest).

      Here, they apply BADGERS on Alzheimer's disease (AD) as the trait of interest, and a battery of circa 2,000 phenotypes with publicly available precalculated genome-wide summary statistics from the UK Biobank. They run it on two AD cohorts, to discover at least 14 significant associations between AD and traits. These include expected associations with dementia, cognition (educational attainment), and socioeconomic status-related phenotypes. Through multivariate modelling, they distinguish between (1) clearly independent components associated with AD, from (2) by-product associations that are inflated in the original bivariate analysis. Analyses stratified according to APOE inclusion show that this region does not seem to play a role in the association of some of the identified phenotypes. Of note, they observe overlap but significant differences in the associations identified with BADGERS and other Mendelian randomization (MR), hinting at BADGERS being more powerful than classical top variant-based MR approaches. They then extend BADGERS to other AD-related phenotypes, which serves to refine the hypotheses about the underlying mechanisms accounting for the genetic correlation patterns originally identified for AD. Finally, they run BADGERS on a pre-clinical cohort with mild cognitive impairment. They observe important differences in the association patterns, suggesting that this preclinical phenotype (at least in this cohort) has a different genetic architecture than general AD.

      We thank the reviewer a lot for the conclusion and positive comments.

      Strengths:

      BADGERS is an interesting new addition to a stream of attempts to "squeeze" biobank data beyond pure association studies for diagnosis. Increasingly available biobank cohorts do not usually focus on specific diseases. However, they tend to be data-rich, opening for deep explorations that can be useful to refine our knowledge of the latent factors that lead to diagnosis. Indeed, the possibility of running genetic correlation studies in specific sub-settings of interest (e.g. preclinical cohorts) is arguably the most interesting aspect of BADGERS. Classical methods like LDSC or two-sample MR capitalize on publicly available summary statistics from large cohorts, or having access to individual genotype data of large cohorts to ensure statistical power. Seemingly, BADGERS provides a balanced opportunity to dissect the correlation between traits of interest in settings with small sample size in which other methods do not work well.

      We thank the reviewer a lot for the conclusion and positive comments.

      Weaknesses:

      However, the increased statistical power is only hinted at, and for instance, they do not explore if LDSC would have identified these associations. Although I suspect that is the case, this evidence is important to ensure that the abovementioned balance is right. Finally, as discussed by the authors, the reliance on polygenic risk scoring necessarily undermines the causality evidence gained through BADGERS. In this sense, BADGERS provides an alternative to strict instrumental-variable based analysis, which can be particularly useful to generate new mechanistic hypotheses.

      We thank the reviewer very much for the comments. We understand the importance of comparing BADGERS to other methods. The comparison with LDSC, while not directly relevant to the causal inference aims of BADGERS, is indeed an interesting aspect to consider for future studies. In this paper, we focused on comparing BADGERS with Mendelian Randomization (MR), which shares its causal inference objective.

      As a result, BADGERS identified a total of 48 traits that reached Bonferroni-corrected statistical significance. In contrast, MR-IVW identified only nine traits with Bonferroni-corrected statistical significance. Among these nine traits, seven were also identified by BADGERS. This demonstrates that BADGERS has higher power in detecting causal relationships.
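
      For readers unfamiliar with the threshold used in this comparison, Bonferroni correction can be sketched as follows. This is a generic illustration, not the BADGERS or MR-IVW code: the function name `bonferroni_significant` is invented here, and the trait names and p-values are hypothetical placeholders rather than results from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the names of tests that pass a Bonferroni-corrected threshold.

    A test is called significant when its p-value is below alpha divided
    by the total number of tests performed.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical placeholder p-values; NOT results from the study.
p_values = {"trait_1": 1e-6, "trait_2": 0.004, "trait_3": 0.03, "trait_4": 0.2}

# With four tests the threshold is 0.05 / 4 = 0.0125.
significant = bonferroni_significant(p_values)
```

      Because the threshold shrinks with the number of traits tested, counts of significant traits are comparable between two methods only when both are corrected over the same set of tests.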

      Regarding the use of polygenic risk scoring, we agree that it holds challenges in directly inferring causality. While BADGERS offers an innovative way to explore genetic correlations and can help generate new hypotheses about disease mechanisms, it does not replace the causal inferences that can be drawn from instrumental-variable-based analyses. Instead, it should be viewed as a complementary tool that can illuminate potential genetic relationships and guide further causal investigations.

      In summary, after 15 years of focus on diagnosis that would require having individual access to large patient cohorts, BADGERS can become an excellent tool to dig into trait heterogeneity, especially if it turns out to be more powerful than other available methodologies.

      We thank the reviewer a lot for the conclusion and positive comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable contribution to cardiac arrhythmia research by demonstrating long noncoding RNA Dachshund homolog 1 (lncDACH1) tunes sodium channel functional expression and affects cardiac action potential conduction and rhythms. Whereas the evidence for functional impact of lncDACH1 expression on cardiac sodium currents and rhythms is convincing, biochemical experiments addressing the mechanism of changes in sodium channel expression and subcellular localization are incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors show that a long noncoding RNA, lncDACH1, inhibits sodium currents in cardiomyocytes by binding to and altering the localization of dystrophin. The authors use a number of methodologies to demonstrate that lncDACH1 binds to dystrophin and disrupts its localization to the membrane, which in turn downregulates NaV1.5 currents. Knockdown of lncDACH1 upregulates NaV1.5 currents. Furthermore, in heart failure, lncDACH1 is shown to be upregulated, which suggests that this mechanism may have pathophysiological relevance.

      Strengths:

      (1) This study presents a novel mechanism of Na channel regulation which may be pathophysiologically important.

      (2) The experiments are comprehensive and systematically evaluate the physiological importance of lncDACH1.

      Weaknesses:

      (1) What is indicated by the cytoplasmic level of NaV1.5, a transmembrane protein? The methods do not provide details regarding how this was determined. Do the authors mean NaV1.5 retained in various intracellular organelles?

      Thank you for the good suggestion. Our study showed that Nav1.5 was transferred to the cell membrane by the scaffold protein dystrophin in response to the regulation of LncDACH1, but not all Nav1.5 in the cytoplasm was transferred to the cell membrane. Therefore, the cytoplasmic level of Nav1.5 represents the Nav1.5 protein that is not transferred to the cell membrane but stays in the cytoplasm and various organelles within the cytoplasm when Nav1.5 is regulated by LncDACH1.

      (2) What is the negative control in Fig. 2b, Fig. 4b, Fig. 6e, Fig. 7c? The maximum current amplitudes in these seem quite different: -40 pA/pF in some, -30 pA/pF in others, and this value seems to be different than in CMs from WT mice (<-20 pA/pF). Is there an explanation for what causes this variability between experiments and/or the increase with transfection of the negative control? This is important since the effect of lncDACH1 is less than a 50% reduction and these could fall in the range depending on the amplitude of the negative control.

      Thank you for the insightful comment. The negative controls in Fig. 2b, Fig. 4b, and Fig. 6e are primary cardiomyocytes transfected with empty plasmids. The negative controls in Fig. 7c are cardiomyocytes of wild-type mice injected with control virus. When we prepare cells before the patch-clamp experiments, the transfection efficiency of the transfection reagent used in different batches of cells, as well as the different cell sizes, ultimately leads to differences between CMs.

      (3) NaV1.5 staining in Fig. 1E is difficult to visualize and to separate from lncDACH1. Is it possible to pseudocolor differently so that all three channels can be visualized/distinguished more robustly?

      Thank you for the good suggestion. We have re-added color to the original image to distinguish between the three channels.

      Author response image 1.

      (4) The authors use shRNA to knockdown lncDACH1 levels. It would be helpful to have a scrambled ShRNA control.

      Thank you for the insightful comment. The control group we used was in fact the scrambled shRNA, but we labeled the control group as NC in the article, which may have caused the misunderstanding.

      (5) Is there any measurement of the baseline levels of LncDACH1 in wild-type mice? It seems quite low, and yet there is a substantial increase in NaV1.5 currents upon knocking down LncDACH1. By comparison, the level of LncDACH1 seems to be massively upregulated in TAC models. Have the authors measured NaV1.5 currents in these cells? Furthermore, does LncDACH1 knockdown evoke a larger increase in NaV1.5 currents?

      Thank you for the insightful comment.

      (1) The baseline protein levels of LncDACH1 in wild-type mice and LncDACH1-CKO mice have been verified in a previously published article (Figure 3) (Hypertension. 2019;74:00-00. DOI: 10.1161/HYPERTENSIONAHA.119.12998).

      Author response image 2.

      (2) We did not measure the Nav1.5 currents in cardiomyocytes of the TAC model mice in this article, but in another published paper, we found that the Nav1.5 current in the TAC model mice was remarkably lower than that in wild-type mice (Figure 4) (Gene Ther. 2023 Feb;30(1-2):142-149. DOI: 10.1038/s41434-022-00348-z).

      Author response image 3.

      This is consistent with our results in this article: LncDACH1 levels are significantly upregulated in the TAC model, and in the LncDACH1-TG group the Nav1.5 current is significantly reduced after LncDACH1 upregulation (Figure 3).

      Author response image 4.

      (6) What do error bars denote in all bar graphs, and also in the current voltage relationships?

      Thank you for the good comment. All the error bars represent the mean ± SEM. They represent the fluctuation of all individuals of a set of data based on the average value of this set of data, that is, the dispersion of a set of data.
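
      For reference, the standard error of the mean is conventionally computed as the sample standard deviation divided by the square root of the sample size, so it describes the uncertainty of the mean rather than the spread of individual observations. A minimal sketch follows; the function name `mean_and_sem` is invented here and the current-density values are hypothetical placeholders, not measurements from the study.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, SEM), where SEM = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed for an SEM")
    mean = statistics.fmean(values)
    # statistics.stdev uses the n - 1 (sample) denominator.
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


# Hypothetical peak current densities in pA/pF; NOT data from the study.
densities = [-38.0, -42.0, -40.0, -36.0, -44.0]
mean, sem = mean_and_sem(densities)
```

      Because the SEM shrinks as n grows while the SD does not, overlapping SEM bars cannot be read directly as the presence or absence of statistical significance.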

      Reviewer #2 (Public Review):

      This manuscript by Xue et al. describes the effects of a long noncoding RNA, lncDACH1, on the localization of Nav channel expression, the magnitude of INa, and arrhythmia susceptibility in the mouse heart. Because lncDACH1 was previously reported to bind and disrupt membrane expression of dystrophin, which in turn is required for proper Nav1.5 localization, much of the findings are inferred through the lens of dystrophin alterations.

      The results report that cardiomyocyte-specific transgenic overexpression of lncDACH1 reduces INa in isolated cardiomyocytes; measurements in whole heart show a corresponding reduction in conduction velocity and enhanced susceptibility to arrhythmia. The effect on INa was confirmed in isolated WT mouse cardiomyocytes infected with a lncDACH1 adenoviral construct. Importantly, reducing lncDACH1 expression via either a cardiomyocyte-specific knockout or using shRNA had the opposite effect: INa was increased in isolated cells, as was conduction velocity in heart. Experiments were also conducted with a fragment of lncDACH1 identified by its conservation with other mammalian species. Overexpression of this fragment resulted in reduced INa and greater proarrhythmic behavior. Alteration of expression was confirmed by qPCR.

      The mechanism by which lncDACH1 exerts its effects on INa was explored by measuring protein levels from cell fractions and immunofluorescence localization in cells. In general, overexpression was reported to reduce Nav1.5 and dystrophin levels and knockout or knockdown increased them.

      Thank you for summarizing our work and thank you very much for your appreciation on our work.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report the first evidence of Nav1.5 regulation by a long noncoding RNA, LncRNA-DACH1, and suggest its implication in the reduction in sodium current observed in heart failure. Since no direct interaction is observed between Nav1.5 and the LncRNA, they propose that the regulation is via dystrophin and targeting of Nav1.5 to the plasma membrane.

      Strengths:

      (1) First evidence of Nav1.5 regulation by a long noncoding RNA.

      (2) Implication of LncRNA-DACH1 in heart failure and mechanisms of arrhythmias.

      (3) Demonstration of LncRNA-DACH1 binding to dystrophin.

      (4) Potential rescuing of dystrophin and Nav1.5 strategy.

      Thank you very much for your appreciation on our work.

      Weaknesses:

      (1) Main concern is that the authors do not provide evidence of how LncRNA-DACH1 regulates Nav1.5 protein level. The decrease in total Nav1.5 protein by about 50% seems to be the main consequence of the LncRNA on Nav1.5, but no mechanistic information is provided as to how this occurs.

      Thank you for the insightful comment.

      (1) The mechanism of the whole article is as mentioned in the discussion at the end of the article: LncDACH1 binds to dystrophin and thus inhibits membrane trafficking of Nav1.5. Dystrophin is a well-characterized Nav1.5 partner protein. It indirectly interacts with Nav1.5 via syntrophin, which binds with the C-terminus of dystrophin and with the SIV motif on the C-terminus of Nav1.5 (Circ Res. 2006;99:407-414. doi: 10.1161/01.RES.0000237466.13252.5e) (Circulation. 2014;130:147-160. doi: 10.1161/CIRCULATIONAHA.113.007852).

      And we performed pulldown and RNA immunoprecipitation experiments to verify it (Figure 1).

      Author response image 5.

      (2) Then we found that overexpression of lncDACH1 increased the ubiquitination of Nav1.5, which explains the downregulation of total Nav1.5 protein (Online Supplementary Figure 12).

      Author response image 6.

      (3) Lastly, we found that lncDACH1 failed to pull down Nav1.5 and anti-Nav1.5 did not precipitate lncDACH1 (Supplementary Fig. 1).

      Author response image 7.

      These data indicated that lncDACH1 does not interact with Nav1.5 directly. It participates in the regulation of Nav1.5 by binding to dystrophin. Cytoplasmic Nav1.5 that fails to target to the plasma membrane may be quickly recognized and then degraded by these ubiquitination enzymes.

      (2) The fact that the total Nav1.5 protein is reduced by 50%, which is similar to the reduction at the membrane, questions the main conclusion of the authors implicating dystrophin in the reduced Nav1.5 targeting. The reduction in membrane Nav1.5 could simply be due to the reduction in total protein.

      Thank you for the insightful comment. We do not rule out the possibility that the reduction in membrane Nav1.5 may be due to the reduction in total protein, but we do not think this is the main mechanism. Our data indicate that the membrane and total protein levels of Nav1.5 were reduced by 50%. However, cytoplasmic Nav1.5 was increased in the hearts of lncDACH1-TG mice compared with WT controls, rather than reduced like the membrane and total protein (Figure 1).

      Author response image 8.

      Therefore, we think the main mechanism of the whole article is as mentioned in the discussion at the end of the article: LncDACH1 binds to dystrophin and thus inhibits membrane trafficking of Nav1.5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig. 6E the error bars are only in one direction for cF-lncDACH1. It seems that this error overlaps for NC and cF-lncDACH1 at several voltages, yet it is marked as statistically significant. Also in Fig. 7C, what statistical test was used? Do the authors account for multiple comparisons?

      Thank you for the insightful comment.

      (1) We have recalculated the two sets of data and confirmed that the differences between NC and cF-lncDACH1 in Fig. 6E are indeed statistically significant. The overlap in the picture may be only apparent visually.

      (2) The data in Fig. 7C are expressed as mean ± SEM. Statistical analysis was performed using unpaired Student’s t test or One-Way Analysis of Variance (ANOVA) followed by Tukey’s post-hoc analysis.

      (2) line 57, "The Western blot" remove "The"

      Sorry for the mistake. We have corrected it.

      (3) line 61, "The opposite data were collected" It is unclear what is meant by opposite.

      Sorry for the mistake. We have corrected it.

      (4) Lines 137-140. This sentence is complex, I would simplify as two sentences.

      Sorry for the mistake. We have corrected it.

      (5) Line 150, "We firstly validated" should be "we first validated"

      Sorry for the mistake. We have corrected it.

      (6) Line 181, "Consistently, the membrane" Is this statement meant to indicate that the experiments yielded a consistent results or that this statement is consistent with the previous one? In either case, this sentence should be reworded for clarification.

      Sorry for the mistake. We have corrected it.

      (7) Line 223, "In consistent, the ex vivo" I am not sure what In consistent means here.

      Thank you for the good suggestion. We mean that the ex vivo results are consistent with the in vivo results. We have corrected it to make it clearer.

      (8) Line 285. "a bunch of studies" could be rephrased as "multiple studies"

      Sorry for the mistake. We have corrected it.

      (9) Line 299 "produced no influence" Do you mean produced no change?

      Thank you for the good suggestion. As you put it, we mean that it produced no change.

      (10) Line 325 "is to interact with the molecules" no need for "the molecules"

      Sorry for the mistake. We have corrected it.

      (11) lines 332-335. This sentence is very confusing.

      Thank you for the insightful comment. We have corrected it.

      (12) Lines 341-342. It is unnecessary to claim primacy here.

      Thank you for the good suggestion. We have removed this sentence.

      (13) Line 373. "Sodium channel remodeling is commonly occured in" perhaps rephrase as occurs commonly

      Thank you for the insightful comment. We have corrected it.

      Reviewer #2 (Recommendations For The Authors):

      Critique

      (1) Aside from some issues with presentation noted below, these data provide convincing evidence of a link between lncDACH1 and Na channel function. The identification of a lncDACH1 segment conserved among mammalian species is compelling. The observation that lncDACH1 is increased in a heart failure model provides a plausible hypothesis for disease mechanism.

      Thank you very much for your appreciation on our work.

      (2) Has a causal link between dystrophin and Na channel surface expression been made, or is it an argument based on correlation? Is it possible to rule out a direct effect of lncDACH1 on Na channel expression? A bit more discussion of the limitations of the study would help here.

      Thank you for the insightful comment.

      (1) Dystrophin is a well-characterized Nav1.5 partner protein. It indirectly interacts with Nav1.5 via syntrophin, which binds with the C-terminus of dystrophin and with the SIV motif on the C-terminus of Nav1.5 (Circ Res. 2006;99:407-414. doi: 10.1161/01.RES.0000237466.13252.5e) (Circulation. 2014;130:147-160. doi: 10.1161/CIRCULATIONAHA.113.007852).

      Author response image 9.

      (2) We performed pulldown and RNA immunoprecipitation experiments. The data showed that lncDACH1 failed to pull down Nav1.5 and anti-Nav1.5 did not precipitate lncDACH1 (Online Supplementary Figure 11). These data indicated that lncDACH1 does not interact with Nav1.5 directly (Supplementary Fig. 1).

      Author response image 10.

      (3) What normalization procedures were used for qPCR quantification? I could not find these.

      Thank you for the good suggestion. The expression levels of mRNA were calculated using the comparative cycle threshold (Ct) method (2^−ΔΔCt). Each data point was then normalized to ACTIN as an internal control in each sample. The final results are expressed as fold changes by normalizing the data to the values from control subjects. We have added the normalization procedures in the methods section of the article.
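
      The comparative Ct calculation described above can be sketched as follows. This is a generic illustration of the 2^−ΔΔCt formula, not the authors' analysis script: the function name `fold_change_ddct` is invented here, and the Ct values are hypothetical placeholders rather than measurements from the study.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per group
    ddCt = dCt(sample) - dCt(control)
    fold change = 2 ** (-ddCt)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values; NOT measurements from the study.
# The reference gene (e.g. ACTIN) has Ct 18 in both groups, and the target
# crosses threshold two cycles earlier in the sample than in the control.
fold = fold_change_ddct(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_control=26.0, ct_ref_control=18.0,
)
```

      The formula assumes roughly 100% amplification efficiency for both target and reference primers, so that one cycle corresponds to a two-fold difference in template.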

      (4) In general, I found the IF to be unconvincing - first, because the reported effects were not very apparent to me, but more importantly, because only exemplars were shown without quantification of a larger sample size.

      Thank you for the good suggestion. Accordingly, we quantified the immunostaining data. The data have been included in Supplementary Figure 2-16. The sample size is given in each caption.

      Author response image 11.

      Fluorescence intensity of lncDACH1, dystrophin and Nav1.5 in isolated cardiomyocytes of lncDACH1-TG mice. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=9 for dys. N=8 for Nav1.5. P<0.05 versus WT group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=9. P<0.05 versus WT group. e, Fluorescence in situ hybridization (FISH) images of LncDACH1. N=10. *P<0.05 versus WT group. P-values were determined by unpaired t test.

      Author response image 12.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocyte overexpressing lncDACH1. a,b, Membrane levels of dystrophin and Nav1.5. N=9. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=9 for dys. N=12 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 13.

      Fluorescence intensity of lncDACH1, dystrophin and Nav1.5 in isolated cardiomyocytes of lncDACH1-cKO mice. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=12 for dys. N=8 for Nav1.5. P<0.05 versus WT group. c,d, Distribution of cytoplasm levels of dystrophin and Nav1.5. N=12. P<0.05 versus WT group. e, Fluorescence in situ hybridization (FISH) images of LncDACH1 expression. N=8. *P<0.05 versus WT group. P-values were determined by unpaired t test.

      Author response image 14.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocytes after knockdown of lncDACH1. a,b, Distribution of membrane levels of dystrophin and Nav1.5. N=11 for dys. N=8 for Nav1.5. P<0.05 versus NC group. c,d, Distribution of cytoplasm levels of dystrophin and Nav1.5. N=12 for dys. N=9 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 15.

      Fluorescence intensity of dystrophin and Nav1.5 in isolated cardiomyocytes overexpressing cF-lncDACH1. a,b, Membrane levels of dystrophin (dys) and Nav1.5. N=9 for dys. N=7 for Nav1.5. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=6 for dys. N=7 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 16.

      Fluorescence intensity of dystrophin and Nav1.5 in cultured neonatal cardiomyocytes overexpressing cF-lncDACH1. a,b, Membrane levels of dystrophin and Nav1.5. N=10 for dys. N=11 for Nav1.5. P<0.05 versus NC group. c,d, Cytoplasm levels of dystrophin and Nav1.5. N=7 for dys. N=6 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      Author response image 17.

      Fluorescence intensity of Nav1.5 in human iPS differentiated cardiomyocytes overexpressing cF-lncDACH1. a, Membrane levels of Nav1.5. N=8 for Nav1.5. P<0.05 versus NC group. b, Cytoplasm levels of Nav1.5. N=10 for Nav1.5. P<0.05 versus NC group. P-values were determined by unpaired t test.

      (5) More information on how the fractionation kit works would be helpful. How are membrane v. cytoplasm fractions identified?

      a. I presume the ER is part of the membrane fraction? When Nav1.5 is found in the cytoplasmic fraction, what subcompartment is it in - the proteasome?

      b. In the middle panel of A - is the dystrophin signal visible on the WB for WT? I assume the selected exemplar is the best of the blots and so this raises concerns. Much is riding on the confidence with which the fractions report "membrane" v "cytoplasm."

      Thank you for the insightful comment.

      (1). How the fractionation kit works:

      The kit uses centrifuge-column technology to obtain plasma membrane structures with native activity and minimal cross-contamination from organelles, without the need for an ultracentrifuge; the fractions can be used for a variety of downstream assays. Separation principle: cells or tissues are sensitized with Buffer A and then pass through the centrifuge column at 16,000 × g, which shears the cell membrane and ruptures the cells. The four fractions (nucleus, cytoplasm, organelles, and plasma membrane) are then obtained sequentially by differential centrifugation and density centrifugation and can be used for downstream detection.

      Author response image 18.

      (2). How are membrane v. cytoplasm fractions identified:

      Membrane proteins and cytosolic proteins were isolated with the kit. For the western blot experiments, we used N-cadherin as the internal control for the membrane fraction and β-Actin as the internal control for the cytosolic fraction.

      Most importantly, when we incubated the N-cadherin primary antibody with the PVDF membrane carrying the cytosolic proteins, or the β-Actin primary antibody with the PVDF membrane carrying the membrane proteins, no protein bands were detected in the scans.

      Author response image 19.

      (6) More detail in Results, figures, and figure legends will assist the reader.

      a. In Fig. 5, it would be helpful to label sinus rhythm vs. arrhythmia segments.

      Thank you for the good suggestion. We have marked the sinus rhythm and arrhythmia segments with arrows.

      Author response image 20.

      b. Please explain in the figure legend what the red bars in 5A are

      Thank you for the insightful comment. We have added the explanation to the figure legend. The red lines in the ECG traces indicate VT duration.

      c. In 5C, please explain what the durations pertain to.

      Thank you for the good suggestion. 720ms-760ms refers to the duration of one action potential, with 720ms being the peak of one action potential and 760ms being the peak of another action potential. The interval duration is not fixed; in this article, we used 10ms as the interval for counting phase singularities from the consecutive phase maps, because the shorter the interval, the larger the sample size and the more convincing the data.

      d. In the text, please define "breaking points" and explain what the physiological underpinning is. Define "phase singularity."

      Thank you for the insightful comment. Cardiac excitation can be viewed as an electrical wave, with a wavefront corresponding to the action potential upstroke (phase 0) and a waveback corresponding to rapid repolarization (phase 3). Under normal circumstances, cardiac conduction consists of a sequence of well-ordered action potentials, and in optical mapping results different colors represent different phases. When a wave propagates normally through cardiac tissue, the wavefront and waveback never touch. When arrhythmias occur, owing to factors such as the reentrant phenomenon, the activation contour meets the refractory contour and the wave breaks up, initiating a new spiral reentry. In the optical mapping results, the colors representing the different phases (including depolarization and repolarization) converge to form a vortex, and the center of the vortex is defined as the phase singularity.

      (7) In reflecting on why enhanced INa is not proarrhythmic, it is noted that the kinetics are not altered. I agree that is key, but perhaps the consequence could be better articulated. Because lncDACH1 does not alter Nav1.5 gating, the late Na current may not be enhanced to the same effect as observed with LQT gain-of-function Nav1.5 mutations, in which APD prolongation is attributed to gating defects that increase late Na current.

      Thank you for the good suggestion. Your explanation is insightful and important for this article. We have revised the discussion section of the article and added this explanation to it.

      Reviewer #3 (Recommendations For The Authors):

      (1) Experiments to specifically address the reduction in total Nav1.5 protein should be included.

      Thank you for the insightful comment. We examined the ubiquitination of Nav1.5. We found that overexpression of lncDACH1 increased the ubiquitination of Nav1.5, which explains the downregulation of total Nav1.5 protein (Online Supplementary Figure 12).

      Author response image 21.

      (2) Experiments to convincingly demonstrate that LncRNA-DACH1 regulates Nav1.5 targeting via dystrophin are missing. As it is, total reduction in Nav1.5 seems to be the explanation as to why there is a decrease in membrane Nav1.5.

      Thank you for the insightful comment. We performed pulldown and RNA immunoprecipitation experiments. The data showed that lncDACH1 can pull down dystrophin (Figure 1) but failed to pull down Nav1.5, and that anti-Nav1.5 did not precipitate lncDACH1 (Supplementary Fig. 1). These data indicate that lncDACH1 does not interact with Nav1.5 directly; instead, it participates in the regulation of Nav1.5 by binding to dystrophin.

      Author response image 22.

    1. Since the main goal of this study was to capture the experiences of Asian American girls, I did not include most of the other Basement Group students in my research. There may be gender, ethnic, and/or racial differences that are not reflected in this study. As an exception, I talked with Savannah and Meli, two Salvadoran immigrant girls who were close friends with the Asian American girls and part of the core members of the Basement Community. Their perspectives helped deepen my understanding of the experiences of the main participants

      I think step-by-step studies that control variables are important. The rigor and objectivity of our research depend on the attention we pay to the details of the research subjects. We can also survey them on a larger scale in the future, thereby strengthening the objectivity of the entire study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides an important cell atlas of the gill of the mussel Gigantidas platifrons using a single nucleus RNA-seq dataset, a resource for the community of scientists studying deep sea physiology and metabolism and intracellular host-symbiont relationships. The work, which offers solid insights into cellular responses to starvation stress and molecular mechanisms behind deep-sea chemosymbiosis, is of relevance to scientists interested in host-symbiont relationships across ecosystems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Wang et al have constructed a comprehensive single nucleus atlas for the gills of the deep sea Bathymodioline mussels, which possess intracellular symbionts that provide a key source of carbon and allow them to live in these extreme environments. They provide annotations of the different cell states within the gills, shedding light on how multiple cell types cooperate to give rise to the emergent functions of the composite tissues and the gills as a whole. They pay special attention to characterizing the bacteriocyte cell populations and identifying sets of genes that may play a role in their interaction with the symbiotes.

      Wang et al sample mussels from 3 different environments: animals from their native methane-rich environment, animals transplanted to a methane-poor environment to induce starvation, and animals that have been starved in the methane-poor environment and then moved back to the methane-rich environment. They demonstrated that starvation had the biggest impact on bacteriocyte transcriptomes. They hypothesize that the upregulation of genes associated with lysosomal digestion leads to the digestion of the intracellular symbiont during starvation, while the non-starved and reacclimated groups more readily harvest the nutrients from symbiotes without destroying them.

      Strengths:

      This paper makes available a high-quality dataset that is of interest to many disciplines of biology. The unique qualities of this non-model organism and the collection of conditions sampled make it of special interest to those studying deep sea adaptation, the impact of environmental perturbation on Bathymodioline mussels populations, and intracellular symbiotes. The authors do an excellent job of making all their data and analysis available, making this not only an important dataset but a readily accessible and understandable one.

      The authors also use a diverse array of tools to explore their data. For example, the quality of the data is augmented by the use of in situ hybridizations to validate cluster identity and KEGG analysis provides key insights into how the transcriptomes of bacteriocytes change.

      The authors also do a great job of providing diagrams and schematics to help orient non-mussel experts, thereby widening the audience of the paper.

      We thank the reviewer for the valuable feedback on our study. We are grateful that the reviewers found our work to be interesting and we appreciate their thorough evaluation of our research. Their constructive comments will be considered as we continue to develop and improve our study.

      Weaknesses:

      One of the main weaknesses of this paper is the lack of coherence between the images and the text, with some parts of the figures never being referenced in the body of the text. This makes it difficult for the reader to interpret how they fit in with the author's discussion and assess confidence in their analysis and interpretation of data. This is especially apparent in the cluster annotation section of the paper.

      We appreciate the feedback and suggestions provided by the reviewer, and we have revised our manuscript to make it more accessible to general audiences.

      Another concern is the linking of the transcriptomic shifts associated with starvation with changes in interactions with the symbiotes. Without examining and comparing the symbiote population between the different samples, it cannot be concluded that the transcriptomic shifts correlate with a shift to the 'milking' pathway and not other environmental factors. Without comparing the symbiote abundance between samples, it is difficult to disentangle changes in cell state that are due to their changing interactions with the symbiotes from other environmental factors.

      We are grateful for the valuable feedback and suggestions provided by the reviewer. Our keen interest lies in understanding symbiont responses, particularly at the single-cell level. However, it's worth noting that existing commercial single-cell RNA-seq technologies rely on oligo dT priming for reverse transcription and barcoding, thus omitting bacterial gene expression information from our dataset. We hope that advancements in technology will soon enable us to perform an integrated analysis encompassing both host and symbiont gene expression.

      Additionally, conclusions in this area are further complicated by using only snRNA-seq to study intracellular processes. This is limiting since cytoplasmic mRNA is excluded and only nuclear reads are sequenced after the organisms have had several days to acclimate to their environment and major transcriptomic shifts have occurred.

      We appreciate the comments shared by the reviewer and agree that scRNA-seq provides more comprehensive transcriptional information by targeting the entire mRNA of the cell. However, we would like to highlight that snRNA-seq has some unique advantages over scRNA-seq. Notably, snRNA-seq allows for simple snap-freezing of collected samples, facilitating easier storage, particularly for samples obtained during field trips involving deep-sea animals and other ecologically significant non-model animal samples. Additionally, unlike scRNA-seq, snRNA-seq eliminates the need for tissue dissociation, which often involves prolonged enzymatic treatment of deep-sea animal tissue/cells under atmospheric pressure. This process can potentially lead to the loss of sensitive cells or alterations in gene expression. Moreover, snRNA-seq procedures disregard the size and shape of animal cells, rendering it a superior technology for constructing the cell atlas of animal tissues. Consequently, we assert that snRNA-seq offers flexibility and represents a suitable choice for the research objects of our current research.

      Reviewer #2 (Public Review):

      Wang, He et al. shed insight into the molecular mechanisms of deep-sea chemosymbiosis at the single-cell level. They do so by producing a comprehensive cell atlas of the gill of Gigantidas platifrons, a chemosymbiotic mussel that dominates the deep-sea ecosystem. They uncover novel cell types and find that the gene expression of bacteriocytes, the symbiont-hosting cells, supports two hypotheses of host-symbiont interactions: the "farming" pathway, where symbionts are directly digested, and the "milking" pathway, where nutrients released by the symbionts are used by the host. They perform an in situ transplantation experiment in the deep sea and reveal transitional changes in gene expression that support a model where starvation stress induces bacteriocytes to "farm" their symbionts, while recovery leads to the restoration of the "farming" and "milking" pathways.

      A major strength of this study includes the successful application of advanced single-nucleus techniques to a non-model, deep-sea organism that remains challenging to sample. I also applaud the authors for performing an in situ transplantation experiment in a deep-sea environment. From gene expression profiles, the authors deftly provide a rich functional description of G. platifrons cell types that is well-contextualized within the unique biology of chemosymbiosis. These findings offer significant insight into the molecular mechanisms of deep-sea host-symbiont ecology, and will serve as a valuable resource for future studies into the striking biology of G. platifrons.

      The authors' conclusions are generally well-supported by their results. However, I recognize that the difficulty of obtaining deep-sea specimens may have impacted experimental design. In this area, I would appreciate more in-depth discussion of these impacts when interpreting the data.

      We thank the reviewer for their valuable feedback on our study. We're grateful that the reviewers found our work interesting, and we appreciate their thorough evaluation of our research. We'll consider their constructive comments as we continue to develop and improve our study.

      Because cells from multiple individuals were combined before sequencing, the in situ transplantation experiment lacks clear biological replicates. This may potentially result in technical variation (ie. batch effects) confounding biological variation, directly impacting the interpretation of observed changes between the Fanmao, Reconstitution, and Starvation conditions. It is notable that Fanmao cells were much more sparsely sampled. It appears that fewer cells were sequenced, resulting in the Starvation and Reconstitution conditions having 2-3x more cells after doublet filtering. It is not clear whether this is due to a technical factor impacting sequencing or whether these numbers are the result of the unique biology of Fanmao cells. Furthermore, from Table S19 it appears that while 98% of Fanmao cells survived doublet filtering, only ~40% and ~70% survived for the Starvation and Reconstitution conditions respectively, suggesting some kind of distinction in quality or approach.

      There is a pronounced divergence in the relative proportions of cells per cell type cluster in Fanmao compared to Reconstitution and Starvation (Fig. S11). This is potentially a very interesting finding, but it is difficult to know if these differences are the expected biological outcome of the experiment or the fact that Fanmao cells are much more sparsely sampled. The study also finds notable differences in gene expression between Fanmao and the other two conditions- a key finding is that bacteriocytes had the largest Fanmao-vs-starvation distance (Fig. 6B). But it is also notable that for every cell type, one or both comparisons against Fanmao produced greater distances than comparisons between Starvation and Reconstitution (Fig. 6B). Again, it is difficult to interpret whether Fanmao's distinctiveness from the other two conditions is underlain by fascinating biology or technical batch effects. Without biological replicates, it remains challenging to disentangle the two.

      As highlighted by the reviewer, our experimental design involves pooling multiple biological samples within a single treatment state before sequencing. We acknowledge the concern regarding the absence of distinct biological replicates and the potential impact of batch effects on result interpretation. While we recognize the merit of conducting multiple sequencing runs for a single treatment to provide genuine biological replicates, we contend that batch effects may not exert a strong influence on the observed patterns.

      In addition, we applied a bootstrap sampling algorithm to assess whether the gene expression patterns within a cluster are more similar than those between clusters. This algorithm involves selecting a portion of cells per cluster and examining whether this subset remains distinguishable from other clusters. Our assumption was that if different samples exhibited distinct expression patterns due to batch effect, the co-assignment probabilities of a cluster would be very low. This expectation was not met in our data, as illustrated in Fig. S2. The lack of significantly low co-assignment probabilities within clusters suggests that batch effects may not exert a strong influence on our results.
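
      The idea behind such a subsampling check can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes each cell is a point in a low-dimensional embedding, draws a fraction of cells per cluster, recomputes cluster centroids from that subset, and then asks how often each cell is still assigned to its original cluster by nearest centroid. The function name, the parameters, and the nearest-centroid reassignment rule are all assumptions made for the example.

```python
import math
import random


def coassignment(points, labels, fraction=0.8, n_rep=50, seed=0):
    """Per-cluster probability that cells stay with their own cluster
    when centroids are re-estimated from a random subset of cells.

    points: list of coordinate tuples (hypothetical embedding).
    labels: original cluster label of each point.
    """
    rng = random.Random(seed)
    clusters = sorted(set(labels))
    members = {k: [i for i, lab in enumerate(labels) if lab == k] for k in clusters}
    dims = len(points[0])
    hits = {k: 0 for k in clusters}
    totals = {k: 0 for k in clusters}
    for _ in range(n_rep):
        # Re-estimate each centroid from a random portion of the cluster.
        centroids = {}
        for k in clusters:
            size = max(1, int(fraction * len(members[k])))
            sample = rng.sample(members[k], size)
            centroids[k] = [sum(points[i][d] for i in sample) / size for d in range(dims)]
        # Reassign every cell to its nearest re-estimated centroid.
        for i, p in enumerate(points):
            nearest = min(clusters, key=lambda k: math.dist(p, centroids[k]))
            totals[labels[i]] += 1
            hits[labels[i]] += int(nearest == labels[i])
    return {k: hits[k] / totals[k] for k in clusters}


# Two well-separated toy clusters stay fully distinguishable.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labs = ["a", "a", "a", "b", "b", "b"]
print(coassignment(pts, labs))  # {'a': 1.0, 'b': 1.0}
```

      Values close to 1 indicate clusters that remain distinguishable under resampling; a cluster whose membership was driven by a subset of cells would tend to give lower values under this scheme.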

      Indeed, we acknowledge a noticeable shift in the expression patterns of certain cell types, such as the bacteriocyte. However, this is not universally applicable across all cell types. For instance, the UMAP figure in Fig. 6A illustrates a substantial overlap among basal membrane cell 2 from Fanmao, Starvation, and Reconstitution treatments, and the centroid distances between the three treatments are subtle, as depicted in Fig. 6B. This consistent pattern is also observed in DEPC, smooth muscle cells, and the food groove ciliary cells.
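
      For readers unfamiliar with centroid distances, the comparison can be illustrated with a short sketch. It assumes that the cells of one cell type are represented as coordinates in some embedding and grouped by treatment; the embedding, the Euclidean metric, and the toy coordinates are assumptions for illustration and may differ from what was used for Fig. 6B.

```python
import math


def centroid(points):
    """Mean position of a set of points (one coordinate tuple per cell)."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


def centroid_distances(groups):
    """Pairwise Euclidean distances between group centroids.

    groups: dict mapping a treatment name to its list of cell coordinates.
    """
    cents = {name: centroid(pts) for name, pts in groups.items()}
    names = sorted(cents)
    return {
        (a, b): math.dist(cents[a], cents[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Toy coordinates only; a small distance means the treatments overlap.
toy = {
    "Fanmao": [(0.0, 0.0), (2.0, 0.0)],
    "Starvation": [(1.0, 3.0), (1.0, 5.0)],
}
print(centroid_distances(toy))  # {('Fanmao', 'Starvation'): 4.0}
```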

      The reviewer also noted variations in the number of cells per treatment. Specifically, Fanmao sequencing yielded fewer than 10 thousand cells, whereas the other two treatments produced 2-3 times more cells after quality control (QC). It is highly probable that the technician loaded different quantities of cells into the machine for single-nucleus sequencing—a not uncommon occurrence in this methodology. While loading more cells may increase the likelihood of doublets, it is crucial to emphasize that this should not significantly impact the expression patterns post-QC. It's worth noting that overloading samples has been employed as a strategic approach to capture rare cell types, as discussed in a previous study (reference: 10.1126/science.aay0267).

      The reviewer highlighted the discrepancy in cell survival rates during the 'doublet filtering' process, with 98% of Fanmao cells surviving compared to approximately 40% and 70% for the Starvation and Reconstitution conditions, respectively. It's important to clarify that the reported percentages reflect the survival of cells through a multi-step QC process employing various filtering strategies.

      Post-doublet removal, we filtered out cells with <100 or >2500 genes and <100 or >6000 unique molecular identifiers (UMIs). Additionally, genes with <10 UMIs in each data matrix were excluded. The observed differences in survival rates for Starvation and Reconstitution cells can be attributed to the total volume of data generated in Illumina sequencing. Specifically, we sequenced approximately 91 GB of data for Fanmao, ~196 GB for Starvation, and ~249 GB for Reconstitution. As a result, the qualified data obtained for Starvation and Reconstitution conditions was only about twice that of Fanmao due to the limited data volume.
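
      The thresholds above can be expressed as a small filter. The sketch below is a plain-Python illustration on a cells-by-genes count matrix, not the pipeline used in the study; the function name is hypothetical, and applying the gene filter after the cell filter is an assumption, since the order is not stated.

```python
def qc_filter(counts, min_genes=100, max_genes=2500,
              min_umis=100, max_umis=6000, min_gene_umis=10):
    """Filter a count matrix given as a list of rows (one row per cell,
    one column per gene). Defaults mirror the thresholds quoted above."""
    keep_cells = []
    for row in counts:
        n_genes = sum(1 for c in row if c > 0)  # genes detected in this cell
        n_umis = sum(row)                       # total UMIs in this cell
        keep_cells.append(
            min_genes <= n_genes <= max_genes and min_umis <= n_umis <= max_umis
        )
    kept_rows = [row for row, keep in zip(counts, keep_cells) if keep]
    n_cols = len(counts[0]) if counts else 0
    # Assumption: gene totals are computed over the retained cells only.
    gene_totals = [sum(row[j] for row in kept_rows) for j in range(n_cols)]
    keep_genes = [total >= min_gene_umis for total in gene_totals]
    filtered = [[c for c, keep in zip(row, keep_genes) if keep] for row in kept_rows]
    return filtered, keep_cells, keep_genes


# Toy matrix with deliberately small thresholds so the effect is visible.
toy = [[5, 0, 1], [1, 1, 0], [0, 0, 0], [9, 9, 9]]
filtered, cells, genes = qc_filter(
    toy, min_genes=2, max_genes=3, min_umis=2, max_umis=20, min_gene_umis=6
)
print(cells)     # [True, True, False, False]
print(genes)     # [True, False, False]
print(filtered)  # [[5], [1]]
```

      Because every cell must pass all four bounds, the fraction of cells surviving QC depends on sequencing depth per cell as well as on doublet removal, which is the point made in the response above.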

      The reviewer also observed a divergence in the relative proportions of cells per cell type cluster in Fanmao compared to Reconstitution and Starvation, as depicted in Fig. S1. This discrepancy may hold true biological significance, presenting a potentially intriguing finding. However, our discussion on this pattern was rather brief, as we acknowledge that the observed differences could be influenced by the sample preparation process for dissection and digestion. It is crucial to consider that cutting a slightly different area during dissection may result in variations in the proportion of cells obtained. While we recognize the potential impact of this factor, we do not think that the sparsity of sampling alone could significantly affect the relative proportions of cells per cell type.

      In conclusion, we acknowledge the reviewer's suggestion that sequencing multiple individual samples per treatment condition would have been ideal, rather than pooling them together. However, the homogeneous distribution observed in UMAP and the consistent results obtained from bootstrap sampling suggest that the impact of batch effects on our analyses is likely not substantial. Additionally, based on our understanding, the smaller number of cells in the Fanmao sample should not have any significant effect on the resulting proportions of cells or on the expression patterns of each cluster.

      Reviewer #3 (Public Review):

      Wang et al. explored the unique biology of the deep-sea mussel Gigantidas platifrons to understand the fundamental principles of animal-symbiont relationships. They used single-nucleus RNA sequencing, together with validation and visualization of many of the important cellular and molecular players that allow these organisms to survive in the deep sea. They demonstrate a diversity of cell types that support the structure and function of the gill, including bacteriocytes, specialized epithelial cells that host sulfur-oxidizing or methane-oxidizing symbionts, as well as a suite of other cell types including supportive cells, ciliary cells, and smooth muscle cells. By transplanting mussels from a methane-rich habitat to methane-limited environments, the authors showed that starved mussels may consume endosymbionts, whereas mussels in methane-rich environments upregulated genes involved in glutamate synthesis. These data add to the growing body of literature showing that organisms control their endosymbionts in response to environmental change.

      The conclusions of the data are well supported. The authors adapted a technique that would have been technically impossible in their field environment by preserving the tissue and then performing nuclear isolation after the fact. The use of single-nucleus sequencing opens the possibility of new cellular and molecular biology that is not possible to study in the field. Additionally, the in-situ data (both WISH and FISH) are high-quality and easy to interpret. The use of cell-type-specific markers along with a symbiont-specific probe was effective. Finally, the SEM and TEM were used convincingly for specific purposes in the case of showing the cilia that may support water movement.

      We appreciate the valuable feedback provided by the reviewer on our study. It is encouraging to know that our work was found to be interesting and that they conducted a thorough evaluation of our research. We will take their constructive comments into account as we strive to develop and enhance our study. We thank the reviewer for all the input.

      The one particular area for clarification and improvement surrounds the concept of a proliferative progenitor population within the gill. The authors imply that three types of proliferative cells within gills have long been known, but their study may be the first to recover molecular markers for these putative populations. The markers the authors present for gill posterior end budding zone cells (PEBZCs) and dorsal end proliferation cells (DEPCs) are not intuitively associated with cell proliferation and some additional exploration of the data could be performed to strengthen the argument that these are indeed proliferative cells. The authors do utilize a trajectory analysis tool called Slingshot which they claim may suggest that PEBZCs could be the origin of all gill epithelial cells, however, one of the assumptions of this analysis is that differentiated cells are developed from the same precursor PEBZC population.

      However, these conclusions do not detract from the overall significance of the work of identifying the relationship between symbionts and bacteriocytes and how these host bacteriocytes modulate their gene expression in response to environmental change. It will be interesting to see how similar or different these data are across animal phyla. For instance, the work of symbiosis in cnidarians may converge on similar principles or there may be independent ways in which organisms have been able to solve these problems.

      We are grateful for the valuable comments and suggestions provided by the reviewer. All suggestions have been carefully considered, and the manuscript has been revised accordingly. We particularly value the reviewer's insights regarding the characterization of the G. platifrons gill proliferative cell populations. In a separate research endeavor, we have conducted experiments utilizing both cell division and cell proliferation markers on these proliferative cell populations. While these results are not incorporated into the current manuscript, we would be delighted to share our preliminary findings with the reviewer. Our preliminary results indicate that the proliferative cell populations exhibit positivity for cell proliferation markers and contain a significant number of mitotic cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Further experiments are needed to link the changes in transcriptomes of Bathymodioline mussels in the different environmental conditions to changes in their interactions with symbiotes. For example, quantifying the abundance and comparing the morphology of symbiotes between the environmental conditions would lend much support for shifting between milking and farming strategies. Without analyzing the symbiotes and comparing them across populations, it is difficult to comment on the mechanisms of interactions between symbiotes and the hosts. Without this analysis, this data is better suited towards comments about the general effect of environmental perturbation and stress on gene expression in these mussels.

      We appreciate the reviewer’s comments. We are also very curious about the symbiont responses, especially at the single-cell level. However, all the current commercial single-cell RNA-seq technologies are based on oligo dT priming for reverse transcription and barcoding. Therefore, the bacterial gene expression information is omitted from our dataset. Hopefully, with the development of technology, we could conduct an integrated analysis of both host and symbiont gene expression soon.

      Additionally, clarification is needed on which types of symbiotes are being looked at. Are they MOX or SOX populations? Are they homogenous? What are the concentrations of sulfur at the sampled sites?

      We thank you for your valuable comments and suggestions. Gigantidas platifrons harbors a MOX endosymbiont population characterized by a single 16S rRNA phylotype. We apologize for any confusion resulting from our previous wording. To clarify, we have revised lines 57-59 of our introduction.

      In the text and images, consider using standardized gene names and leaving out the genome coordinates. This would greatly help with readability. Also, be careful to properly follow gene naming and formatting conventions (ie italicizing gene names and symbols).

      We appreciate the reviewer’s insightful comments. In model animals, gene nomenclature often stems from forward genetic approaches, such as the identification of loss-of-function mutants. These gene names, along with their protein products, typically correspond to unique genome coordinates. Conversely, in non-model invertebrates (e.g., Gigantidas platifrons in the present study), gene prediction relies on a combination of bioinformatics methods, including de novo prediction, homolog-based prediction, and transcriptomics mapping. Subsequently, the genes are annotated by identifying their best homologs in well-characterized databases. Given that different genes may encode proteins with similar annotated functions, we chose to include both the gene ID (genome coordinates) and the gene name in our manuscript. This dual labeling approach ensures that our audience receives accurate and comprehensive information regarding gene identification and annotation.

      Additionally, extending KEGG analysis to the atlas annotation section could help strengthen the confidence of annotations. For example, when identifying bacteriocyte populations, the functional categories of individual marker genes (lysosomal proteases, lysosomal traffic regulators, etc) are used to justify the annotation. Presenting KEGG support that these functional categories are upregulated in this population relative to others would help further support how you characterize this cluster by showing it's not just a few specific genes that are enriched in this cell group, but rather an overall functionality.

      We appreciate the valuable suggestion provided by the reviewer. Indeed, incorporating KEGG analysis into the atlas annotation section could further enhance the confidence in our annotations. However, in our study, we encountered some limitations that impeded us from conducting a comprehensive KEGG enrichment analysis.

      Firstly, the number of differentially expressed genes (DEGs) that we identified for certain cell populations was relatively small, making it challenging to meet the threshold required for meaningful KEGG enrichment analysis. For instance, among the 97 marker genes identified for the Bacteriocyte cluster, only two genes, Bpl_scaf_59648-4.5 (lysosomal alpha-glucosidase-like) and Bpl_scaf_52809-1.6 (lysosomal-trafficking regulator-like isoform X1), were identified as lysosomal genes. To generate reliable KEGG enrichments, a larger number of genes is typically required.

      Secondly, single-nucleus sequencing, as employed in our study, tends to yield a relatively smaller number of genes per cell compared to bulk RNA sequencing. This limited gene yield can make it challenging to achieve sufficient gene representation for rigorous KEGG enrichment analysis.

      Furthermore, many genes in the genome still lack comprehensive annotation, both in terms of KEGG and GO annotations. In our dataset, out of the 33,584 genes obtained through single-nuclei sequencing, 26,514 genes have NO KEGG annotation, and 25,087 genes have NO GO annotation. This lack of annotations further restricts the comprehensive application of KEGG analysis in our study.
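      The annotation coverage implied by these counts can be checked with a few lines of arithmetic. The sketch below is illustrative only: the three gene counts come from the response above, while the constant and function names are our own and are not part of the authors' pipeline.

```python
# Gene counts quoted in the response above.
TOTAL_GENES = 33_584   # genes obtained through single-nuclei sequencing
NO_KEGG = 26_514       # genes without a KEGG annotation
NO_GO = 25_087         # genes without a GO annotation


def annotated_fraction(total: int, unannotated: int) -> float:
    """Return the fraction of genes that carry an annotation."""
    return (total - unannotated) / total


# Number of genes that do have an annotation in each database.
kegg_annotated = TOTAL_GENES - NO_KEGG
go_annotated = TOTAL_GENES - NO_GO

print(f"KEGG-annotated genes: {kegg_annotated} "
      f"({annotated_fraction(TOTAL_GENES, NO_KEGG):.1%})")
print(f"GO-annotated genes:   {go_annotated} "
      f"({annotated_fraction(TOTAL_GENES, NO_GO):.1%})")
```

      Under these counts, roughly one gene in five has a KEGG annotation and one in four has a GO annotation, which illustrates why pathway-level enrichment has limited power for small marker lists.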

      The claim that VEPCs are symbiote free is not demonstrated. Additional double in situs are needed to show that markers of this cell type localize in regions free of symbiotes.

      We appreciate your comments and suggestions. In Figure 5B, our results demonstrate that the bacteriocytes (green fluorescent signal) are distant from the VEPCs, which are located around the tip of the gill filaments (close to the food groove). We have revised our Figure 5B to make it clear.

      Additionally, it does not seem like trajectory analysis is appropriate for these sampling conditions. Generally, to create trajectories confidently, more closely sampled time points are needed to sufficiently parse out the changes in expression. More justification is needed for the use of this type of analysis here and a discussion of the limitations should be mentioned, especially when discussing the hypotheses relating to PEBZCs, VEPCs, and DEPCs.

      We greatly appreciate your thoughtful commentary. It is important to acknowledge that in the context of a developmental study, incorporating more closely spaced time points indeed holds great value. In our ongoing project investigating mouse development, for instance, we have implemented time points at 24-hour intervals. However, in the case of deep-sea adult animals, we hypothesized a slower transcriptional shift in such an extreme environment, which led us to opt for a time interval of 3-7 days. Examining the differential expression profiles among the three treatments, we observed that most cell types exhibited minimal changes in their expression profiles. For the cell types strongly impacted by in situ transplantation, their expression profiles per cell type still exhibited high overlap in the UMAP analysis (Figure 6a), thus enabling meaningful comparisons. Nevertheless, we recognize that our sampling strategy may not be flawless. Additionally, the challenging nature of conducting in situ transplantation at 1000-meter depths limited the number of sampling occasions available to us. We sincerely appreciate your input and understanding.

      Finally, more detail should be added on the computational methods used in this paper. For example, the single-cell genomics analysis protocol should be expanded on so that readers unfamiliar with BD single-cell genomics handbooks could replicate the analysis. More detail is also needed on what criteria and cutoffs were used to calculate marker genes. Also, please be careful to cite the algorithms and software packages mentioned in the text.

      Acknowledged, thank you for highlighting this. In essence, the workflow closely resembles the 10x Genomics workflow, although 10x Genomics uses different software (i.e., Cell Ranger). We explain the workflow in more detail below. We also note that this information may no longer be relevant for newer users of BD or for individuals who are not acquainted with BD, given that the workflow underwent a complete overhaul in the summer of 2023.

      References to lines

      Line 32: typo "..uncovered unknown tissue heterogeny" should read "uncovering" or "and uncovered")

      Overall abstract could include more detail of findings (ex: what are the "shifts in cell state" in line 36 that were observed)

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 60: missing comma "...gill filament structure, but also"

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 62-63: further discussion here, or in the relevant sections of the specific genes identified in the referenced bulk RNA-seq project could help strengthen confidence in annotation

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 112: what bootstrapping strategy? Applied to what?

      This is a bootstrap sampling algorithm, developed in a recent bioRxiv preprint, that assesses the robustness of each cell cluster. (Singh, P. & Zhai, Y. Deciphering Hematopoiesis at single cell level through the lens of reduced dimensions. bioRxiv, 2022.06.07.495099 (2022). https://doi.org/10.1101/2022.06.07.495099)

      Lines 127-129: What figures demonstrate the location of the inter lamina cells? Are there in situs that show this?

      We apologize for any errors; the referencing of figures in the manuscript has been revised for clarity.

      Lines 185-190: does literature support these as markers of SMCs? Are they known smooth muscle markers in other systems?

      We characterized the SMCs by the expression of LDL-associated protein, angiotensin-converting enzyme-like protein, and the "molecular spring" titin-like protein, all of which are commonly found in human vascular smooth muscle cells. Based on this analysis, we hypothesize that these cells belong to the smooth muscle cell category.

      Line 201: What is meant by "regulatory roles"?

      In this context, we are discussing the expression of genes encoding regulatory proteins, such as SOX transcription factors and secreted-frizzled proteins.

      Line 211: which markers disappeared? What in situs show this?

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 211: typo, "role" → "roll"

      We apologize for the mistakes, and have revised the manuscript accordingly.

      Line 214: what are these "hallmark genes"

      We apologize for the mistakes, here we are referring to the genes listed in figure 4B. We have revised the manuscript accordingly.

      Line 220: are there meristem-like cells in metazoans? If so, this would be preferable to a comparison with plants.

      In this context, we are discussing the morphological characteristics of gill proliferative cell populations found in filibranch bivalves. These populations, namely PEPC, VEPC, and DEPC, consist of cells exhibiting morphological traits akin to those of plant cambial-zone meristem cells. These cells are typically small and round, with a high nucleus-to-plasma ratio. We acknowledge that while these terms are used in bivalve studies (citations below), they lack the robust molecular biology evidence available in model systems. The present snRNA-seq data, however, may offer valuable cell markers for future comprehensive investigations.

      Leibson, N. L. & Movchan, O. T. Cambial zones in gills of Bivalvia. Mar. Biol. 31, 175-180 (1975). https://doi.org/10.1007/BF00391629

      Wentrup, C., Wendeberg, A., Schimak, M., Borowski, C. & Dubilier, N. Forever competent: deep-sea bivalves are colonized by their chemosynthetic symbionts throughout their lifetime. Environ. Microbiol. 16, 3699-3713 (2014). https://doi.org/10.1111/1462-2920.12597

      Cannuel, R., Beninger, P. G., McCombie, H. & Boudry, P. Gill Development and its functional and evolutionary implications in the blue mussel Mytilus edulis (Bivalvia: Mytilidae). Biol. Bull. 217, 173-188 (2009). https://doi.org/10.1086/BBLv217n2p173

      Line 335: what is slingshot trajectory analysis? Does this differ from the pseudotime analysis?

      Slingshot is an algorithm that uses the principal graph of the cells to infer trajectories. It models trajectories as curves on the principal graph, capturing the progression and transitions between different cellular states.

      Both Slingshot and pseudotime analysis aim to infer cellular trajectories. Slingshot focuses on capturing branching patterns and is fully compatible with the graph generated by dimensionality reduction methods such as UMAP and PHATE, whereas pseudotime analysis aims to order cells along a continuous trajectory and does not rely on dimensionality reduction graphs. We used both in the manuscript for different purposes.

      Line 241: introduce FISH methodology earlier in the paper, when in situ images are first referenced

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 246-249: can you quantify the decrease in signal or calculate the concentration of symbiotes in the cells? Was 5C imaged whole? This can impact the fluorescent intensity in tissues of different thicknesses.

      We appreciate your comment. In Figure 5C, most of the typical gill filament region is visible (the ventral tip and the mid part of the gill filament), except for the dorsal end. The gill filament of bathymodioline mussels has a simple structure: a single layer of bacteriocytes grows on the basal membrane. Consequently, the gill slices have a fairly uniform thickness (two layers of bacteriocytes with one layer of interlamina cells in between), minimizing any potential impact on fluorescent intensity. At present, detailed quantification of intracellular symbionts would require continuous TEM or ultra-resolution confocal sections to reconstruct the bacteriocytes in 3D, which is beyond the scope of the current study. Therefore, fluorescent intensity remains the only method available to us for estimating bacterial density/distribution across the gill filament.

      Line 249: What is meant by 'environmental gradient?'

      Here we are referring to the gases needed for the symbionts’ chemosynthesis. We have revised the manuscript to make this clear.

      Lines 255-256: Were the results shown in the TEM images previously known? Not clear what novel information is conveyed in images Fig 5 C and D

      In Fig. 5C and D, we provide high-quality SEM and TEM images of a typical bacteriocyte, clearly showing its morphology and subcellular machinery. These electron microscopy images offer the audience a comprehensive introduction to the cellular function of bacteriocytes. Additionally, they serve as supportive evidence for the bacteriocytes' snRNA-seq data.

      Line 295-296: Can you elaborate on what types of solute carrier genes have been shown to be involved with symbioses?

      We appreciate the comment, and have revised the manuscript accordingly. The putative functions of the solute carriers can be found in Figure 5I.

      Line 297-301: Which genes from the bulk RNA-seq study? Adding more detail and references in cluster annotation would help readers better understand the justifications.

      We appreciate the comment, and have revised the manuscript accordingly.

      Line 316 -322: Can you provide the values of the distances?

      We now provide the values in the main text, in addition to Fig. 6B. We also provide a supplementary table (Supplementary Table S19).

      Line 328: What are the gene expression patterns?

      We observed genes that are up- and down-regulated during starvation and reconstitution.

      LIne 334-337: A visualization of the different expression levels of the specific genes in clusters between sites might be helpful to demonstrate the degree of difference between sites.

      We have prepared a new supplementary file showing the different expression levels.

      Line 337: Citation needed

      We appreciate the comment. Here, we hypothesize the cellular responses based on the genes’ functions and their expression patterns.

      Line 402-403: Cannot determine lineages from data presented. Need lineage tracing over time to determine this

      We acknowledge that lineage tracing over time is needed to validate this hypothesis. In practical terms, however, it is difficult to obtain deep-sea samples for such tests. Shallow-sea relatives might be easier to work with, although testing this hypothesis in them would still be very difficult in practice.

      413-414: What are the "cell-type specific responses to environmental change"? It could be interesting to present these results in the "results and discussion" section

      These results are shown in Supplementary Figure S8.

      Line 419-424: Sampling details might go better earlier on in the paper, when the sampling scheme is introduced.

      We appreciate the comments. Here, we are discussing the limitations of our current study, not sampling details.

      Line 552: What type of sequencing? Paired end? How long?

      We conducted 150bp paired-end sequencing.

      556-563: More detail here would be useful to readers not familiar with the BD guide. Also be careful to cite the software used in analysis!

      The provided guide and handbook elucidate the intricacies of gene name preparation, data alignment to the genome, and the generation of an expression matrix. It is worth mentioning that we relied upon outdated versions of the aforementioned resources during our data analysis phase, as they were the only ones accessible to us at the time. However, we have since become aware of a newer pipeline available this year, rendering the information presented here of limited significance to other researchers utilizing BD.

      Many thanks for the kind reminder. We have now included a reference for STAR. All other software was cited accordingly. There are no scholarly papers or publications on the BD pipeline that we can cite.

      Line 577-578: How was the number of clusters determined? What is meant by "manually combine the clusters?" If cells were clustered by hand, more detail on the method is needed, as well as direct discussion and justification in the body of the paper.

      It would be more appropriate to emphasize the determination of cell types rather than clusters. The clusters were identified using a clustering function, as mentioned in the manuscript. It's important to note that the clustering function (in our case, the FindClusters function of Seurat) provides a general overview based on diffuse gene expression. Technically speaking, there is no guarantee that one cluster corresponds to a single cell type. Therefore, it is crucial to manually inspect the clustering results to assign clusters to the appropriate cell types. In some cases, multiple clusters may be assigned to the same cell type, while in other cases, a single cluster may need to be further subdivided into two or more cell types or sub-cell types, depending on the specific circumstances.

      For studies conducted on model species such as humans or mice, highly and specifically expressed genes within each cluster can be compared to known marker genes of cell types mentioned in previous publications, which generally suffices for annotation purposes. However, in the case of non-model species like Bathymodioline mussels, there is often limited information available about marker genes, making it challenging to confidently assign clusters to specific cell types. In such situations, in situ hybridisation proves to be incredibly valuable. In our study, WISH was employed to visualise the expression and morphology of marker genes within clusters. When WISH revealed the expression of marker genes from a cluster in a specific type of cell, we classified that cluster as a genuine cell type. Moreover, if WISH demonstrated uniform expression of marker genes from different clusters in the same cell, we assigned both clusters to the same cell type.

      We expanded the description of the strategy in the Method section.
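      The cluster-to-cell-type assignment described above can be sketched as a simple lookup step. This is a minimal illustration, not the authors' code: the cluster IDs, the placeholder cell-type labels, and both helper functions are hypothetical, and the actual analysis used Seurat in R.

```python
from collections import defaultdict

# Hypothetical mapping decided after inspecting marker genes and WISH images.
# Clusters 0 and 1 are merged because their markers label the same cells.
cluster_to_cell_type = {
    0: "cell type A",
    1: "cell type A",
    2: "cell type B",
    3: "cell type C",
}


def relabel_cells(cell_clusters, mapping):
    """Replace each cell's cluster ID with its assigned cell type."""
    missing = sorted(set(cell_clusters) - set(mapping))
    if missing:
        # Fail loudly rather than silently dropping unassigned clusters.
        raise KeyError(f"clusters without an assigned cell type: {missing}")
    return [mapping[cluster] for cluster in cell_clusters]


def clusters_per_cell_type(mapping):
    """Group cluster IDs by cell type to show which clusters were merged."""
    grouped = defaultdict(list)
    for cluster, cell_type in mapping.items():
        grouped[cell_type].append(cluster)
    return dict(grouped)


# Per-cell cluster IDs, e.g. as exported from a clustering step.
cells = [0, 1, 2, 3, 1]
print(relabel_cells(cells, cluster_to_cell_type))
print(clusters_per_cell_type(cluster_to_cell_type))
```

      Keeping the mapping as an explicit table makes each manual decision visible and easy to report alongside the supporting WISH evidence.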

      LIne 690-692: When slices were used, what part of the gill were they taken from?

      We sectioned the gill around the mid part, which is representative of the mature bacteriocytes.

      References to figures:

      General

      Please split the fluorescent images into different channels with an additional composite. It is difficult to see some of the expression patterns. It would also make it accessible to colorblind readers.

      We appreciate the comments and suggestions from the reviewer. We have converted our figures to CMYK colour which will help the colorblind audiences to read our paper.

      Please provide the number of replicates for each in situ and what proportion of those displayed the presented pattern.

      We appreciate the reviewer’s comments. We have explained this in the Materials and Methods section of the manuscript.

      Figure 2.C' is a fantastic summary and really helps the non-mussel audience understand the results. Adding schematics like this to Figures 3-5 would be helpful as well.

      We value the reviewer's comments. We propose that Figures 3K, 4C, and 5A-D could offer similar schematic explanations to assist the audience.

      Figure 2:

      Figures 2.C-F, 2.C', 2.H-J are not referenced in the text. Adding in discussions of them would help strengthen your discussions on the cluster annotation

      We appreciate the reviewer's comments. We have revised the manuscript accordingly.

      In 2.B. 6 genes are highlighted in red and said to be shown in in situs, but only 5 are shown.

      We apologize for the mistake. We did not include the WISH result for 20639-0.0 in the present study. We have changed the label to black.

      Figure 3:

      FIg 2C-E not mentioned.

      We appreciate the reviewer's comments. We have revised the manuscript accordingly.

      In 3.B 8 genes are highlighted in red and said to be shown in in situs. Only 6 are.

      The WISH results are provided in Supplementary Figures S4 and S5.

      FIgure 3.K is not referenced in the legend.

      We appreciate the comment, and have revised the manuscript accordingly.

      Figure 4:

      In Figure D, it might be helpful to indicate the growth direction.

      We appreciate the comment, and have revised the manuscript accordingly by adding an arrow in panel D to indicate growth direction.

      4F: A double in situ with the symbiote marker is needed to demonstrate the nucleolin-like positive cells are symbiote free.

      We appreciate the comment. The symbiont-free region can be seen in Figure 5A.

      Figure 5:

      In 5.A, quantification of symbiote concentration would help support your conclusion that they are denser around the edges.

      We appreciate the comment. As mentioned above, detailed quantification of intracellular symbionts would require continuous TEM or ultra-resolution confocal sections to reconstruct the bacteriocytes in 3D, which is beyond the scope of the current study. Therefore, fluorescent intensity remains the only method available to us for estimating bacterial density/distribution across the gill filament.

      In 5.D, the annotation is not clear. Adding arrows like in 5.C would be helpful.

      We appreciate the comment, and have revised the manuscript accordingly.

      A few genes in 5.F are not mentioned in the paper body when listing other genes. Mentioning them would help provide more support for your clustering.

      We appreciate the comment, and have revised the manuscript accordingly.

      Is 5.I meant to be color coded with the gene groups from 5.F? Color Coding the gene names, rather than organelles or cellular structures might portray this better and help visually strengthen the link between the diagram and your dot plot.

      We appreciate the suggestions. We've experimented with color-coding the gene names, but some colors are less discernible against a white background.

      Figure 6:

      6.B Is there a better way to visualize this data? The color coding is confusing given the pairwise distances. Maybe heatmaps?

      We attempted a heatmap, as shown in the figure below. However, all co-authors agree that a bar plot provides clearer visualization than the heatmap. We agree that the color scheme may be confusing because the bars used the same colors as the individual treatments, so we have changed the colors.

      Author response image 1.

      Figure 6.D: Why is the fanmao sample divided in the middle?

      Fig. 6C shows that the single-cell trajectories include branches. The branches occur because cells execute alternative gene expression programs. Thus, in Fig. 6D, we show changes for genes that are significantly branch dependent in both lineages at the same time. Specifically, in cluster 2, the genes are upregulated during starvation but downregulated during reconstitution. Conversely, genes in cluster 1 are downregulated during starvation but upregulated during reconstitution. Note that Fig. 6D displays only a small subset of the significantly branch-dependent genes.

      FIgure 6.D: Can you visualize the expression in the same format as in figures 2-5?

      We appreciate the comments from the reviewer. As far as we know, this heatmap is the best format to demonstrate this type of gene expression profile.

      Supplementary Figure S2:

      Please provide a key for the cell type abbreviations

      We appreciate the comment, and have added the abbreviations of cell types accordingly.

      Supplementary Figures S4 and S5:

      What part of the larger images are the subsetted image taken from?

      We appreciate the comment. These images were taken from the ventral tip and mid part of the gill slices, respectively. We have revised the figure legends to make this clear.

      Supplemental Figure S7:

      If clusters 1 and 2 show genes up and downregulated during starvation, what do clusters 4 and 3 represent?

      Cluster 1 contains genes that are clearly upregulated during starvation and downregulated during reconstitution; cluster 4 contains genes that are downregulated during reconstitution but not clearly upregulated during starvation.

      Cluster 2 contains genes that are upregulated during reconstitution, and cluster 3 contains genes that are clearly downregulated during starvation.

      Author response table 1.

      Supplemental Figure S8:

      This is a really interesting figure that I think shows some of the results really well! Maybe consider moving it to the main figures of the paper?

      We appreciate the comments and suggestions. We concur with the reviewer on the significance of the results presented. However, considering the length of this manuscript, we have prioritized the inclusion of the most pertinent information in the main figures. Supplementary materials containing additional figures and details on the genes involved in these pathways are provided for interested readers.

      Supplemental Figure S11:

      Switching the axes might make this image easier for the reader to interpret. Additionally, calculating the normalized contribution of each sample to each cluster could help quantify the extent to which bacteriocytes are reduced when starving.

      Thank you for the insightful suggestion, which we have implemented as detailed below. We acknowledge the importance of understanding the changes in bacteriocyte proportions across different treatments. However, it's crucial to note that the percentage of cells per treatment is highly influenced by factors such as the location of digestion and sequencing, as previously mentioned.

      Author response image 2.

      Reviewer #2 (Recommendations For The Authors):

      The following are minor recommendations for the text and figures that may help with clarity:

      Fig. 3K: This figure describes water flow induced by different ciliary cells. It is not clear what the color of the arrows corresponds to, as they do not match the UMAP (i.e. the red arrow) and this is not indicated in the legend. Are these colours meant to indicate the different ciliary cell types? If so it would be helpful to include this in the legend.

      We appreciate the reviewer's comments and suggestions. The arrows indicate the water flow that might be driven by certain types of cilia. We have revised our figure and figure legends to make this clear.

      Line 369: The incorrect gene identifier is given for the mitochondrial trifunctional enzyme. This gene identifier is identical to the one given in line 366, which describes long-chain-fatty-acid-ligase ACSBG2-like (Bpl_scaf_28862-1.5).

      We appreciate the reviewer's comments and suggestions. We have revised our manuscript accordingly.

      Line 554: The Bioproject accession number (PRJNA779258) does not appear to lead to an existing page in any database.

      We appreciate the reviewer's comments and suggestions. We have released this Bioproject to the public.

      Line 597-598: it would be helpful to know the specific number of cells that the three sample types were downsampled to, and the number of cells remaining in each cluster, as this can affect the statistical interpretation of differential expression analyses.

      The number of cells per cluster in our analysis ranged from 766 to 14633. To mitigate potential bias introduced by varying cell numbers, we implemented downsampling, restricting the number of cells per cluster to no more than 3500. This was done to ensure that the differences between clusters remained less than 5 times. We experimented with several downsampling strategies, exploring cell limits of 4500 and 2500, and consistently observed similar patterns across these variations.
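      The downsampling rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy cluster names, and the intermediate cluster size are hypothetical, while the cap of 3500 and the smallest and largest cluster sizes (766 and 14633) are taken from the response.

```python
import random


def downsample_clusters(cells_by_cluster, max_cells=3500, seed=0):
    """Randomly cap each cluster at max_cells; smaller clusters are kept whole."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    capped = {}
    for cluster, cells in cells_by_cluster.items():
        if len(cells) > max_cells:
            capped[cluster] = rng.sample(cells, max_cells)
        else:
            capped[cluster] = list(cells)
    return capped


def size_ratio(cells_by_cluster):
    """Ratio between the largest and smallest cluster sizes."""
    sizes = [len(cells) for cells in cells_by_cluster.values()]
    return max(sizes) / min(sizes)


# Toy input: integer cell IDs stand in for barcodes.
toy = {
    "smallest": list(range(766)),     # smallest cluster size quoted above
    "middle": list(range(3000)),      # hypothetical intermediate cluster
    "largest": list(range(14633)),    # largest cluster size quoted above
}

capped = downsample_clusters(toy)
print(f"ratio before: {size_ratio(toy):.2f}")
print(f"ratio after:  {size_ratio(capped):.2f}")
```

      With a cap of 3500 and a smallest cluster of 766 cells, the largest-to-smallest ratio falls from about 19 to below 5, consistent with the stated goal of keeping cluster sizes within a fivefold range.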

      Data and code availability:

      The supplementary tables and supplementary data S1 appear to be the final output of the differential expression analyses. Including the raw data (e.g. reads) and/or intermediate data objects (e.g. count matrices, R objects), in addition to the code used to perform the analyses, may be very helpful for replication and downstream use of this dataset. As mentioned above, the Bioproject accession number appears to be incorrect.

      We appreciate the reviewer's comments and suggestions. Regarding our sequencing data, we have deposited all relevant information with the National Center for Biotechnology Information (NCBI) under Bioproject PRJNA779258. Additionally, we have requested the release of the Bioproject. Furthermore, as part of this round of revision, we have included the count matrices for reference.

      Reviewer #3 (Recommendations For The Authors):

      As noted in the public review, my only major concerns are around the treatment of progenitor cell populations. I am sympathetic to the challenges of these experiments but suggest a few possible avenues to the authors.

      First, there could be some demonstration that these cells in G. platifrons are indeed proliferative, using EdU incorporation labeling or a conserved epitope such as the phosphorylation of serine 10 in histone 3. It appears in Mytilus galloprovincialis that proliferating cell nuclear antigen (PCNA) and phospho-histone H3 have previously been used as good markers for proliferative cells (Maiorova and Odintsova 2016). The use of any of these markers along with the cell type markers the authors recover for PEBZCs for example would greatly strengthen the argument that these are proliferative cells.

      If performing these experiments would not be currently possible, the authors could use some computation approaches to strengthen their arguments. Based on conserved cell cycle markers and the use of Cell-Cycle feature analysis in Seurat could the authors provide evidence that these progenitors occupy the G2/M phase at a greater percentage than other cells? Other than the physical position of the cells is there much that suggests that these are proliferative? While I am more convinced by markers in VEPCs the markers for PEBZCs and DEPCs are not particularly compelling.

      While I do not think the major findings of the paper hinge on this, comments such as "the PBEZCs gave rise to new bacteriocytes that allowed symbiont colonization" should be taken with care. It is not clear that the PBEZCs are proliferative and there does not seem to be any direct evidence that PBEZCs (or DEPCs or VEPCS for that manner) are the progenitor cells through any sort of labeling or co-expression studies.

      We appreciate the comments and suggestions from the reviewer. We have considered all the suggestions and have revised the manuscript accordingly. We especially appreciate the reviewer’s suggestions about the characterisation of the G. platifrons gill proliferative cell populations. In a separate research project, we have tested both cell division and cell proliferation markers on the proliferative cell populations. Though we are not able to include these results in the current manuscript, we are happy to share our preliminary results with the reviewer. Our results demonstrate that the proliferative cell populations, particularly the VEPCs, are positive for cell proliferation markers and contain a high number of mitotic cells.

      Author response image 3.

      Finally, there is a body of literature that has examined cell proliferation and zones of proliferation in mussels (such as Piquet, B., Lallier, F.H., André, C. et al. Regionalized cell proliferation in the symbiont-bearing gill of the hydrothermal vent mussel Bathymodiolus azoricus. Symbiosis 2020) or other organisms (such as Bird, A. M., von Dassow, G., & Maslakova, S. A. How the pilidium larva grows. EvoDevo. 2014) that could be discussed.

      We appreciate the comments and suggestions from the reviewer. We have considered all the suggestions and have revised the manuscript accordingly (lines 226-229).

      Minor comments also include:

      Consider changing the orientation of diagrams in Figure 2C' in relationship to Figure 2C and 2D-K.

      We appreciate the comments and suggestions from the reviewer. Figure 2 has been reorganized.

      For the diagram in Figure 3K, please clarify if the arrows drawn for the direction of inter lamina water flow are based on gene expression, SEM, or some previous study.

      We are grateful for the reviewer's valuable feedback and suggestions. The arrows in the figure indicate the direction of water flow that could be affected by specific types of cilia. Our prediction is based on both gene expression and SEM results. To further clarify this point, we have revised the figure legend of Fig. 3.

      Please include a label for the clusters in Figure 5E for consistency.

      We have revised our Figure 5E to keep our figures consistent.

      Please include a note in the Materials and Methods for Monocle analysis in Figure 6.

      We conducted Monocle analyses using Monocle 2 and Monocle 3 in the R environment. We have revised our Materials and Methods with further information on the analysis shown in Figure 6.

      In Supplement 2, the first column is labeled PEBC while the first row is labeled PEBZ versus all other rows and columns have corresponding names. I am guessing this is a typo and not different clusters?

      We appreciate the great effort of the reviewer in reviewing our manuscript. We have corrected the typo in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The authors' findings are primarily rooted in a series of well-conducted in vitro experiments using two CML cell lines, K562 and MEG-01. While the findings are interesting and novel, further work to corroborate these findings in primary CML samples would have greatly strengthened the potential real-world relevance of these discoveries. The authors appear to have some PBMCs from primary CML patients and a BM sample from a Ph+ ALL in which they performed western blot analyses (Fig 1). Couldn't these samples have been used to at least confirm some of the key discoveries? For example, the neddylation of BCR-ABL, or; sensitivity of primary leukemic cells to RAPSYN knockdown, and/or; phosphorylation of RAPSYN by SRC?

      We agree with your points and really appreciate your comments. To demonstrate the clinical relevance, we have conducted a series of experiments to address your concerns.

      (1) After thorough optimization of the transduction process, we have managed to show that shRNA-mediated gene silencing of RAPSYN impaired the growth of primary CML samples. These additional data are presented as Figure 1D in the revised manuscript with its corresponding figure legend and description, lines 136-141.

      (2) We have invested tremendous time and effort to address the “key discoveries” despite the great technical difficulty of the task. With 5 mL (ethical approval) of PBMCs on hand, we have finally managed to confirm BCR-ABL neddylation by IP in samples from two newly acquired CML patients. The results are presented in Figure 2F in the revised manuscript with its corresponding figure legend and description, lines 186-187.

      (2) The authors initially interrogated a fairly dated (circa 2009) microarray-based primary dataset to show that the increase in RAPSYN is primarily a post-transcriptional event, as mRNA levels are not different between healthy and CML samples. It would be interesting to see whether differences might be more readily seen in more recent RNA-seq datasets from CML patients, given the well-known differences in sensitivity between the two platforms. Additionally, I wonder if there would be transcriptional signatures of increased NEDDylation (or RAPSYN-induced NEDDylation) that could be interrogated in primary samples? Furthermore, there are proteomics datasets of CML cells made resistant to TKIs (through in vitro selection experiments) that could be interrogated for independent validation of the authors' discoveries. For example: from K562 cells, PMID: 30730747 or PMID: 34922009).

      Thank you very much for your constructive comments. Based on your suggestion, we have 1) analyzed the mRNA level of RAPSYN in RNA-seq datasets GSE13159 (2009), GSE138883 (2020) and GSE140385 (2020), which indicated no difference between CML patients and healthy donors. We have included the results in Figure 1-figure supplement 1A and in the revised manuscript (lines 123-127); 2) examined the RNA levels of RAPSYN-related neddylation enzymes, including E1 (NAE1), E2 (UBE2M), NEDD8 and NEDP1 in these datasets, and again found no significant differences in these neddylation-related genes between CML patients and healthy donors (Supplementary Figure 2C, lines 168-172).

      We have also analyzed the proteomics datasets from PMID: 30730747 and PMID: 34922009 according to your suggestion. Unfortunately, no information on RAPSYN expression is available in these datasets. To avoid overlooking relevant data, we have examined all CML-related proteomics datasets from 2002 to 2024, which likewise contained no information about protein expression of RAPSYN. Consequently, our finding of higher RAPSYN expression in the PBMCs of Ph+ patients appears to be reported here for the first time. We also believe that our results should be more clinically relevant than those, if any, from cells obtained by in vitro selection.

      Reviewer #2 (Public Review):

      Most of the conclusions drawn in this paper are well supported by data, but some aspects of the data need to be clarified and extended:

      (1) The authors propose that targeting RAPSYN in Ph+ leukemia could have a high therapeutic index, suggesting that inhibition of RAPSYN may lead to cytotoxicity in Ph+ leukemia with high specificity and minimal side effects. To substantiate this assertion, the authors should investigate the impact on cell viability upon RAPSYN knockdown in non-Ph leukemic cell lines or HS-5 cells (similar to Figure 1C), despite their lower RAPSYN protein levels.

      We appreciate your valuable comments. When we used shRNA to knock down the expression of RAPSYN in HS-5 cells, it did not affect their growth. We have included the data in Figure 1C, modified its figure legend, and added the corresponding description, lines 136-141.

      (2) The authors intriguingly show that the protein levels of RAPSYN are significantly enriched in Ph+ patient samples and cell lines (Figure 1A, B), even though the mRNA levels remain unchanged (Supplementary Figure 1 A-C). This observation merits a clear explanation in the context of the presented results. The data in the manuscript does imply a feedforward loop mechanism (Figure 7), where BCR-ABL activates SRC, which subsequently stabilizes RAPSYN, which in turn helps protect BCR-ABL from c-CBL-mediated degradation. If this is the working hypothesis, it would be beneficial for the reader to see supporting evidence.

      Thank you very much for pointing out this issue. We have realized that Figure 7, which was originally placed as a summarizing figure, was inappropriate. To avoid potential confusion, this figure has been deleted, which does not affect the results and conclusions of this study. In addition, the differences between mRNA levels and protein expression have been addressed in our response to Reviewer #1.

      (3) The authors present compelling evidence to suggest that RAPSYN may possess direct NEDD8-ligase activity on BCR-ABL. To strengthen this claim, it may be valuable to conduct further assays involving a ligase-deficient mutant, such as C366A, beyond its use in Figure 2J. Incorporating this mutant into the in vitro assay illustrated in Figure 2K, for instance, could offer substantial validation for the claim. In addition, showing whether the ligase-deficient mutant is capable of phenocopying the phosphorylation-mutant Y336F, as showcased in Figures 5E, F, and 6D, F, would be beneficial.

      We are grateful for your comments. In the manuscript, we have provided sufficient data to support the direct neddylation of BCR-ABL by RAPSYN, as you commented “The authors present compelling evidence to suggest that RAPSYN may possess direct NEDD8-ligase activity on BCR-ABL.”. Cys366 was previously demonstrated to be the catalytic residue essential for the E3 activity of RAPSYN (Li et al. 2016, PMID: 27839998), and the phosphorylation at Tyr336 was thoroughly verified by site-directed mutagenesis and treatment with the SRC-specific inhibitor saracatinib in the present cellular experiments. Therefore, while we fully respect your opinion, we do not think it necessary to perform tedious in vitro reactions for expected negative results, which is why we did not conduct enzymatic reactions with known inactive mutants, such as C366A and Y336F, in the first place.

      (4) The observations presented in Figures 6 C-G require additional clarification. Notably, there are discrepancies in relative cell viability effects in K562 cells, and to some extent in MEG-01 cells, under conditions that are indicated as being either identical or highly similar. For instance, this inconsistency is observable when comparing the left panels of Figure 6C and 6D in the case of NC overexpression + shSRC#2, and the left panels of Figure 6E and 6G with NC overexpression or shNC, respectively. Listing potential causes of these discrepancies would strengthen the overall validity of the findings and their subsequent interpretation.

      Thank you for your comments; we apologize for the confusion. To make a meaningful comparison, we have revised the Methods section “Preparation of stable RAPSYNWT, RAPSYNY336F or SRC expression cell lines” (lines 625-627) and reorganized Figure 6 to reflect the differences in the negative controls. In fact, we first used the LV6 (EF-1a/Puro; OE-NC1) vector for the overexpression of RAPSYNWT and SRC. Due to the low expression level with LV6 and the long period of time required for subsequent selection, we switched to LV18 (CMV/Puro; OE-NC2) for the overexpression of RAPSYNY336F. Since the sensitivities of K562/MEG01-OE-NC cells to shSRC transduction in Figure 6C (now revised to K562/MEG01-OE-NC1) and 6D (now revised to K562/MEG01-OE-NC2) were noticeably different, we have separated RAPSYNWT and RAPSYNY336F cells as 6C and 6D with their own corresponding empty vector as negative control, instead of merging the results into a single figure with one negative control of OE-NC. In addition, given that K562/MEG01 cells reacted differently to saracatinib treatment after transduction with the empty vectors, we have also distinguished the negative controls as OE-NC1 in Figure 6E, OE-NC2 in Figure 6F and shNC in Figure 6G. In summary, the transduction of K562/MEG01 cells with different expression vectors and viral particles caused the discrepancies in the cell viability experiments, which has been clarified by reorganizing Figure 6 in the revision.

      (5) Throughout the manuscript, immunoblots which showcase immunoprecipitations of BCR-ABL or His-BCR-ABL depict poly-neddylation (e.g. Figures 2E-M, 3D-G, and 5A-E) and poly-ubiquitination (e.g. Figures 3D-G) patterns/smears where these patterns seem to extend below the molecular weight of BCR-ABL. To enhance clarity, it would be valuable for the authors to provide an explanation in the text or the figure legend for this observation. Is it reflective of potential degradation of BCR-ABL or is there another explanation behind it?

      Thank you for your valuable comments. After carefully checking the original immunoblots, we have ascertained that the BCR-ABL protein band was at 250 kDa and that the smeared bands appearing above 250 kDa were likely caused by the conjugation of NEDD8 (neddylation) or ubiquitin (ubiquitination) onto BCR-ABL. Whether the lower-than-expected molecular weight of modified BCR-ABL is a common feature, as previously reported (Mao, J., et al, 2010, PMID: 21118980), or reflects possible degradation during the modification process or sample preparation requires further investigation. We have corrected the labeling of figures in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) It would really nail the real-world relevance of these nice findings if the authors are able to confirm some aspects of their cell line-based discoveries in publicly available 'omics datasets generated from primary CML samples. I have suggested some of these in the public review as well.

      Alternatively, if they are able to investigate samples from murine CML models (eg. BALB/c CML models), it would represent a step towards real-world relevance.

      Thank you very much for your constructive comments. According to your suggestion, we have examined and analyzed RAPSYN mRNA and protein in updated and publicly available datasets as replied in the public response.

      (2) The Discussion repeats some of the information already presented in the Introduction (for example, lines 311-327 of the merged document, or lines 349-358). I would urge the authors to instead expand more about how RAPSYN might be upregulated at the post-transcriptional level, or its potential post-translational regulation by SRC-mediated phosphorylation.

      Thanks for your constructive suggestion. We have re-written this part according to your suggestion and marked in red color in the revised manuscript, lines 319-325 and lines 351-378.

      (3) There are instances of clunky phrases/grammatical mistakes in the manuscript which detract from its readability (eg: lines 142-143: "...empty body transduced shRAPSN#3 or K562 cells into...."; lines 163-164: "Despite AChR subunits α7, M2, M3, and M4 were expressed in all tested cells, no change..."; line 178: "Preeminent BCR-ABL neddylation was detected in..."). A closer proof-reading of the final manuscript is advisable.

      We appreciate the valuable comments. We have made changes for improvement, which are marked in red in the revised manuscript, lines 145-147, lines 166-168 and line 185.

      (4) The western blot in Fig 5C (particularly the control "OE-NC" of K562) looks drastically different from the corresponding control lanes in Figs 5A and 5B. Similarly, the cell viability curves presented in Fig 6D and 6F (for both K562 and MEG-01, control conditions) look very different from the corresponding curves in Figs 6A and 6B.

      We appreciate your valuable comments. Because we accidentally used images with different exposure times, the western blots in Fig 5C (particularly the control "OE-NC" of K562) looked very different from the corresponding control lanes in Figs 5A and 5B. We have replaced them with images of the same exposure time in the revised manuscript.

      For clarity, we have revised the Methods section “Preparation of stable RAPSYNWT, RAPSYNY336F or SRC expression cell lines” (lines 625-627) and the related figure legends to reflect the differences.

      We have addressed the discrepancy in cell viability in our public response.

      Reviewer #2 (Recommendations For The Authors):

      In reviewing your study, I must insist that the completeness and robustness of your work would significantly benefit from a more exhaustive listing of the antibodies used for immunoblotting and immunoprecipitation within the Materials and Methods section. A number of antibodies have been accounted for, however, crucial ones targeting BCR-ABL, c-CBL, Ubiquitin, NEDD8, HA, Myc, and others appear to be omitted. To maintain rigorous scientific standards, I strongly encourage you to include these.

      We appreciate your comments. We have carefully checked the section of Methods and added detailed information of antibodies for Immunoblotting and Immunoprecipitation in the revised manuscript, lines 502-516.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We are very grateful to the reviewers for their positive appraisal of the manuscript and for their useful comments and suggestions. Below are our answers and corresponding modifications of the manuscript.


      Reviewer #1

      1 - Figures 1&4 focus on JU1264 as the primary double-sensitive strain. However, the authors built their RILs with HK104 by crossing with JU1498 in Figures 7&8. In the results section and/or methods, the authors should provide some justification for this strain switch. Alternatively, the equivalent analysis of Figure 1 focusing on JU1498 would be valuable to demonstrate that the effects of both viruses on fitness are similar to JU1264. I am not recommending that the JU1264xHK104 crosses be performed or that Figures 7&8 be repeated with JU1264xHK104 lines, but that more explanation for strain selection for RIL generation should be provided.

      JU1264 and JU1498 are the strains where SANTV and LEBV were found, respectively. The experiments were performed over the years by different authors and were designed to answer different questions. JU1264 was the strain where the first virus was found and was used as a doubly sensitive strain in Figure 1 and the small RNA experiment. The main reason we chose JU1498 for genetic crosses to discover the genetic basis of LEBV sensitivity is that LEBV was detected and isolated from JU1498. Note that the JU1264 and JU1498 strains come from France and are in the same isotype group at CaeNDR (see also Figure 3) so the two strains may be interchangeable (although we cannot be sure).

      We added in the text concerning the RIL construction: "We chose to use JU1498 as the LEBV-sensitive strain as it was the original strain in which LEBV was discovered."

      2-The authors reasonably claim that the resistance of tropical strains like AF16 could be due to blocking viral entry or early inhibition of replication before the small RNA response is activated. Could the authors test this by directly microinjecting virus (in combination with a dye as a control for successful injection) into the intestine? I understand this could not be done on a scale that would allow for small RNA sequencing, but one could perform small-scale FISH to determine if LEBV or SANTV are replication-competent if the entry barrier is artificially overcome. Such an experiment may require considerable technical development. It may be beyond the scope/timing of this specific study, but it is worth considering to gain some insight into the possible resistance mechanisms observed.

      Although the suggested experiment is in principle a great approach, it is difficult to perform without losing animals during the FISH staining. In addition, in this manuscript we are not particularly searching for the resistance mechanisms of AF16 but trying to present a wider perspective concerning viral infections of C. briggsae and their specificity. We performed small RNA analysis for AF16 together with the sensitive strains and therefore commented on the lack of a small RNA response in AF16 compared to the sensitive strains. We thus consider that setting up intestinal injections at this point is arduous and beyond the scope of this manuscript.

      Minor Comments:

      Line 78 - provide the full genus name for Caenorhabditis elegans at first appearance, as done for Caenorhabditis briggsae

      This was modified.

      Line 117 - The description of cul-6 could also reference Bakowski et al. 2014. This study is referenced more generally as a player in proteostasis a few lines below but could be more explicitly tied to cul-6-mediated resistance to ORV (Bakowski et al. 2014 - see Fig. 7A)

      This section focuses on the use of natural polymorphisms but we added this reference, which is indeed key for the effect of cul-6 knockdown on viral infection in C. elegans.

      Line 197-198 - The authors could consider adding sequences for FISH probes as part of Table S2. This information could add value to the present study even if previously listed in Frézal et al.

      We actually removed them from an earlier version since these sequences are already published: here and in further work, it seems preferable to refer to the primary study where these probes were designed.

      Line 263 - Were embryos obtained by bleaching of gravid adults, or was an egg lay performed, and the embryos were collected from plates? This is potentially an important distinction and should be clarified briefly in the methods.

      In the section “Preparation of small RNA libraries”, we obtained embryos by bleaching gravid adults.

      We changed the first sentence to “Gravid hermaphrodites from uninfected cultures (AF16, HK104 and JU1264) were harvested using M9 solution, then bleached and washed twice using nuclease-free water. Embryo concentrations were estimated by counting embryos under the dissecting microscope and diluted to 2 embryos per mL of nuclease-free water. 200 embryos of each strain (AF16, HK104 and JU1264) were then plated onto 55 mm NGM plates seeded with E. coli OP50.” We also added “The embryos were obtained by bleaching gravid hermaphrodites.” to the Figure S5 legend.

      Line 330 - Provide justification for using JU1498 to make these RILs (see comment above).

      We added this sentence in the Results section: "We chose to use JU1498 as the LEBV-sensitive strain as it was the original strain in which LEBV was discovered."

      Line 446 - Refer to the methods section for full clarity on the role of FISH in this set of experiments or reword for improved clarity. At first read-through, this phrasing made me expect some FISH experiments associated with Fig. 1, which does not appear to be the case.

      We did perform FISH experiments as a control to confirm that the cultures were infected, as explained in the Methods. We removed this mention from the Results section.

      Line 478 - The supplementary figure callouts are misaligned with the provided documents. S2A in the text appears to refer to S3A RT-qPCR results.

      Changed.

      Line 483 - Similar to above, the text suggests serial dilutions should refer to S4, not S3.

      Changed.

      Line 498 - Modify the text to 'Figure 2C and Figure 3' for clarity.

      Changed.

      Line 531,535 - viRNAs are defined in line 535 but this should be moved to 531 above at first appearance in the text.

      Changed.

      Line 593 - Typo in 'Logarithm of Odds?'

      Corrected.

      Line 621-624 - I recommend the authors include the data for the LEBV control experiments with NIL strains, either as a supplementary table, an additional panel for Fig. 6, or represented as done in Figure 8.

      We removed this sentence.

      Line 625-632 - How many total genes are represented in the QTL on IV? The reasoning behind testing rde-11 and rsd-2 is sound, but readers might want to know other potential candidates within this region (perhaps something the authors could also speculate on in the discussion). A similar comment applies for # genes in the QTLs on II and III.

      We added in Table S7 the list of detected SNPs and short indels in the chromosome IV region and now indicate in the text "among them over 2700 SNPs and short indels (Table S7)." We added Table S11 with the polymorphisms in the chromosome II QTL region. We note that these tables do not include possible structural variants. The chromosome III QTL being weak, we abstained for this one but the data can now be found using CaeNDR.

      Line 991-992 - Figure 1B - LEBV, SANTV, and co-infection effects on body size are mentioned but not quantified. Has this phenotype been quantified elsewhere? If so, the authors should reference it in the results section or Fig. 1 legend. Alternatively, body size could be quantified as part of this study and added to Fig. 1.

      Because we do not have a large amount of data on body size, we removed "Body size quantification" from the Figure 1B legend.

      Line 1001 - There is a typo in the first sentence; the period after LEBV should be removed.

      Small suggestion: Figure 2A - While described in the methods, I recommend that the authors briefly reiterate in the figure legend that the white/yellow boxes are intended to indicate serial chunking for clarity.

      We removed the typo and explained the agar chunk representation in the figure legend: "The transfer by chunking a piece of agar is indicated by beige rectangles cut out from one plate and transferred to the next plate."

      Line 1034 - Small formatting note for Figure 4B - percentages of reads mapping to RNA1 and RNA2 appear underneath gridlines for the graph which obscures visibility and is inconsistent with the other graphs presented.

      This was modified and is indeed clearer.

      Line 1094 - Figure S1 - this analysis could be strengthened by RT-qPCR represented as fold change in viral load instead of, or in addition to, the agarose gel image (like Fig. S3). Doing so would also allow for the normalization of eft-2 control across individual samples (e.g.: particularly low eft-2 amplification in ED3073). However, these results are sufficiently convincing that LEBV does not replicate in C. elegans, but a more quantitative approach is recommended if feasible for the authors. Alternatively, an additional figure panel and/or repeat of this analysis with C. elegans infected with ORV would also be beneficial as an additional control.

      We do not understand how we can estimate a viral load by a ratio when we do not seem to see any significant amplification. Of course, an RT-qPCR would provide a finite Ct value and a ratio, but they are likely to be meaningless. The ED3073 sample did not amplify for eft-2 either, and calculating a ratio of high Ct values in an RT-qPCR would be misleading. We could remove the two ED3073 lanes but prefer to leave them.
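
      The concern about ratios of high Ct values can be illustrated with a minimal sketch. It assumes the common 2^-ΔCt convention with 100% amplification efficiency; the function name and all Ct values are hypothetical and are not measurements from this study.

```python
def relative_level(ct_target: float, ct_reference: float) -> float:
    """Target abundance relative to a reference gene, as 2^-(Ct_target - Ct_reference).

    Assumes perfect doubling per cycle (100% efficiency).
    """
    return 2.0 ** -(ct_target - ct_reference)


# Hypothetical, well-amplified sample: both Ct values are far from background.
clear_signal = relative_level(ct_target=22.0, ct_reference=18.0)

# Hypothetical sample in which neither amplicon really amplifies: both Ct values
# sit near the end of the run, where a shift of 1.5 cycles is within noise.
noisy_low = relative_level(ct_target=38.0, ct_reference=36.5)
noisy_high = relative_level(ct_target=36.5, ct_reference=38.0)

# The same "no amplification" situation yields ratios that differ 8-fold.
spread = noisy_high / noisy_low
```

      With these made-up numbers, swapping which of two near-background Ct values is slightly lower changes the computed ratio 8-fold, which is why such a ratio carries little information.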

      Line 1112 - "Experiments using RNA2 primers gave similar results" - if this data isn't included in the study, this text should be removed.

      Removed.

      Line 1141 - Figure S6 - For full transparency, the authors could consider including HK104 infected with LEBV to show minimal (zero) reads align to the RNA1/RNA2 segments using scales consistent with JU1264 infected with LEBV (S6C)

      The proportions of reads mapping (0%) are provided in Figure 4A and supplementary tables. We do not show the distribution of antisense 22G and sense 23nt along the LEBV genome for the HK104 (co)infections for the following reasons. 0% of these reads map to LEBV in HK104 monoinfection, and only 0.02% antisense 22G in coinfection. Moreover, the 23nt reads mapping to LEBV-RNA2 in the HK104 coinfection (16.54%;1931 reads) correspond to a 41 bp region with 85% nucleotide similarity between SANTV-RNA2 and LEBV-RNA2. Overall, the few 23nt (+) reads mapping to LEBV in HK104 coinfection are most likely a spillover of the HK104 antiviral response to JUv1264 entry into the intestinal cells.

      Reviewer #2

      Main points:

      1. In figure 1C and D, is more than 1 biological replicate performed? Ideally multiple independent infections would be performed which would increase confidence in these experiments, but minimally the authors should make clear that this data was from an experiment only performed once. The conclusion from the life span assays is unlikely to change, but given the variance of the brood size assays within replicates, the conclusion that LEBV infection reduces the brood size is weakly supported.

      We added “Panels C-D correspond to a single experiment (see Methods).” to the legend of Figure 1. We changed the wording to "LEBV and especially the co-infection appeared to lower brood size." We do not have data for independent experiments.

      If the authors want to claim that there is a defect in viral entry in the resistant strains, they should perform infection experiments at an earlier time point that could capture viral invasion. In C. elegans with Orsay virus these experiments have been done as early as 18 hours by FISH. https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1011120 The way the assays are currently set up, if the infection was cleared it wouldn't be observed.

      The strongest point that indicates that the virus does not replicate is the small RNA experiment, in which the animals were collected on the initial plate inoculated with the virus. We think that our wording was careful:

      We further amended it:

      • in Results " The animals were collected for sRNA sequencing on the plates onto which the viral inoculate was added and where they were constantly exposed to the virus".

      • in Discussion " Indeed, as we did not assay viral entry by sensitive FISH or RT-PCR at early timepoints, it is possible that the viruses are cleared without production of small RNAs."

      The evidence that the region on chromosome III contributes to susceptibility is weak. The analysis in figure 5B does not identify this region and it is not clear to me how to read the scale in figure 5C to determine that a region on chromosome III is significant.

      We added in the Figure legend: "with a LOD score of 10.5, above the threshold calculated by simulations (see Methods)." and detailed the method in the Methods section (see reply to Reviewer 3 below).

      In figure 6 using a more appropriate statistical test such as one way ANOVA with multiple hypothesis testing is necessary to determine if there is a difference between JU2832 and JU2916. It would be helpful if the authors could add more discussion of the evidence that they feel that supports this region being involved in susceptibility.

      We do not think that an ANOVA is appropriate to analyze these proportions, which cannot have normal distributions of residuals; therefore we used a generalized linear model, taking genotype and block (day of experiment) into account. This was only explained in the legend and is now explained in the Methods section as well. Maybe the reviewer suggests that we use a global analysis with strain as a factor. We could do this but we do not think that it applies well to this situation: here we test a specific hypothesis for each one-QTL strain. We have corrected for multiple testing as explained next. The legend now reads: "The significance p values were obtained in a generalized linear model (glm) taking independent experimental blocks and infection replicates into account, testing NILs against their relevant background parent. The p values using the two strains testing for the QTL on chromosome IV and those using the two-QTL strain JU2832 are corrected for multiple testing." In addition, we now provide p values rather than three stars, which reinforces the point (they are very low).
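
      As an illustration of what correcting a small family of p values involves, here is a minimal sketch. The response does not state which correction was applied; Holm's step-down method is assumed here purely for illustration, and the function name and input values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values, in the original order.

    The k-th smallest p value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and forced to be at least as large as the previous one.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p values for three comparisons (not results from the study).
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
```

      With these made-up inputs the adjusted values are approximately 0.03, 0.06 and 0.06: the smallest p value is multiplied by three, and the other two end up tied because an adjusted value may not fall below the one ranked before it.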

      Minor points

      1. In figure 1B it would be helpful to provide more information on the animals chosen to display. Are these representative examples or extreme examples?

      These are representative examples. This detail was added in the legend.

      In figure 2B, adding a legend for the colored dots would be helpful.

      We had indicated: "Dots are replicates within a block, with 100 animals scored per replicate (see Table S4 for the detailed results and Figure S2 and Methods for the experimental design). Experimental blocks are represented by colors and the bar indicates the grand mean of the blocks." 3. In figure 2C, the definitions for a strain to be labeled as belonging to each category should be provided.

      The categorization method is now explained in the Methods section. In addition, Figure 2C legend now refers to Table S4 for the category of each strain. 4. Could the data in figure 2 be used for genome-wide association mapping and compared to the RIL QTL experiments? Adding comment on this would be helpful to understanding the usefulness of this data.

      There are too few strains here to test genome-wide for association. If we had the causative SNP, it would be interesting to assess its frequency, but this is beyond the focus and scope of this work, which focused on the outlier phenotype of the HK104 strain.

      5. In figure 4B, for HK104 with LEBV, the numbers in the top right corner are not defined.

      We added to the legend of Figure 4B: “For the HK104 infection with LEBV, the number of read counts is provided in the top right corner to signal their rarity compared to ca. 107 in the other conditions. See Table S5 for all read counts.”

      6. Line 1001 remove period from "LEBV.of" and add period after isolates.

      Removed.

      Reviewer #3

      Major comments

      • The authors provide most data in both a processed and raw format, which is helpful. In two cases (data from 3 DPI, line 492 and LEBV infections in the AF16xHK104 NILs, line 621), the authors state their results, but the data seems not to be provided in the document (at least no direct reference is provided). These are supporting results and do not affect the main conclusions; nevertheless, providing the data in the form of a table or supplementary figure would be required. Generally, it may help to include a data availability statement to have a combined overview of where data can be found.

      As noted by the reviewer, we tried to provide the data in raw format, but did not judge it necessary when the experiment had two datapoints that are provided in the text. We added the number of animals in the instance where it was missing.

      Minor comments

      • Line 97-126: Here the manuscript fully focuses on the work in C. elegans. It would be interesting to make clear links to the work in C. briggsae (e.g. mention if homologs are present). The paragraph in line 127 clarifies advantages of studying viral infection in C. briggsae compared to C. elegans. It may be logical to place this information early in the text.

      We added a sentence to link the C. elegans work and C. briggsae.

      • Line 166 and results from this experiment: Is the LEBV-SANTV mixture consisting of 50uL of both viruses or a total of 50uL (so 25uL of both)? This is also important for the interpretation of results.

      To clarify, we changed to: “50 µl ... of an equivolume mix of SANTV and LEBV”.

      • Line 167: The text says the culture is maintained for 4 days, but then also mentions day 5. Figure 2 clarifies the experimental setup later, but the text could be clearer here.

      Thank you for noticing this. We changed the 4 to 7.

      • Line 172: What are the nine starter cultures?

      The nine starting cultures were those obtained as described in the paragraph preceding this line in the manuscript. From a plate of infected animals (five L4 larvae), we propagated the infected population by chunking over 3 plates (day 3) and 3*3 plates (day 5). To make this point clear, we have added above: "to generate for the following experiments nine starter cultures for each of the four conditions"

      • Line 185: 'Infection of the set of C. briggsae natural isolates'. From the text it is not clear what set the authors refer to.

      We changed to "a set" and refer to Figure 2B and Table S4 in the sentence below for the list of natural isolates.

      • Line 223: 'The proportion of infected animals were overall higher in Batch3 but the qualitative results are similar'. It is unclear why this statement is here instead of in the result section and it is also not clear what the authors mean by the second part of the sentence.

      We moved the sentence to Results and changed it to: "The proportion of infected animals were overall higher in Batch 3 but the relative results of the different strains were similar for the three batches."

      • Line 326: Is 'the same method as above' using FISH or RT-qPCR?

      Changed to "using FISH as above".

      • Line 382: What do the authors mean by 'two cross directions'?

      We removed this mention as the method is better explained in the next sentence.

      • Line 454-458: The data presented here does not appear well integrated in the storyline. It does not fit under the subheading. Perhaps it would be a better fit under the subheading of line 462?

      We moved it below the subheading.

      • Line 478: Reference to Fig S2 should be reference to Fig S3

      Changed.

      • Line 483: Reference to Fig S3 should be reference to Fig S4

      Changed.

      • Line 540-544: The sentence reads as a contradiction (C. elegans defends itself using RNAi, C. briggsae blocks viral infection during entry). As a result, the sentence reads as if RNAi is not of much antiviral importance in C. briggsae, but that cannot be concluded from this data. I am not sure if this is what the authors aim to suggest, but another word choice (e.g. changing 'whereas' and 'this does not seem the case for C. briggsae') may be considered.

      We changed the wording to "whereas the C. elegans N2 reference strain allows for viral entry and defends itself against ORV via its small RNA response (Félix et al. 2011; Ashe et al. 2013; Shirayama et al. 2014; Coffman et al. 2017), in the tested resistant C. briggsae strains, the viruses appeared to be blocked at entry or at early steps of the viral cycle."

      • Line 585 and 592: There are two QTL approaches being applied and referred to as 'the one- and two-QTL analyses'. The description in this part is rather technical and the terminology is not clear. As a result, for readers not familiar with QTL mapping, the biological interpretation may become obscured.

      We now explain in Methods: "...scanning each pair of positions for several models, including single-QTL, full, additive and epistatic. The significance threshold LOD score of each model was estimated via 1,000 permutation tests with a coefficient of risk α=0.05. The threshold was 4.91 for the additive model and 6.09 for the full model. The LOD score of each pair of position is represented by a color scale in Figure 5C). The combination of the chromosomes III and IV QTLs had a LOD score of 10.5 in the full and additive models. No epistatic interaction was detected. The LOD score of the single-QTL model comparison was below the threshold."
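
      For readers unfamiliar with permutation-derived LOD thresholds, the idea in the Methods excerpt above can be sketched roughly as follows. This is a simplified one-dimensional marker-regression illustration, not the two-dimensional scan used in the study; the function names, the two-class genotype coding (0/1) and the toy inputs are all hypothetical assumptions made for the example.

```python
import math
import random


def lod_score(pheno, geno):
    """LOD for one marker with two genotype classes (0/1), by marker regression.

    LOD = (n / 2) * log10(RSS_null / RSS_marker). Assumed toy model only.
    """
    n = len(pheno)
    mean_all = sum(pheno) / n
    rss_null = sum((y - mean_all) ** 2 for y in pheno)
    rss_marker = 0.0
    for g in (0, 1):
        ys = [y for y, gi in zip(pheno, geno) if gi == g]
        if ys:
            m = sum(ys) / len(ys)
            rss_marker += sum((y - m) ** 2 for y in ys)
    if rss_null == 0.0:
        return 0.0          # constant phenotype: no signal to explain
    if rss_marker == 0.0:
        return math.inf     # marker explains the phenotype perfectly
    return max(0.0, (n / 2.0) * math.log10(rss_null / rss_marker))


def permutation_threshold(pheno, markers, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide threshold: the (1 - alpha) quantile of the maximum LOD
    obtained after shuffling phenotypes relative to genotypes."""
    rng = random.Random(seed)
    shuffled = list(pheno)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-phenotype association
        max_lods.append(max(lod_score(shuffled, g) for g in markers))
    max_lods.sort()
    idx = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    return max_lods[idx]
```

      An observed LOD score above the returned value would be called significant at the chosen risk level; taking the maximum across all markers in each permutation is what makes the threshold genome-wide rather than per-marker.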

      • Line 659: The authors end the section about natural genetic variation in the response to SANTV with candidate genes and a CRISPR experiment. As the authors identify a small genetic region associated with LEBV susceptibility, it would be interesting to hear about any candidate genes in this region.

      There are still many genes and, more importantly, many polymorphisms in this region (ca. 700 single-nucleotide polymorphisms and short indels). Because structural variants are difficult to call (long-read sequencing has not been performed on the parents), we had preferred to abstain from providing a list of polymorphisms that would be incomplete and preferentially point towards SNPs. However, because of the reviewer's query, we now provide it in Table S11.

      • Line 674: The authors make use of HK104 strain in this study as it is the exception in their dataset that provides resistance against LEBV, but not SANTV. Possibly, the genetic variation linked to viral susceptibility uncovered using HK104 may therefore be relatively uncommon in C. briggsae. The implications of this choice and option for other studies using different genotypes could be interesting to discuss in this short paragraph.

      The aim here is to discover why HK104 is specifically resistant to one virus and not the other. There is a possibility of uncovering a specific mechanism that is present in only two or three strains of our 40-strain dataset, but we find this specificity particularly interesting, regardless of its prevalence. We explore in the Discussion which of the two crosses may reveal the specificity.

      • Line 774: The IPR is already described and abbreviated in line 742.

      As a reader, we prefer having the abbreviation explained twice rather than not understanding it.

      • Overall, to reach a broader audience, the manuscript can expand explanations in the discussion. E.g. statements in line 695 and 773, refer to previous observations, but do not explain them in enough detail to understand parallels between this and previous studies without prior knowledge.

      We added some explanations, specifically for lines 695 and 773 (of previous version).

      • Figure 2: Only HK104 is labelled in the figure, it would be useful to also see HK105 as this strain is also explicitly mentioned in the text.

      We now included HK105 and strains that are used in further experiments.

      • Figure 2: It is not clear from the results or methods how strains are designated into a certain class. The figure legend says variability in the data is taken into account and that is why some strains are close to each other, yet distinct in class, but how this works is not described.

      We now explain our criteria. See above in the response to Reviewer 2.

      • Figure S3: The strain JU1264 and JU1498 are mentioned thrice (as '2', 'rep' and 'ref'). These annotations should be clarified.

      These explanations were indeed missing. We now explain them in the figure legend.

      • Figure S4: The figure would benefit from a division in panels per strain to facilitate comparisons across strains.

      Indeed. We now added a division in panels per strain.

      • Figure S4: Have the authors correlated viral loads with the number of infected animals? This could result in additional information if not all individuals are infected equally.

      We have not done so in this precise experiment but preferred to use the number of infected animals in most other experiments, in particular because it is less subject to outlier effects.

      • Figure S4: Could the authors clarify the meaning of JU1264 Rep?

      It is explained in the legend: "The undiluted viral preparations on JU1264 are used to normalize and are indicated as "JU1264 1/1". A separate replicate was performed and indicated as "JU1264 Rep"."

      • Figure 8: The meaning of the stars in this figure is a bit confusing and the description of these stars in the legend is not clear.

      Indeed. We changed the legend to: "***: p<0.001 comparing JU4034 with its parent strain HK104 using a generalized linear model."
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment. The patch clamp experiments are comprehensive and overall solid but a direct demonstration of the role of these conductances in being necessary for surge generation (or at least having a direct physiological consequence on surge properties) is lacking, substantially reducing the impact of the findings.

      Strengths:

      (1) Examination of multiple types of calcium and potassium currents, both through electrophysiology and molecular biology.

      (2) Focus on arcuate kisspeptin neurons during the surge is relatively conceptually novel as the anteroventral periventricular nucleus (AVPV) kisspeptin neurons have received much more attention as the "surge generator" population.

      (3) The modeling studies allow for direct examination of manipulation of single and multiple conductances, whereas the electrophysiology studies necessarily require examination of each current in isolation. The construction of an arcuate kisspeptin neuron model promises to be of value to the reproductive neuroendocrinology field.

      We thank the reviewer for recognizing our comprehensive examination of Kiss1-ARH neurons through electrophysiological, molecular and computational modeling studies of their activity during the preovulatory surge, which as the reviewer pointed out is “conceptually novel.” With the addition of new experiments, we will bolster our argument that Kiss1-ARH neurons transition from synchronized firing to burst firing with the E2-mediated regulation of channel expression. We will address the weaknesses as follows:

      Weaknesses:

      (1) The novelty of some of the experiments needs to be clarified. This reviewer's understanding is that prior experiments largely used a different OVX+E2 treatment paradigm mimicking periods of low estradiol levels, whereas the present work used a "high E2" treatment model. However, Figures 10C and D are repeated from a previous publication by the same group, according to the figure legend. Findings from "high" vs. "low" E2 treatment regimens should be labeled and clearly separated in the text. It would also help to have direct comparisons between results from low E2 and high E2 treatment conditions.

      We will revise Figures 10C and 10D to include new findings on Tac2 and Vglut2 expression in OVX and E2-treated Kiss1ARH. We did show the previously published data (Qiu, eLife 2018) to contrast with Figures 10E, F showing the downregulation of TRPC5 and GIRK2 channels following E2 treatment. Most importantly, our E2 treatment regime is clearly stated in the Methods and is exactly the same as that used previously (Qiu, eLife 2016 and Qiu, eLife 2018) for the induction of the LH surge in OVX mice (Bosch, Molecular and Cellular Endocrinology 2013).

      (2) In multiple places, links are made between the changes in conductances and the transition from peptidergic to glutamatergic neurotransmission. However, this relationship is never directly assessed. The data that come closest are the qPCR results showing reduced Tac2 and increased Vglut2 mRNA, but in the figure legend, it appears that these results are from a prior publication using a different E2 treatment regimen.

      In the revised Figure 1, we will now include a clear depiction of the transition from synchronized firing driven by NKB signaling in OVX females to burst firing driven by glutamate in E2-treated females. We have used the same E2 treatment paradigm as previously published (Qiu, eLife 2018).

      (3) Similarly, no recordings of arcuate-AVPV glutamatergic transmission are made so the statements that Kiss1ARH neurons facilitate the GnRH surge via this connection are still only conjecture and not supported by the present experiments.

      Using a horizontal hypothalamic slice preparation, we have shown that Kiss1-ARH neurons excite GnRH neurons via Kiss1ARH glutamatergic input to Kiss1AVPV neurons (summarized in Fig. 12, Qiu, eLife 2016). We do not think that it is necessary to repeat these experiments in the current manuscript.

      (4) Figure 1 is not described in the Results section and is only tenuously connected to the statement in the introduction in which it is cited. The relevance of panels C and D is not clear. In this regard, much is made of the burst firing pattern that arises after E2 treatment in the model, but this burst firing pattern is not demonstrated directly in the slice electrophysiology examples.

      We will revise Figure 1 to include new whole-cell, current clamp recordings documenting the burst firing in response to glutamate in E2-treated, OVX females.

      (5) In Figure 3, it would be preferable to see the raw values for R1 and R2 in each cell, to confirm that all cells were starting from a similar baseline. In addition, it is unclear why the data for TTA-P2 is not shown, or how many cells were recorded to provide this finding.

      Before initiating photo-stimulation for each Kiss1-ARH neuron, we adjust the resting membrane potential to -70 mV, as noted in each panel in Figure 3, through current injections. We will include new findings on the effects of the T-channel blocker TTA-P2 on slow EPSP in the revised Figure 3. The number of cells tested with each calcium channel blocker is depicted in each of the bar graphs summarizing the effects of the blockers.

      (6) In Figure 5, panel C lists 11 cells in the E2 condition but panel E lists data from 37 cells. The reason for this discrepancy is not clear.

      In Figure 5E, we measured the L-, N-, P/Q and R channel currents after pretreatment with TTA-P2 to block the T-type current, whereas in Figure 5C, we measured the current without TTA-P2.

      (7) In all histogram figures, it would be preferable to have the data for individual cells superimposed on the mean and SEM.

      In all revised Figures we will include the individual data points for the individual neurons.

      (8) The CRISPR experiments were only performed in OVX mice, substantially limiting interpretation with respect to potential roles for TRPC5 in shaping arcuate kisspeptin neuron function during the preovulatory surge.

      The TRPC5 channels are most important for generating slow EPSPs when expression of NKB is high in the OVX state. Conversely, the glutamatergic response becomes more significant when the expression of NKB and TRPC5 channels is muted. Therefore, the CRISPR experiments were specifically conducted in OVX mice to maximize the effects.

      (9) Furthermore, there are no demonstrations that the CRISPR manipulations impair or alter the LH surge.

      In this manuscript, our focus is on the cellular electrophysiological activity of the Kiss1ARH neurons in OVX and E2-treated females. Exploration of CRISPR manipulations related to the LH surge is certainly slated for future experiments, but these in vivo experiments are beyond the scope of these comprehensive cellular electrophysiological and molecular studies.

      (10) The time of day of slice preparation and recording needs to be specified in the Methods.

      We will provide the times of slice preparation and recordings in the revised Methods and Materials.

      Reviewer #2 (Public Review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels, and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to a less intense firing in glutamatergic burst-like firing pattern that could favor glutamate release from ARC kisspeptin. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology, and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards. Robust statistical analyses are provided throughout, although some experiments (illustrated in Figures 7 and 8) do have rather low sample numbers.

      The impact of E2 on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      We thank the reviewer for recognizing that the “pharmacological and electrophysiological experiments appear of the highest standards” and “the addition of the computer modeling for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.” However, we agree with the reviewer that we need to provide a direct demonstration of “burst-like” firing of Kiss1-ARH neurons. We will address the weaknesses as follows:

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One has to do with the fact that "burst-like" firing that the authors postulate ARC kisspeptin neurons transition to after E2 replacement is only seen in computer simulations, and not in slice patch-clamp recordings. A more direct demonstration of the existence of this firing pattern, and of its prominence over neuropeptide-dependent sustained firing under conditions of high E2 would make a more convincing case for the authors' hypothesis.

      We will provide a more direct demonstration of the existence of this firing pattern in the whole-cell current clamp experiments in the revised Figure 1.

      In addition, and quite importantly, the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions (the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle) under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place. This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of these ionic currents will vary during the estrous cycle.

      We have published that the magnitude of the slow EPSP, which is TRPC5 channel mediated, varies throughout the estrous cycle, and that this variation is similar to the difference found between OVX and E2-treated, OVX females (Figure 2, Qiu, eLife 2016). Moreover, TRPC5 channel mRNA expression, similar to the peptides, is downregulated by an E2 treatment (Figure 10 this manuscript) that mimics proestrus levels of the steroid (Bosch, Mol Cell Endocrinology 2013). Furthermore, the magnitude of ionic currents is directly proportional to the number of ion channels expressed in the plasma membrane, which we have found correlates with mRNA expression. Therefore, it is likely that the magnitude of these ionic currents will vary during the estrous cycle.

      Lastly, the results of some of the pharmacological and genetic experiments may be difficult to interpret as presented. For example, in Figure 3, although it is possible that blockade of individual calcium channel subtypes suppresses the slow EPSP through decreased calcium entry at the somato-dendritic compartment to sustain TRPC5 activation and the slow depolarization (as the authors imply), a reasonable alternative interpretation would be that at least some of the effects on the amplitude of the slow EPSP result from suppression of presynaptic calcium influx and, thus, decreased neurotransmitter and neuropeptide secretion. Along the same lines, in Figure 12, one possible interpretation of the observed smaller slow EPSPs seen in mice with mutant TRPC5 could be that at least some of the effect is due to decreased neurotransmitter and neuropeptide release due to the decreased excitability associated with TRPC5 knockdown.

      The reviewer raises a good point, but our previous findings clearly demonstrate that chelating intracellular calcium with BAPTA in whole-cell current clamp recordings abolishes the slow EPSP and persistent firing (Qiu, J. Neurosci 2021), which we have noted is the rationale for dissecting out the contribution of T, R, N, L and P/Q calcium channels to the slow EPSP in our current studies (revised Figure 3 will include the effects of T-channel blocker).

      However, to further bolster the argument for the postsynaptic contribution of the calcium channels to the slow EPSP, and to eliminate the potential presynaptic effects of calcium channel blockers on the postsynaptic slow EPSP amplitude (which may result from reduced presynaptic calcium influx and subsequently decreased neurotransmitter release), we will utilize an additional strategy. Specifically, we will measure the response to the externally administered TACR3 agonist senktide under conditions in which the extracellular calcium influx, as well as neurotransmitter and neuropeptide release, are blocked (new Figure 3).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Following small molecule screens, this study provides convincing evidence that 7,8-dihydroxyflavone (DHF) is a competitive inhibitor of pyridoxal phosphatase. These results are important since they offer an alternative mechanism for the effects of 7,8-dihydroxyflavone in cognitive improvement in several mouse models. This paper is also significant due to the interest in the protein phosphatases and neurodegeneration fields.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zink et al set out to identify selective inhibitors of the pyridoxal phosphatase (PDXP). Previous studies had demonstrated improvements in cognition upon removal of PDXP, and here the authors reveal that this correlates with an increase in pyridoxal phosphate (PLP; PDXP substrate and an active coenzyme form of vitamin B6) with age. Since several pathologies are associated with decreased vitamin B6, the authors propose that PDXP is an attractive therapeutic target in the prevention/treatment of cognitive decline. Following high throughput and secondary small molecule screens, they identify two selective inhibitors. They follow up on 7, 8 dihydroxyflavone (DHF). Following structure-activity relationship and selectivity studies, the authors then solve a co-crystal structure of 7,8 DHF bound to the active site of PDXP, supporting a competitive mode of PDXP inhibition. Finally, they find that treating hippocampal neurons with 7,8 DHF increases PLP levels in a WT but not PDXP KO context. The authors note that 7,8 DHF has been used in numerous rodent neuropathology models to improve outcomes. 7, 8 DHF activity was previously attributed to activation of the receptor tyrosine kinase TrkB, although this appears to be controversial. The present study raises the possibility that it instead/also acts through modulation of PLP levels via PDXP, and is an important area for future work.

      Strengths:

      The strengths of the work are in the comprehensive, thorough, and unbiased nature of the analyses revealing the potential for therapeutic intervention in a number of pathologies.

      Weaknesses:

      Potential weaknesses include the poor solubility of 7,8 DHF that might limit its bioavailability given its relatively low potency (IC50= 0.8 uM), which was not improved by SAR. However, the compound has an extended residence time and the co-crystal structure could aid the design of more potent molecules and would be of interest to those in the pharmaceutical industry. The images related to crystal structure could be improved.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors performed a screening for PDXP inhibitors to identify compounds that could increase levels of pyridoxal 5'- phosphate (PLP), the co-enzymatically active form of vitamin B6. For the screening of inhibitors, they first evaluated a library of about 42,000 compounds for activators and inhibitors of PDXP and secondly, they validated the inhibitor compounds with a counter-screening against PGP, a close PDXP relative. The final narrowing down to 7,8-DHF was done using PLP as a substrate and confirmed the efficacy of this flavonoid as an inhibitor of PDXP function. Physiologically, the authors show that, by acutely treating isolated wild-type hippocampal neurons with 7,8-DHF they could detect an increase in the ratio of PLP/PL compared to control cultures. This effect was not seen in PDXP KO neurons.

      Strengths:

      The screening and validation of the PDXP inhibitors have been done very well because the authors have performed crystallographic analysis, a counter screening, and mutation analysis. This is very important because such rigor has not been applied to the original report of 7,8 DHF as an agonist for TrkB, which is why there is so much controversy about this finding.

      Weaknesses:

      As mentioned in the summary report the study may benefit from some in vivo analysis of PLP levels following 7,8-DHF treatment, although I acknowledge that it may be challenging because of the working out of the dosage and timing of the procedure.

      Reviewer #3 (Public Review):

      This is interesting biology. Vitamin B6 deficiency has been linked to cognitive impairment. It is not clear whether supplements are effective in restoring functional B6 levels. Vitamin B6 is composed of pyridoxal compounds and their phosphorylated forms, with pyridoxal 5-phosphate (PLP) being of particular importance. The levels of PLP are determined by the balance between pyridoxal kinase and phosphatase activities. The authors are testing the hypothesis that inhibition of pyridoxal phosphatase (PDXP) would arrest the age-dependent decline in PLP, offering an alternative therapeutic strategy to supplements. Published data illustrating that ablation of the Pdxp gene in mice led to increases in PLP levels and improvement in learning and memory trials are consistent with this hypothesis.

      In this report, the authors conduct a screen of a library of ~40k small molecules and identify 7,8-dihydroxyflavone (DHF) as a candidate PDXP inhibitor. They present an initial characterization of this micromolar inhibitor, including a co-crystal structure of PDXP and 7,8-DHF. In addition, they demonstrate that treatment of cells with 7,8-DHF increases PLP levels. Overall, this study provides further validation of PDXP as a therapeutic target for the treatment of disorders associated with vitamin B6 deficiency and provides proof-of-concept for inhibition of the target with small-molecule drug candidates.

      Strengths include the biological context, the focus on an interesting and under-studied class of protein phosphatases that includes several potential therapeutic targets, and the identification of a small molecule inhibitor that provides proof-of-concept for a new therapeutic strategy. Overall, the study has the potential to be an important development for the phosphatase field in general.

      Weaknesses include the fact that the compound is very much an early-stage screening hit. It is an inhibitor with micromolar potency for which mechanisms of action other than inhibition of PDXP have been reported. Extensive further development will be required to demonstrate convincingly the extent to which its effects in cells are due to on-target inhibition of PDXP.

      Recommendations for the authors:

      There is general agreement that the study represents an advance regarding the mechanisms of pyridoxal phosphatase and 7,8 DHF. From the reviewers' comments, several major questions and considerations are raised, followed by their detailed remarks:

      (1) More analysis of the solubility and dose of 7,8 DHF with regard to the 50% inhibition and the salt bridge of the B protomer, as raised by the reviewers.

      (2) Is there a possible involvement of another phosphatase?

      (3) Does 7,8 DHF cause an effect upon TrkB tyrosine phosphorylation?

      We thank the Reviewers and Editors for their fair and constructive comments and suggestions. We have performed additional experiments to address these questions and considerations. In addition, we have generated two new high-resolution (1.5 Å) crystal structures of human PDXP in complex with 7,8-DHF that substantially expand our understanding of 7,8-DHF-mediated PDXP inhibition. The scientist who performed this work for the revision of our manuscript has been added as an author (shared first authorship).

      We believe that the insights gained from these new data have further strengthened and improved the quality of our manuscript. Together, our data provide compelling evidence that 7,8-dihydroxyflavone is a direct and competitive inhibitor of pyridoxal phosphatase.

      Please find our point-by-point responses to the Public Reviews that are not addressed in the Recommendations for the Authors, and the Recommendations for the Authors below.

      Reviewer #2:

      As mentioned in the summary report the study may benefit from some in vivo analysis of PLP levels following 7,8-DHF treatment, although I acknowledge that it may be challenging because of the working out of the dosage and timing of the procedure.

      We agree that an in vivo analysis of PLP levels following 7,8-DHF treatment could be informative for the further evaluation of a possible mechanistic link between the reported effects of this compound and PDXP/vitamin B6. However, we currently do not have a corresponding animal experimentation permit in place and are unlikely to obtain one within a reasonable time frame for this revision.

      Recommendations For The Authors:

      Reviewer #1:

      The work is already well-written, comprehensive, and convincing.

      Suggestions that could improve the manuscript.

      (1) Include a protein tyrosine phosphatase (PTP) in the selectivity analysis. One possibility is that 7,8 DHF acts on a PTP (such as PTP1B), leading to TrkB activation by preventing dephosphorylation. I note that a previous study has looked at SAR for flavones with PTP1B (PMID: 29175190), which is worth discussion.

      We thank the reviewer for bringing this interesting possibility to our attention. We were not aware of the SAR study for flavonoids with PTP1B by Proenca et al. but have now tested the effect of 7,8-DHF on PTP1B, referring to this paper. As shown in Figure 2d, PTP1B was not inhibited by 7,8-DHF at a concentration of 5 or 10 µM. At the highest tested concentration of 40 µM, 7,8-DHF inhibited PTP1B by only ~20%. For comparison, compound C13 (3-hydroxy-7,8-dihydroxybenzylflavone-3’,4’dihydroxymethyl-phenyl), which emerged as the most active flavonoid in the SAR study by Proenca et al., inhibited PTP1B with an IC50 of 10 µM. Consistent with the results of these authors, our finding confirms that less polar substituents, such as O-benzyl groups at positions 7 and 8, and O-methyl groups at positions 3’ and 4’ of the flavone scaffold, are important for the ability of flavonoids to effectively inhibit PTP1B. We conclude that PTP1B inhibition by 7,8-DHF is unlikely to be a primary contributor to the reported cellular and in vivo effects of this flavone.

      In addition to PTP1B, we have now tested the effect of 7,8-DHF on the serine/threonine protein phosphatase calcineurin/PP2B, the DNA/RNA-directed alkaline phosphatase CIP, and three other metabolite-directed HAD phosphatases, namely NANP, NT5C1A and PNKP. PP2B, CIP and NANP were not inhibited by 7,8-DHF. Similar to PTP1B, PNKP activity was attenuated (~30%) only at 40 µM 7,8-DHF. In contrast, 7,8-DHF effectively inhibited NT5C1A (IC50 ~10 µM). NT5C1A is an AMP hydrolase expressed in skeletal muscle and heart. To our knowledge, a role of NT5C1A in the brain has not been reported. Based on currently available information, the inhibition of NT5C1A therefore appears unlikely to contribute to 7,8-DHF effects in the brain.

      The results of these experiments are shown in the revised Figure 2d. Taken together, the extended selectivity analysis of 7,8-DHF on a total of 12 structurally and functionally diverse protein- and non-protein-directed phosphatases supports our initial conclusion that 7,8-DHF preferentially inhibits PDXP.

      (2) Line 144: It is unclear how fig 2c supports the statement here. Remove call out for clarity.

      Our intention was to highlight the fact that 7,8-DHF concentrations >12.5 µM could not be tested in the BLI assay (shown in Figure 2c) due to 7,8-DHF solubility issues under these experimental conditions. However, since this is discussed in the text, but not directly visible in Figure 2c, we agree with the Reviewer and have removed this call out.

      (3) Figure 3a. It is difficult to see the pink 7,8 DHF on top of the pink ribbon backbone. A better combination of colours could be used. Likewise in Figure 3b it is pink on pink again.

      We have improved the combination of colors to enhance the visibility of 7,8-DHF and have consistently color-coded murine and the new human PDXP structures throughout the manuscript.

      (4) Figure 3c and d. These are the two protomers I believe, but the colour coding is not present in 3c where the ribbon is now gray. Please choose colours that can be used to encode protomers throughout the figure.

      Please see response to point 3 above.

      (5) Figure 3f. I think this is the same protomer as 3c but a 180-degree rotation. Could this be indicated, or somehow lined up between the two figures for clarity? It would also be useful to have 3e in the same orientation as 3f, to better visualise the overlap with PLP binding. PLP and 7,8 DHF could be labelled similarly to the amino acids in 3f (the colour coding here is helpful).

      Please see response to point 3 above. We have substantially revised the structural figures and have used consistent color coding and the same perspective of 7,8-DHF in the PDXP active sites.

      (6) Figure 3g. The colours of the bars relating to specific mutations do not quite match the colours in Figure 3f, which I think was the aim and is very helpful.

      We have adapted the colours of the residues in Figure 3f (now Fig. 3b and additionally Fig. 3 – figure supplement 1e) so that they exactly match the colours of the bars in Figure 3g (now Fig. 3d).

      Reviewer #2:

      No further comments.

      Reviewer #3:

      Page 4: The authors describe 7,8DHF as a "selective" inhibitor of PDXP - in my opinion, they do not have sufficient data to support such a strong assertion. Reports that 7,8DHF may act as a TRK-B-agonist already highlight a potential problem of off-target effects. Does 7,8DHF promote tyrosine phosphorylation of TRK-B in their hands? The selectivity panel presented in Figure 2, focusing on 5 other HAD phosphatases, is much too limited to support assertions of selectivity.

      We agree with the Reviewer that our previous selectivity analysis with six HAD phosphatases was limited. To further explore the phosphatase target spectrum of 7,8-DHF, we have now analyzed six other enzymes: three other non-HAD phosphatases (the tyrosine phosphatase PTP1B, the serine/threonine protein phosphatase PP2B/calcineurin, and the DNA/RNA-directed alkaline phosphatase/CIP) and three other non-protein-directed C1/C0-type HAD phosphatases (NT5C1A, NANP, and PNKP). The C1-capped enzymes NT5C1A and NANP were chosen because we previously found them to be sensitive to small molecule inhibitors of the PDXP-related phosphoglycolate phosphatase PGP (PMID: 36369173). PNKP was chosen to increase the coverage of C0-capped HAD phosphatases (previously, only the C0-capped MDP1 was tested).

      We found that calcineurin, CIP and NANP were not inhibited by up to 40 µM 7,8-DHF. The activities of PTP1B and PNKP were attenuated (by ~20 and ~30%, respectively) only at 40 µM 7,8-DHF. In contrast, 7,8-DHF effectively inhibited NT5C1A (IC50 ~10 µM). We have previously found that NT5C1A was sensitive to small-molecule inhibitors of the PDXP paralog PGP, although these molecules are structurally unrelated to 7,8-DHF (PMID: 36369173). NT5C1A is an AMP hydrolase expressed in skeletal muscle and heart (PMID: 12947102). To our knowledge, a role of NT5C1A in the brain has not been reported. Based on currently available information, the inhibition of NT5C1A therefore appears unlikely to contribute to 7,8-DHF effects in the brain. The results of these experiments are shown in the revised Figure 2d. Taken together, the extended selectivity analysis of 7,8-DHF on a total of 12 structurally and functionally diverse protein- and non-protein-directed phosphatases supports our initial conclusion that 7,8-DHF preferentially inhibits PDXP. To nevertheless avoid any overstatement, we have now also replaced “selective” by “preferential” in this context throughout the manuscript.

      We have not tested if 7,8-DHF promotes tyrosine phosphorylation of TRK-B. Being able to detect 7,8-DHF-induced TRK-B phosphorylation in our hands would not exclude an additional role for PDXP/vitamin B6-dependent processes. Not being able to detect TRK-B phosphorylation may indicate absence of evidence or evidence of absence. This would neither conclusively rule out a biological role for 7,8-DHF-induced TRK-B phosphorylation in vivo, nor contribute further insights into a possible involvement of vitamin B6-dependent processes in 7,8-DHF induced effects.

      Page 6: The authors report that they obtained only two PDXP-selective inhibitor hits from their screen; 7,8DHF and something they describe as FMP-1. For the latter, they state that it "was obtained from an academic donor, and its structure is undisclosed for intellectual property reasons". In my opinion, this is totally unacceptable. This is an academic research publication. If the authors wish to present data, they must do so in a manner that allows a reader to assess their significance; in the case of work with small molecules that includes the chemical structure. In my opinion, the authors should either describe the compound fully or remove mention of it altogether.

      We are unable to describe “FMP-1” because its identity has not been disclosed to us. The academic donor of this molecule informed us that they were not able to permit release of any details of its structure or general structural class due to an emerging commercial interest.

      We mentioned FMP-1 simply to highlight the fact that the screening campaign yielded more than one inhibitor. FMP-1 was also of interest due to its complete inhibition of PDXP phosphatase activity.

      Because the structure of this molecule is unknown to us, we have now removed any mention of this compound in the manuscript. For the same reason, we have removed the mention of the inhibitor hits “FMP-2” and “FMP-3” in Figure 2 – figure supplement 1 and Figure 2 – figure supplement 2. The number of PDXP inhibitor hits in the manuscript has been adapted accordingly.

      Page 7: The observed plateau at 50% inhibition requires further explanation. It is not clear how poor solubility of the compound explains this observation. For example, the authors state that "due to the aforementioned poor solubility of 7,8DHF, concentrations higher than 12.5µM could not be evaluated". Yet on page 8, they describe assays against the specificity panel at concentrations of compound up to 40µM. Do the analogues of 7,8DHF (Fig 2b) result in >50% inhibition at higher concentrations? Further explanation and data on the solubility of the compounds would be of benefit.

      We currently do not have a satisfactory explanation for the apparent plateau of ~50% PDXP inhibition by 7,8-DHF. Resolving this question will likely require other approaches, including computational chemistry such as molecular dynamics simulations, and we feel that this is beyond the scope of the present manuscript.

      We previously speculated that the limited solubility of 7,8-DHF may counteract a complete enzyme inhibition if higher concentrations of this molecule are required. Specifically, we referred to Todd et al. who have performed HPLC-UV-based solubility assays of 7,8-DHF (ref. 35). These authors found that immediately after 7,8-DHF solubilization, nominal 7,8-DHF concentrations of 5, 20 or 50 µM resulted in 0.5, 3.0 or 13 µM of 7,8-DHF in solution (i.e., 10, 15 or 26% of the respective nominal concentration). Seven hours later, 46, 26 or 26% of the respective nominal 7,8-DHF concentrations were found in solution. Hence, above a nominal concentration of 5 µM, 7,8-DHF solubility does not increase linearly with the input concentration, but plateaus at ~20% of the nominal concentration. This phenomenon could potentially contribute to the apparent plateau of human or murine PDXP inhibition by 7,8-DHF in vitro.
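
      As a quick check of the arithmetic in the preceding paragraph, the following minimal Python sketch recomputes the dissolved fractions from the values quoted from ref. 35. The variable and function names are our own illustrative choices, and the absolute concentrations at seven hours are back-calculated from the quoted percentages; they are not stated in the source.

```python
# Values quoted in the response from Todd et al. (ref. 35), in µM.
nominal_um = [5.0, 20.0, 50.0]        # nominal 7,8-DHF concentrations
measured_t0_um = [0.5, 3.0, 13.0]     # found in solution right after solubilization
percent_7h = [46, 26, 26]             # percent of nominal found in solution after 7 h


def percent_in_solution(nominal, measured):
    """Express each measured concentration as a percentage of its nominal value."""
    return [round(100.0 * m / n) for n, m in zip(nominal, measured)]


def concentration_from_percent(nominal, percent):
    """Back-calculate a dissolved concentration (µM) from a percentage of nominal.

    Assumption: this is a derived illustration, not a value reported in the source.
    """
    return [round(n * p / 100.0, 1) for n, p in zip(nominal, percent)]


percent_t0 = percent_in_solution(nominal_um, measured_t0_um)
derived_7h_um = concentration_from_percent(nominal_um, percent_7h)

print("Percent in solution at t0:", percent_t0)
print("Back-calculated µM in solution at 7 h:", derived_7h_um)
```

      The first list reproduces the 10, 15 and 26% figures given in the text; the second only illustrates what the seven-hour percentages would correspond to in absolute terms.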

      However, experiments performed during the revision of our manuscript show that the HAD phosphatase NT5C1A can be effectively inhibited by 7,8-DHF with an IC50-value of 10 µM (see revised Fig. 2). Together with the fact that the activity of the PDXP-Asn61Ser variant can be completely inhibited by 7,8-DHF (see Fig. 3d), we conclude that the reason for the observed plateau of PDXP inhibition is likely to be primarily structural, with Asn61 impeding 7,8-DHF binding. We have therefore removed the mention of the limited solubility of 7,8-DHF here. On p.14, we now say: “These data also suggest that Asn61 contributes to the limited efficacy of 7,8-mediated PDXP inhibition in vitro.”

      The solubility of 7,8-DHF is dependent on the specific assay and buffer conditions. In BLI experiments, interference patterns caused by binding of 7,8-DHF in solution to biotinylated PDXP immobilized on the biosensor surface are measured. In phosphatase selectivity assays, phosphatases are in solution, and the effect of 7,8-DHF on the phosphatase activity is measured via the quantification of free inorganic phosphate.

      In BLI experiments, we observed that the sensorgrams obtained with the highest tested 7,8-DHF concentration (25 µM) showed the same curve shapes as the sensorgrams obtained with 12.5 µM 7,8-DHF. This contrasts with the expected steeper slope of the curves at 25 µM vs. 12.5 µM 7,8-DHF. The same behavior was observed for the reference sensors (i.e., the SSA sensors that were not loaded with PDXP, but incubated with 7,8-DHF at all employed concentrations for referencing against nonspecific binding of 7,8-DHF to the sensors). The sensorgrams at 25 µM 7,8-DHF were therefore not included in the analysis (this is now specified in the Materials and Methods BLI section on p.27). To clarify this point, we now state that “As a result of the poor solubility of the molecule, a saturation of the binding site was not experimentally accessible” (p.7).

      In contrast, the phosphatase selectivity assays described on p.8 could be performed with nominal 7,8-DHF concentrations of up to 40 µM. Although the effective 7,8-DHF concentration in solution is expected to be lower (see ref. 35 and discussed above), the limited solubility of 7,8-DHF in phosphatase assays does not prevent the quantification of free inorganic phosphate. Nevertheless, we cannot exclude some interference with this absorbance-based assay (e.g., due to turbidity caused by insoluble compound). Indeed, 5,6-dihydroxyflavone and 5,6,7-trihydroxyflavone caused an apparent increase in PDXP activity at concentrations above 10 µM (see Figure 2b), which may be related to compound solubility issues. Alternatively, these flavones may activate PDXP at higher concentrations.

      We have tested the 7,8-DHF analogue 3,7,8,4’-tetrahydroxyflavone at concentrations of 70 and 100 µM. At concentrations >100 µM, the DMSO concentration required for solubilizing the flavone interferes with PDXP activity. PDXP inhibition by 3,7,8,4’-tetrahydroxyflavone was slightly increased at 70 µM compared to 40 µM (by ~18%) but plateaued between 70 and 100 µM. These results are now mentioned in the text (p.7): “The efficacy of PDXP inhibition by 3,7,8,4’-tetrahydroxyflavone was not substantially increased at concentrations >40 µM (relative PDXP activity at 40 µM: 0.46 ± 0.05; at 70 µM: 0.38 ± 0.15; at 100 µM: 0.37 ± 0.09; data are mean values ± S.D. of n=6 experiments).”

      Page 9: The authors report that PDXP crystallizes as a homodimer in which 7,8DHF is bound only to one protomer. Is the second protomer active? Does that contribute to the 50% inhibition plateau? If Arg62 is mutated to break the salt bridge, does inhibition go beyond 50%?

      We have no way to measure the activity of the second, inhibitor-free protomer in murine PDXP. We know that PDXP functions as a constitutive homodimer, and based on our current understanding, both protomers are active. We have previously shown that the experimental monomerization of PDXP (upon introduction of two point mutations in the dimerization interface) strongly reduces its phosphatase activity. Specifically, PDXP homodimerization is required for an inter-protomer interaction that mediates the proper positioning of the substrate specificity loop. Thus, homodimerization is necessary for effective substrate coordination and dephosphorylation (PMID: 24338687).

      In the murine structure, we observed that 7,8-DHF binding to the second subunit (the B-protomer) is prevented by a salt bridge between Arg62 and Asp14 of a symmetry-related A-protomer in the crystal lattice (i.e., this is not a salt bridge between Arg62 in the B-protomer and Asp14 in the A-protomer of a PDXP homodimer). As suggested, we have nevertheless tested the potential role of this salt bridge for the sensitivity of the PDXP homodimer to 7,8-DHF.

      The mutation of Arg62 is not suitable to answer this question, because this residue is involved in the coordination of 7,8-DHF (see Figure 3b), and the PDXP-Arg62Ala mutant is inhibitor resistant (see Figure 3d). We have therefore mutated Asp14, which is not involved in 7,8-DHF coordination. As shown in the new Figure 3 – figure supplement 1d, the 7,8-DHF-mediated inhibition of PDXP-Asp14Ala again reached a plateau at ~50%. This result suggests that while an Arg62-Asp14 salt bridge is stabilized in the murine crystal, it is not a determinant of the active site accessibility of protomer B in solution.

      To address this important question further, we have now also generated co-crystals of human PDXP bound to 7,8-DHF, and refined two structures to 1.5 Å. We found that in human PDXP, both protomers bind 7,8-DHF. These new, higher resolution data are now shown in the revised Figure 3 and its figure supplements, and we have moved the panels referring to the previously reported murine PDXP structure to the Figure 3 – figure supplement 1. Thus, both protomers of human PDXP, but only one protomer of murine PDXP bind 7,8-DHF in the crystal structure, yet the 7,8-DHF-mediated inhibition of human and murine PDXP plateaus at ~50% under the phosphatase assay conditions (see Figure 2a). We conclude that 7,8-DHF binding efficiency in the PDXP crystal does not necessarily reflect its inhibitory efficiency in solution.

      Taken together, these data indicate that the apparent partial inhibition of murine and human PDXP phosphatase activity by 7,8-DHF in our in vitro assays is not explained by an exclusive binding of 7,8-DHF to just one protomer of the homodimer.

      Page 10-12; Is it possible to generate a mutant form of PDXP in which activity is maintained but inhibition is attenuated - an inhibitor-resistant mutant form of PDXP? Can such a mutant be used to assess on-target vs off-target effects of 7,8DHF in cells?

      This is an excellent point, and we agree with the Reviewer that such an approach would provide further evidence for cellular on-target activity of 7,8-DHF. Indeed, the verification of the PDXP-7,8-DHF interaction sites has led to the generation of catalytically active, inhibitor-resistant PDXP mutants, such as Tyr146Ala and Glu148Ala (Fig. 3d). However, the biochemical analysis of such mutants in primary hippocampal neurons is a very difficult task.

      Primary hippocampal neurons are derived from pooled, isolated hippocampi of mouse embryos and are subsequently differentiated for 21 days in vitro. The resulting cellular yield is typically low and variable, and the viability (and contamination of the respective cultures with e.g. glial cells) varies from batch to batch. Although such cell preparations are suitable for electrophysiological or immunocytochemical experiments, they are far from ideal for biochemical studies. A meaningful experiment would require the efficient expression of a catalytically active, but inhibitor-resistant PDXP-mutant in PDXP-KO neurons. In parallel, PDXP-KO cells reconstituted with PDXP-WT (at phosphatase activity levels comparable with the PDXP mutant cells) would be needed for comparison. Unfortunately, the generation of (a) sufficient numbers of (b) viable cells that (c) efficiently express (d) functionally comparable levels of PDXP-WT or -mutant for downstream analysis (PLP/PL-levels upon inhibitor treatment) is currently not possible for us.

      Human iPSC-derived (hippocampal) spheroids are at present no alternative, due to the necessity of generating PDXP-KO lines first, and the difficulties with transfecting/transducing them. Such a system would require extensive validation. We have attempted to use SH-SY5Y cells (a metastatic neuroblastoma cell line), but PDXK expression in these cells is modest and they produce too little PLP. We therefore feel that this question is beyond the scope of our current study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting study that performs scRNA-Seq on infected and uninfected wounds. The authors sought to understand how infection with E. faecalis influences the transcriptional profile of healing wounds. The analysis demonstrated that there is a unique transcriptional profile in infected wounds with specific changes in macrophages, keratinocytes, and fibroblasts. They also speculated on potential crosstalk between macrophages and neutrophils and macrophages and endothelial cells using NicheNet analysis and CellChat. Overall the data suggest that infection causes keratinocytes to not fully transition which may impede their function in wound healing and that the infection greatly influenced the transcriptional profile of macrophages and how they interact with other cells.

      Strengths:

      It is a useful dataset to help understand the impact of wound infection on the transcription of specific cell types. The analysis is very thorough in terms of transcriptional analysis and uses a variety of techniques and metrics.

      Weaknesses:

      Some drawbacks of the study are the following. First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study. Wound healing is a dynamic and variable process so understanding the full course of the wound healing response would be very important to understand the impact of infection on the healing wound. Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study. Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of re-epithelialization like human wounds. So while the conclusions are generally supported, the scope of the work is limited.

      Thank you for your thoughtful review and acknowledgment of the thoroughness of our analysis.

      First, the fact that it only has two mice per group, and only looks at one time point after wounding decreases the impact of the study.

      We acknowledge your concerns regarding the limitations of our study, particularly regarding the small number of mice per group and the examination of only one time point post-wounding. We agree that a more comprehensive analysis across multiple time points would provide a deeper understanding of the temporal changes induced by infection. While our primary focus in this study was to elucidate the foundational responses to bacteria-infected wounds, we attempted to augment our analysis by incorporating publicly available datasets of similar nature. However, these datasets lacked power in terms of cell number and populations. Nonetheless, we have bolstered our analysis by applying a cross-entropy test on the integrated dataset and reporting its significance (Figure S1F), ensuring the robustness of our single-cell RNA sequencing datasets.

      Including unwounded skin in the scRNA-Seq would also lend a lot more significance to this study.

      We also recognize the significance of comparing infected wounds to unwounded skin to establish a baseline for transcriptional changes. While we attempted to incorporate publicly available unwounded skin samples into our analysis, we encountered limitations in the number of cells, particularly within the immune population. This constraint is addressed in the Limitations section of the manuscript.

      Another drawback of the study is that mouse punch biopsies are very different than human wounds as they heal primarily by contraction instead of re-epithelialization like human wounds.

      Regarding the concern about differences between murine and human wound healing mechanisms, we took measures during tissue isolation to mitigate this issue, extracting incisions of the wounds rather than contracted tissues. Despite the primary mode of wound closure in mice being contraction, we believe our analysis still offers valuable insights into cellular responses to infection relevant to human wound healing.

      We appreciate your constructive criticism of our study. Despite these constraints, we believe our work provides valuable insights into the transcriptional changes induced by infection in healing wounds.

      Reviewer #2 (Public Review):

      Summary:

      The authors have performed a detailed analysis of the complex transcriptional status of numerous cell types present in wounded tissue, including keratinocytes, fibroblasts, macrophages, neutrophils, and endothelial cells. The comparison between infected and uninfected wounds is interesting and the analysis suggests possible explanations for why infected wounds are delayed in their healing response.

      Strengths:

      The paper presents a thorough and detailed analysis of the scRNAseq data. The paper is clearly written and the conclusions drawn from the analysis are appropriately cautious. The results provide an important foundation for future work on the healing of infected and uninfected wounds.

      Weaknesses:

      The analysis is purely descriptive and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing. The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

      We are thankful for your acknowledgment of the thoroughness of our analysis and the cautious nature of our conclusions.

      The analysis is purely descriptive, and no attempt is made to validate whether any of the factors identified are playing functional roles in wound healing.

      Regarding your concern about the purely descriptive nature of our analysis and the lack of functional validation of identified factors, we agree on the importance of understanding the functional roles of transcriptional changes in wound healing. To address this limitation, we plan to conduct functional experiments, such as perturbation assays or in vivo validation studies, to validate the roles of specific factors identified in our analysis.

      The experimental setup is analyzing a single time point and does not include a comparison to unwounded skin.

      We acknowledge the importance of comparing wounded tissue to unwounded skin to establish a baseline for understanding transcriptional changes. This point is noted and acknowledged in the limitations section of our manuscript.

      We appreciate your feedback and assure you that we will consider your suggestions in future iterations of our research.

      Recommendations For The Authors:

      We are grateful for the positive overall assessment of our revised work by the reviewers. Critical comments on specific aspects of our work are listed verbatim below followed by our responses.

      Reviewer 1 (Recommendations for the Authors):

      (1) The figures are a bit cluttered and hard to parse out. The different parts of the figure seem to be scattered all over the place with no consistent order.

      Thank you for your feedback regarding the figures in our manuscript. We acknowledge your concern that some panels may appear cluttered and challenging to navigate. In response, we made concerted efforts to declutter certain panels, taking into account page size constraints and ensuring a minimum font size for readability.

      (2) I didn't really understand what the last sentence on page 6 meant. Is this meant to say that these could be biomarkers of infection?

      We thank the reviewer for noting this lack of clarity. We revised the statement.

      Updated manuscript (lines 111-113)

      “Overall, the persistent E. faecalis infection contributed to higher Tgfb1 expression, whilst Pdgfa levels remained low, correlating with delayed wound healing.”

      (3) A reference on page 19 didn't format correctly.

      We thank the reviewer for catching the typos. We corrected the reference formatting.

      Updated manuscript (lines 503-505)

      “We confirm the immune-suppressive role of E. faecalis in wound healing, consistent with previous findings in different experimental settings (Chong et al., 2017; Kao et al., 2023; Tien et al., 2017).”

      (4) The title doesn't really address the scope of the finding which goes beyond immunomodulatory.

      The reviewer is correct! We therefore revised the title to cover all aspects of the study as:

      “Decoding the complexity of delayed wound healing following Enterococcus faecalis infection”

      Reviewer 2 (Recommendations for the Authors):

      (1) On page 6, the expression of Tgfb1 is described as "aggravated" by wounding alone. I am not sure whether this means Tgfb1 levels are increased or decreased. It appears from the data that it is increased, which was confusing to me since I interpreted "aggravated" as meaning decreased. So perhaps a different more straightforward word could be used to describe the data.

      We modified this ambiguous statement to:

      Updated manuscript (lines 105-106)

      “By contrast, wounding alone resulted in higher transforming growth factor beta 1 (Tgfb1) expression.”

      (2) On page 7, the authors state that "cells from infected wounds...demonstrated distinct clustering patterns compared to cells from uninfected wounds (Figure S1F)" but when I look at the data in this figure, I cannot really see a difference. Perhaps the differences could be more clearly highlighted?

      Thank you for pointing out this issue. We utilized the cross-entropy test for statistical comparison, employing UMAP embedding space data. While the data underwent batch correction based on infection status, the UMAP plots for each condition may appear visually similar. However, it's important to note that the number of cells per cluster varies significantly between the infected and uninfected conditions. This aspect influences the selection of points (cells) and their nearest neighbours for statistical testing within each cluster in the embedding space. To address this concern, we have included a table indicating the number of cells per cell type alongside the plot (Figure S1F), providing additional context for the interpretation of our results.

      Author response table 1.

      Author response image 1.

      (3) On page 8, Zeb2hi cells are described as "immunosuppressive" and yet the genes they are highlighted as expressing include Cxcl2 and IL1b, which I would classify as inflammatory, not immunosuppressive. Can the authors be a bit more clear on why they describe the phenotype of these cells as "immunosuppressive"?

      We agree with the reviewer that this is a bit counterintuitive. Conventionally, CXCL2 is thought to be a chemoattractant for neutrophil recruitment. However, the infection-specific keratinocyte cluster expressing Cxcl2, Il1b, and Wfdc17 along with Zeb2 and Thbs1 shows myeloid-derived suppressor cell-like features, which play immunosuppressive roles during infection and in cancer (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021).

      Updated manuscript (lines 159-163)

      “As the barrier to pathogens, keratinocytes secrete a broad range of cytokines that can induce inflammatory responses (Alshetaiwi et al., 2020; Siriwach et al., 2022; Veglia et al., 2021). However, Zeb2hi keratinocytes co-expressing Cxcl2, Il1b, and Wfdc17, indicate myeloid-derived suppressor cell-like phenotype which implies an immunosuppressive environment (Hofer et al., 2021; Veglia et al., 2021).”

      (4) On pages 8-9, Keratinocytes are described to express MHC class II. I find this quite unexpected since class II is usually thought to be expressed primarily by APCs such as DCs and B cells. Is there a precedent for keratinocytes to express class II? The authors should acknowledge that this is unexpected and in need of further validation, or support the claim with references in which class II expression has been previously observed on keratinocytes (and is thus not unexpected)

      Although MHC class II expression is predominantly on immune cells, an antigen-presenting role for keratinocytes has been reported in many studies (Banerjee et al., 2004; Black et al., 2007; Carr et al., 1986; Gawkrodger et al., 1987; Jiang et al., 2020; Li et al., 2022; Oh et al., 2019; Tamoutounour et al., 2019). Therefore, an antigen-presenting role of keratinocytes is known and expected, and we think that this should be further investigated in the context of wound infection.

      Updated manuscript (lines 177-179)

      “These genes are associated with the major histocompatibility complex (MHC) class II, suggesting a self-antigen presenting keratinocyte population, which have a role in costimulation of T cell responses (Meister et al., 2015; Tamoutounour et al., 2019).”

      REFERENCES

      Alshetaiwi, H., Pervolarakis, N., McIntyre, L. L., Ma, D., Nguyen, Q., Rath, J. A., Nee, K., Hernandez, G., Evans, K., Torosian, L., Silva, A., Walsh, C., & Kessenbrock, K. (2020). Defining the emergence of myeloid-derived suppressor cells in breast cancer using single-cell transcriptomics. Science Immunology, 5(44), eaay6017. https://doi.org/10.1126/sciimmunol.aay6017

      Banerjee, G., Damodaran, A., Devi, N., Dharmalingam, K., & Raman, G. (2004). Role of keratinocytes in antigen presentation and polarization of human T lymphocytes. Scandinavian Journal of Immunology, 59(4), 385–394. https://doi.org/10.1111/j.0300-9475.2004.01394.x

      Black, A. P. B., Ardern-Jones, M. R., Kasprowicz, V., Bowness, P., Jones, L., Bailey, A. S., & Ogg, G. S. (2007). Human keratinocyte induction of rapid effector function in antigen-specific memory CD4+ and CD8+ T cells. European Journal of Immunology, 37(6), 1485–1493. https://doi.org/10.1002/eji.200636915

      Carr, M. M., McVittie, E., Guy, K., Gawkrodger, D. J., & Hunter, J. A. (1986). MHC class II antigen expression in normal human epidermis. Immunology, 59(2), 223–227.

      Gawkrodger, D. J., Carr, M. M., McVittie, E., Guy, K., & Hunter, J. A. (1987). Keratinocyte expression of MHC class II antigens in allergic sensitization and challenge reactions and in irritant contact dermatitis. The Journal of Investigative Dermatology, 88(1), 11–16. https://doi.org/10.1111/1523-1747.ep12464641

      Jiang, Y., Tsoi, L. C., Billi, A. C., Ward, N. L., Harms, P. W., Zeng, C., Maverakis, E., Kahlenberg, J. M., & Gudjonsson, J. E. (2020). Cytokinocytes: The diverse contribution of keratinocytes to immune responses in skin. JCI Insight, 5(20), e142067. https://doi.org/10.1172/jci.insight.142067

      Li, D., Cheng, S., Pei, Y., Sommar, P., Kärner, J., Herter, E. K., Toma, M. A., Zhang, L., Pham, K., Cheung, Y. T., Liu, Z., Chen, X., Eidsmo, L., Deng, Q., & Xu Landén, N. (2022). Single-Cell Analysis Reveals Major Histocompatibility Complex II‒Expressing Keratinocytes in Pressure Ulcers with Worse Healing Outcomes. The Journal of Investigative Dermatology, 142(3 Pt A), 705–716. https://doi.org/10.1016/j.jid.2021.07.176

      Oh, S., Chung, H., Chang, S., Lee, S.-H., Seok, S. H., & Lee, H. (2019). Effect of Mechanical Stretch on the DNCB-induced Proinflammatory Cytokine Secretion in Human Keratinocytes. Scientific Reports, 9(1), 5156. https://doi.org/10.1038/s41598-019-41480-y

      Siriwach, R., Ngo, A. Q., Higuchi, M., Arima, K., Sakamoto, S., Watanabe, A., Narumiya, S., & Thumkeo, D. (2022). Single-cell RNA sequencing identifies a migratory keratinocyte subpopulation expressing THBS1 in epidermal wound healing. iScience, 25(4), 104130. https://doi.org/10.1016/j.isci.2022.104130

      Tamoutounour, S., Han, S.-J., Deckers, J., Constantinides, M. G., Hurabielle, C., Harrison, O. J., Bouladoux, N., Linehan, J. L., Link, V. M., Vujkovic-Cvijin, I., Perez-Chaparro, P. J., Rosshart, S. P., Rehermann, B., Lazarevic, V., & Belkaid, Y. (2019). Keratinocyte-intrinsic MHCII expression controls microbiota-induced Th1 cell responses. Proceedings of the National Academy of Sciences of the United States of America, 116(47), 23643–23652. https://doi.org/10.1073/pnas.1912432116

      Veglia, F., Sanseviero, E., & Gabrilovich, D. I. (2021). Myeloid-derived suppressor cells in the era of increasing myeloid cell diversity. Nature Reviews. Immunology, 21(8), 485–498. https://doi.org/10.1038/s41577-020-00490-y

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The evolution of non-shivering thermogenesis is of fundamental importance to understand. Here, in small mammals, the contractile apparatus of the muscle is shown to increase energy expenditure upon a drop in ambient temperature. Additionally, in the state of torpor, small hibernators did not show an increase in energy expenditure under the same challenge.

      Strengths:

      The authors have conducted a very well-planned study that has sampled the muscles of large and small hibernators from two continents. Multiple approaches were then used to identify the state of the contractile apparatus, and its energy expenditure under torpor or otherwise.

      Weaknesses:

      There was only one site of biopsy from the animals used (leg). It would be interesting to know if non-shivering thermogenesis is something that is regionally different in the animal, given the core body and distal limbs have different temperatures.

      We thank the reviewer for their time and effort in reviewing our manuscript. Furthermore, we agree that it would be of interest to perform similar experiments on different muscle sites in these animals. This is of particular interest as in some mammals, such as mice, distal limbs do not shiver and therefore non-shivering thermogenesis may play a more prominent role in heat regulation. A paper from Aydin et al. demonstrated that when shivering muscles (soleus) were prevented from undergoing non-shivering thermogenesis via knock-out of UCP1 and were then exposed to cold temperatures, the force production of these muscles was significantly reduced due to prolonged shivering [1]. These results do suggest that even in shivering muscle, non-shivering thermogenesis plays a key role in the generation of heat for survival and for the maintenance of muscle performance. Furthermore, there is evidence from garden dormice that muscle temperature during torpor is slightly warmer than abdominal temperature and slightly cooler than heart temperature, which is 7-8°C warmer than abdominal temperature, suggesting the existence of non-shivering thermogenesis in skeletal and cardiac muscles (Giroud et al. in prep) [2]. We have added this information and reference into our discussion to reflect this important point (Discussion, paragraph 6, “As the biopsies which were used…”).

      Reviewer #2:

      Summary:

      The authors utilized (permeabilized) fibers from muscle samples obtained from brown and black bears, squirrels, and Garden dormice, to provide interesting and valuable data regarding changes in myosin conformational states and energetics during hibernation and different types of activity in summer and winter. Assuming that myosin structure is similar between species then its role as a regulator of metabolism would be similar and not different, yet the data reveal some interesting and perplexing differences between the selected hibernating species.

      Strengths:

      The experiments on the permeabilized fibers are complementary, sophisticated, and well-performed, providing new information regarding the characteristics of skeletal muscle fibers between selected hibernating mammalian species under different conditions (summer, interarousal, and winter).

      The studies involve complementary assessments of muscle fiber biochemistry, sarcomeric structure using X-ray diffraction, and proteomic analyses of posttranslational modifications.

      Weaknesses:

      It would be helpful to put these findings on permeabilized fibers into context with the other anatomical/metabolic differences between the species to determine the relative contribution of myosin energetics (with these other contributors) to overall metabolism in these different species, including factors such as fat volume/distribution.

      We thank the reviewer for the time and effort they have put into reviewing our paper and are grateful for the helpful suggestions, which we believe enhance our work (please see below for detailed answers to the critiques).

      Reviewer #3:

      Summary and strengths:

      The manuscript, "Remodelling of skeletal muscle myosin metabolic states in hibernating mammals", by Lewis et al, investigates whether myosin ATP activity may differ between states of hibernation and activity in both large and small mammals. The study interrogates (primarily) permeabilized muscle strips or myofibrils using several state-of-the-art assays, including the mant-ATP assay to investigate ATP utilization of myosin, X-ray diffraction of muscles, proteomics studies, metabolic tests, and computational simulations. The overall data suggests that ATP utilization of myosin during hibernation is different than in active conditions.

      A clear strength of this study is the use of multiple animals that utilize two different states of hibernation or torpor. Two large animal hibernators (Eurasian Brown Bear, American Black Bear) represent large animal hibernators that typically undergo prolonged hibernation. Two small animal hibernators (Garden Dormouse, 13 Lined Ground Squirrel) undergo torpor with more substantial reductions in heart rate and body temperature, but whose torpor bouts are interrupted by short arousals that bring the animals back to near-summer-like metabolic conditions.

      Especially interesting, the investigators analyze the impact that body temperature may have on myosin ATP utilization by performing assays at two different temperatures (8 and 20 degrees C, in 13 Lined Ground Squirrels).

      The multiple assays utilized provide a more comprehensive set of methods with which to test their hypothesis that muscle myosins change their metabolic efficiency during hibernation.

      We thank this reviewer for the effort and time they have put into carefully reviewing our manuscript and have taken on board their valuable suggestions to improve our manuscript (please see below for detailed answers to the critiques).

      Suggestions and potential weaknesses:

      While the samples and assays provide a robust and comprehensive coverage of metabolic needs and testing, the data is less categorical. Some of these may be dependent on sample size or statistical analysis while others may be dependent on interpretation.

      (1) Statistical Analysis

      (1a) The results of this study often cannot be assessed properly due to a lack of clarity in the statistical tests.

      For example, the results related to the large animal hibernators (Figure 1) do not describe the statistical test (in the text of the results, methods, or figure legends). (Similarly for figure 6 and Supplemental Figure 1). Further, it is not clear whether or when the analysis was performed with paired samples. As the methods described, it appears that the Eurasian Brown Bear data should be paired per animal.

      We thank the reviewer for these important points and have added information on the statistical tests used, where previously missing, to each figure legend. Details of the statistical testing used for figure 6 are listed in the methods section, paragraph 18, “All statistical analysis of TMT derived protein expression data…”.

      (1b) The statistical methods state that non-parametric testing was utilized "where data was unevenly distributed". Please clarify when this was used.

      We have now clarified all statistical tests used in the figure legends.

      (1c) While there are two different myosin isoforms, the isoform may be considered a factor. It is unclear why a one-way ANOVA is generally used for most of the mant-ATP chase data.

      The reviewer is right: in our analysis, we haven’t considered ‘myosin isoforms’ as a factor. One of the main reasons is that we have decided to treat fibres expressing different myosin heavy chain isoforms as entirely separate entities (not interconnected).

      (1d) While the technical replicates on studies such as the mant-ATP chase assay are well done, the total biological replicates are small. A consideration of the sample power should be included.

      Unfortunately, obtaining additional biological samples from these unique species is challenging. Hence, we have added a statement in the Discussion section. This statement focuses on the potential benefits of increasing sample size to increase statistical power (Discussion, paragraph 2, “In contrast to our study hypothesis…”).

      (1e) An analysis of the biological vs statistical significance should be considered, especially for the mant-ATP chase data from the American Black Bear, where there appear to be shifts between the summer and winter data.

      We agree that it is important to be careful when drawing conclusions from data based only on p-values. We agree that the modest differences observed in these data on the American Black Bear, whilst not significant, are worth noting, and we have added these considerations into the manuscript (Discussion, paragraph 2, “In contrast to our study hypothesis…”).

      (2) Consistency of DRX/SRX data.

      (2a) The investigators performed both mant-ATP chase and x-ray diffraction studies to investigate whether myosin heads are in an "on" or "off" state. The results of these two studies do not appear to be fully consistent with each other, which should not be a surprise. The recent work of Mohran et al (PMID 38103642) suggests that the mant-ATP-predicted SRX:DRX proportions are inconsistent with the position of the myosin heads. The discussion appears to lack a detailed assessment of this prior work and lack a substantive assessment contrasting the differing results of the two assays in the current study. i.e. why the current study's mant-ATP chase and x-ray diffraction results differ.

      Prior works on skeletal muscle (observing discrepancies between the Mant-ATP chase assay and X-ray diffraction) are rather scarce. Adding a comprehensive discussion about this may be beyond the scope of the current study and would distract the reader from the main topic. For this reason, we have not added any section. Note that we have other manuscripts in preparation that are specifically dedicated to the discrepancy.

      (2b) The discussion of the current study's x-ray diffraction data relating to the I_1,1/I_1,0 ratio and how substantially different this is to the M6 results merits discussion. i.e. how can myosin both be more primed to contract during IBA versus torpor (according to intensity ratio), but also have less mass near the thick filament (M6).

      The I1,1/I1,0 ratio indicates a subtle mass shift towards the myosin thick filament whilst the M6 spacing shows a more compliant thick filament. These results are not incompatible and rely on interpretation of the X-ray diffraction patterns. To avoid any confusion and avoid distracting the reader from the main topic, we have decided not to speculate there.

      (3) Possible interactions with Heat Shock Proteins

      Heat Shock Proteins (HSPs), such as HSP70, have been shown to be differential during torpor vs active states. A brief search of HSP and myosin reveals HSPs related to thick filament assembly and Heat Shock Cognate 70 interacting with myosin binding protein C. Especially given the author's discussion of protein stability and the potential interaction with myosin binding protein C and the SRX state, the limitation of not assessing HSPs should be discussed. (While HSP's relation to thick filament assembly might conceivably modify the interpretation of the M3 x-ray diffraction results, this reviewer acknowledges that possibility as a leap.)

      The reviewer raises an interesting and potentially important point about the potential impact of HSPs and their interaction with the thick filament during hibernation. We have added a section on this to the discussion of the manuscript, with particular emphasis on HSP70 acting as a chaperone for myosin binding protein. However, we feel that it is important to point out that HSPs have only been shown to interact with MYBPC3, a cardiac isoform of this protein which is not present in skeletal muscle [3] (Discussion, paragraph 5, “Of potential further interest to the regulation of myosin…”).

      Despite the above substantial concerns/weaknesses, this reviewer believes that this manuscript represents a valuable data set.

      Other comments related to interpretation:

      (4) The authors briefly mention the study by Toepfer et al [Ref 25] and that it utilizes cardiac muscles. The manuscript would benefit from increased discussion regarding the possible differences in energetics between cardiac and skeletal muscle in these states.

      As this manuscript focuses solely on skeletal muscle, we believe that introducing comparisons between cardiac and skeletal muscles would confuse the reader. For example, these muscle types differ markedly in their regulation of SRX/DRX. Note that we are preparing a manuscript focusing on cardiac muscle and hibernation.

      (5) The author's analysis of temperature is somewhat limited.

      (5a) First, the authors use 20 degrees C (room temperature), not 37 degrees C, a more physiologic body temperature for large mammals. While it is true that limbs are likely at a lower temperature, 20 degrees C seems substantially outside of a normal range. Thus, temperature differences may have been minimized by the author's protocol.

      We agree that performing these single fiber studies at slightly higher temperatures might have better replicated the physiological conditions of the hind leg muscles in the analyzed animals. However, previous work has shown that the resting myosin dynamics are in fact stable at temperatures between 20-30 degrees Celsius in type I, type II and cardiac mammalian muscle fibers [4].

      (5b) Second, the authors discuss the possibility of myosin contributing to non-shivering thermogenesis. The magnitude of this impact should be discussed. The suggestion of myosin ATP utilization also implies that there is some basal muscle tone (contraction), as the myosin ATPase utilizes ATP to release from actin, before binding and hydrolyzing again. Evidence of this tone should be discussed.

      The reviewer raises an interesting point, and it would indeed be interesting to assess the magnitude of the impact and whether a basal muscle tone exists. Assessing the magnitude of the impact is not an easy task and would require very advanced simulations, in which we are unfortunately not experts. As for basal muscle tone, this is difficult to say, as myosin is not actually binding to actin but hydrolyzing ATP at a faster pace during hibernation. We therefore think that the relation between our data and basal muscle tone is unclear. Hence, we have decided not to discuss these points in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting paper. I have some minor suggestions to help improve it.

      Is there any way to estimate the contribution of contractile apparatus to energy expenditure in reference to what is being generated at SERCA in the resting muscle under the various states examined?

      This is an interesting idea; however, as far as we know, this would be challenging experimentally (in the hibernating mammals) and difficult to achieve in a reliable manner.

      It is important to emphasize that while BAT has been traditionally seen to be the site of NST, the skeletal muscle is very important, especially in large mammals, where BAT is going to be a very small % of the body and unlikely to be able to adequately provide heat. The addition of the contractile apparatus to SERCA as a heat generator at rest is very important -- also, the activation of ryanodine receptor Ca2+ to increase the local [Ca2+] at SERCA to generate heat has also recently been shown and should be mentioned (Meizoso-Huesca et al 2022, PNAS; Singh et al 2023, PNAS) alongside the work of Bal et al 2012 etc...

      We have included these mechanisms and references in the manuscript discussion [5, 6] (Discussion, paragraph 4, “A critical difference between the large hibernators…”).

      Are you able to report the likely proportion of type II fibers in the muscles you have sampled?

      The fiber type breakdown for all animals used in this study is reported in supplementary table 1.

      The sampling of muscle from the legs of live animals is sensible and convenient. Is it possible different muscles in the body have different levels of NST, changes in energy expenditure in torpor, and other states?

      As discussed in the public review, we have added to the discussion of this manuscript to reflect this important point about potentially different results from different muscle sites in these animals.

      Reviewer #2 (Recommendations For The Authors):

      Is it likely that the proportion of fast and slow myosin-heavy chains within the selected sample of myofibers from the different mammals contributes to the overall differences in the energetics of different conformational states? In living animals, how does the relative contribution of the energetics from different muscle fiber types compare with the contribution from other organs to the overall regulation of metabolism during activities in summer, winter, or periods of intermittent arousal?

      Fiber types in mammals can be vastly different between species as well as having a considerable amount of plasticity to change within each species upon specific stimuli. Furthermore, some mammals also have specific myosin heavy chain isoforms which have considerable expression, for example, myosin heavy chain 2B which is expressed in rodents such as mice but not larger mammals such as humans.

      In the manuscript, we demonstrate that there is no significant change in the ATP usage by myosin in resting muscle in any of the species which we examined (Fig 1 F, L; Fig 2 E, J). The relatively high mitochondrial density of type I fibers when compared to type II fibers may contribute to a higher overall requirement of energy storage primarily via lipid oxidation. However, mitochondrial respiration is heavily suppressed during hibernation, so questions remain over the overall energy demand in hibernating muscle beyond myosin [7]. The fact that myosin ATP demand is relatively preserved in hibernating muscle suggests that skeletal muscle may be a relatively energy-demanding organ even during hibernation; we speculate in the manuscript that this may be due to the requirement of maintaining muscular tone and function during this period of prolonged immobilization. This may be of relevance when one considers the almost complete shutdown of organs involved with food intake and breakdown, such as the stomach and liver, during hibernation. Furthermore, heart rate and breathing rates are vastly suppressed. Altogether, whilst it is difficult at this point to make an accurate estimate of energy demands between the different organs of hibernators, our data point to skeletal muscle being a relatively high energy demand organ during these periods. When considering the difference between fiber types, again our data suggest that both type I and type II fibers have relatively similar energy demands during hibernation.

      The supplementary data are quite revealing as to how the myosin isoform composition is stable in some species but highly plastic in others in response to the same environmental/metabolic challenges. Why is the myosin heavy chain isoform (I and II) composition stable for brown bears but not for black bears between summer and winter? This is very interesting. For the Ground squirrel, there is remarkable plasticity between myosin heavy chain isoforms (I and II) between summer, interbout arousal, and torpor. Yet in the Garden Dormouse, the myosin heavy chain isoform (I and II) composition is stable between these three activity states. The inconsistencies between and within species are perplexing and worthy of closer interrogation.

      The measurements and role of myosin energetics in different conformational states are interesting but need to be explained in context with other metabolic regulators for these hibernating mammals, especially because some species show remarkable plasticity whereas others show remarkable stability. For example, compare brown and black bears, which show differences in the response of myosin composition to activity, interbout arousal, and torpor. Ground squirrels show remarkable plasticity in myosin isoform composition between activity states (and likely metabolic differences), but the Garden Dormouse has a remarkably stable myosin isoform composition during the three metabolic/environmental challenges. What mechanisms facilitate these modifications in some but not other mammals, even those of similar size? The differences are very interesting, worthy of follow-up, and may well contribute to further understanding the significance of the energetics of different myosin conformational states.

      We agree that the changes seen between these species are very interesting and worthy of further investigation. It would be of further interest to look at methods which would allow for even deeper phenotyping, such as single fiber proteomics, to allow for the assessment of the percentage of hybrid fibers and fibers undergoing any fiber type switch during hibernating periods. Our results show a modest, albeit not significant, increase in the number of type I muscle fibers in 13-lined ground squirrels and Garden dormice during torpor, which is consistent with previous studies [8]. Previous studies have demonstrated that lower temperatures may promote a shift towards more oxidative type I muscle fibers in mammals [9]. This could be an explanation for why we see this specifically in the smaller hibernators; however, as we demonstrate and discuss, these lower temperatures are vital for the survival of these smaller mammals during hibernation, so it would be inconsistent to hypothesize that these shifts are for heat-production purposes. Further studies are warranted to understand the relevance of these shifts, particularly studies with a higher sample size. It would also be of interest to examine fiber type percentages during the progression of these long hibernating periods to observe whether these changes are progressive.

      As for the triggers and mechanisms which facilitate these changes to myosin dynamics, these are under active investigation in the field. One which may be of particular relevance to the changes seen during hibernation is that of steroid hormones: previous research has demonstrated that steroid hormone levels in male and female bears change differentially [10]. This may be of relevance as the steroid hormone estradiol has been shown to slow the resting myosin ATP turnover via the binding of myosin RLC [11]. Considering these studies, future work which looks at hibernating animals of each sex as different groups may be fruitful.

      Reviewer #3 (Recommendations For The Authors):

      i. PDF Pg 8- Results- 'Myosin temperature sensitivity is lost in relaxed skeletal muscles fibers of hibernating Ictidomys tridecemlineatus.': An extra comma appears to be placed between "temperature, decrease".

      ii. PDF Pg 9- Results- 'Hyper-phosphorylation of Myh2 predictably stabilizes myosin backbone in hibernating Ictidomys tridecemlineatus.' (last paragraph): A parenthesis needs to be closed upon the first reference to "supplemental figures 2 and 3".

      iii. PDF Pg 15- Methods- 'Samples collection and cryo-preservation'- The authors use the term "individuals" in the 2nd line. Consider using "subjects".

      iv. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (2nd paragraph)- define "subadult" in approximate months or years.

      v. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (2nd paragraph)- The authors state that brown bears were located in "February and again ... in late June". Was this order of operations always held? If so, a comment about how the potential ageing from the hibernation (especially if sub-adult transitions to adulthood in this period) should be included.

      All samples were collected during the subadult period of the lifespan of each bear, and we therefore do not think that a potential aging effect would be observed, considering that the lifespan of this species is 20-30 years.

      vi. PDF Pg 15- Methods- 'Samples collection and cryo-preservation' (3rd paragraph)- The justification for deprivation of feeding of black bears 24 hours prior to euthanasia should be included. A comment on how this might impact post-translational modifications or gene expression should be included.

      Animals are starved beforehand to prevent aspiration during euthanasia. Considering that these samples are to be compared to animals which have not consumed food or water for five months, the relative impact on PTMs and gene expression would be considered negligible.

      vii. PDF Pg 17- Methods- 'Mant-ATP chase experiments' (just after normalized fluorescence equation): The "Where" may be lowercase.

      viii. PDF Pg 17- Methods- 'Mant-ATP chase experiments' (last paragraph): The protocol for myosin staining, along with the antibody identification (source, catalog number) should be included.

      ix. PDF Pg 18- Methods- 'Post-translational Modification Peptide mapping': Define the makeup of the acrylamide gel and/or the source and catalog number.

      x. PDF Pg 18- Methods- 'Post-translational Modification Peptide mapping': The authors state that "Gel bands were washed..." Please specify which protein bands and if multiple bands (i.e. multiple isoforms) were isolated.

      We thank this reviewer for their careful reading of our manuscript; we have made the changes above as relevant.

      Reference list

      (1) Aydin, J., et al., Nonshivering thermogenesis protects against defective calcium handling in muscle. FASEB J, 2008. 22(11): p. 3919-24.

      (2) Stickler, S., Regional body temperatures and fatty acid compositions in hibernating garden dormice: a focus on cardiac adaptions. 2022, Vienna: Vienna. p. v, 49 Seiten, Illustrationen.

      (3) Glazier, A.A., et al., HSC70 is a chaperone for wild-type and mutant cardiac myosin binding protein C. JCI Insight, 2018. 3(11).

      (4) Walklate, J., et al., Exploring the super-relaxed state of myosin in myofibrils from fast-twitch, slow-twitch, and cardiac muscle. Journal of Biological Chemistry, 2022. 298(3).

      (5) Meizoso-Huesca, A., et al., Ca<sup>2+</sup> leak through ryanodine receptor 1 regulates thermogenesis in resting skeletal muscle. Proceedings of the National Academy of Sciences, 2022. 119(4): p. e2119203119.

      (6) Singh, D.P., et al., Evolutionary isolation of ryanodine receptor isoform 1 for muscle-based thermogenesis in mammals. Proceedings of the National Academy of Sciences, 2023. 120(4): p. e2117503120.

      (7) Staples, J.F., K.E. Mathers, and B.M. Duffy, Mitochondrial Metabolism in Hibernation: Regulation and Implications. Physiology, 2022. 37(5): p. 260-271.

      (8) Xu, R., et al., Hibernating squirrel muscle activates the endurance exercise pathway despite prolonged immobilization. Exp Neurol, 2013. 247: p. 392-401.

      (9) Yu, J., et al., Effects of Cold Exposure on Performance and Skeletal Muscle Fiber in Weaned Piglets. Animals (Basel), 2021. 11(7).

      (10) Frøbert, A.M., et al., Differential Changes in Circulating Steroid Hormones in Hibernating Brown Bears: Preliminary Conclusions and Caveats. Physiol Biochem Zool, 2022. 95(5): p. 365-378.

      (11) Colson, B.A., et al., The myosin super-relaxed state is disrupted by estradiol deficiency. Biochemical and biophysical research communications, 2015. 456(1): p. 151-155.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): Weaknesses:

      However, the molecular mechanisms leading to NPC dysfunction and the cellular consequences of resulting compartmentalization defects are not as thoroughly explored. Results from complementary key experiments using western blot analysis are less impressive than microscopy data and do not show the same level of reduction. The antibodies recognizing multiple nucleoporins (RL1 and Mab414) could have been used to identify specific nucleoporins that are most affected, while the selection of Nup98 and Nup107 is not well explained.

      The results for the Western blots are less impressive than the single-nuclei imaging analysis because the nuclei isolated from whole brain are heterogeneous and include non-neuronal cells. For this reason, we selected specific nucleoporins for Western blot studies to complement the nonspecificity of pan-NPC antibodies, for which detection is based on glycosylated moieties. We reasoned that a combination of pan-NPC and select NUPs will give the strongest complementary validation for the mutant phenotype. We have discussed the rationale for NUP selection in the discussion. In brief, we selected NUP107 as it is a major component of the Y-scaffold complex and is a long-lived subunit of the NPCs (Boehmer et al., 2003; D'Angelo et al., 2009). NUP98 is a mobile nucleoporin and is associated with the central pore, nuclear basket and cytoplasmic filaments. Both NUPs have been implicated in degenerative disorders (Eftekharzadeh et al., 2018; Wu et al., 2001).

      There is also no clear hypothesis on how Aβ pathology may affect nucleoporin levels and NPC function. All functional NCT experiments are based on reporters or dyes, although one would expect widespread mislocalization of endogenous proteins, likely affecting many cellular pathways.

      We agree that the interaction between Aβ pathology and the NPC remains a work in progress. We decided to rigorously characterize Aβ-mediated deficits in App KI neurons – using different approaches and in more than one animal model – before moving on to explore mechanisms in subsequent studies, which we think deserve more extensive experiments. We seek your understanding and have included in the discussion possible mechanisms for direct and indirect Aβ-mediated disruption of NPCs. We have also included an additional study to show the disruption in the localization of an endogenous nucleocytoplasmic protein, CRTC1 (cAMP Regulated Transcriptional Coactivator), which is a CREB coactivator responsive to neural activity. We observed that under both basal and tetrodotoxin-silenced conditions, there is much more CRTC1 in the nucleus of App KI neurons relative to WT. This reflects the compromised permeability barrier that we observed via FRAP studies (Supplementary Figure S15).

      The second part of this manuscript reports that in App KI neurons, disruption in the permeability barrier and nucleocytoplasmic transport may enhance activation of key components of the necrosome complex that include receptor-interacting kinase 3 (RIPK3) and mixed lineage kinase domain-like (MLKL) protein, resulting in an increase in TNFα-induced necroptosis. While this is of potential interest, it is not well integrated in the study. This potential disease pathway is not shown in the very simple schematic (Fig. 8) and is barely mentioned in the Discussion section, although it would deserve a more thorough examination.

      The study of necroptosis is meant to showcase a single cellular pathway that requires nucleocytoplasmic transport for activation, is compromised, and is relevant for AD. We agree there is much more to explore in this pathway but feel it is outside the scope of this study. We have included a new illustration that models how damage to NPCs and the permeability barrier results in enhanced vulnerability of App KI neurons to necroptosis (Supplemental figure S12).

      Reviewer #2 (Public Review):

      (1) Adding statistics and comparisons between wild-type changes at different times/ages to determine if the nuclear pore changes with time in wild-type neurons. The images show differences in the Nuclear pore in neurons from the wild-type mice, with time in culture and age. However, a rigorous statistical analysis is lacking to address the impact of age/development on NUP function. Although the authors state that nuclear pore transport is reported to be altered in normal brain aging, the authors either did not design their experiments to account for the normal aging mechanisms or overlooked the analysis of their data in this light.

      All our quantifications and statistical comparisons in neuron cocultures are time-matched between WT and App KI neurons, and thus independent of the age and maturity of the neurons in culture. The accelerated loss of NUP expression is evident across all time groups. However, we cannot compare across age groups in cultured neurons because the time-matched WT and App KI samples for each time point were processed and imaged separately as neurons matured over time (Fig. 1B-C). To compare age-related effects for WT and App KI neurons and account for time-dependent changes, an experiment would have to be done simultaneously across all age groups. Given the unique challenges of studying “aging” in culture systems, we opted to be more conservative in our interpretation of the results: we were careful to describe the accelerated nuclear pore deficits in App KI neurons relative to time-matched WT expression and to speculate on their relationship to normal brain aging only in the discussion section. We seek your understanding in this matter. That said, we are able to capture the decline of the NPC in histology of brain sections, where we observed a statistically significant drop in WT NUP levels across age groups when we quantified and compared the raw nuclear intensities from brain sections that were processed and imaged simultaneously across independent experiments (Fig. 1D-E). We have included a statement in the results section to highlight that point.

      (2) Add experiments to assess the contribution of wild-type beta-amyloid accumulation with aging. It was described in 2012 (Guix FX, Wahle T, Vennekens K, Snellinx A, Chávez-Gutiérrez L, Ill-Raga G, Ramos-Fernandez E, Guardia-Laguarta C, Lleó A, Arimon M, Berezovska O, Muñoz FJ, Dotti CG, De Strooper B. 2012. Modification of γ-secretase by nitrosative stress links neuronal ageing to sporadic Alzheimer's disease. EMBO Mol Med 4:660-673, doi:10.1002/emmm.201200243) and 2021 (Burrinha T, Martinsson I, Gomes R, Terrasso AP, Gouras GK, Almeida CG. 2021. Upregulation of APP endocytosis by neuronal aging drives amyloid-dependent synapse loss. J Cell Sci 134. doi:10.1242/jcs.255752), 28 DIV neurons are senescent and accumulate beta-amyloid42. In addition, beta-amyloid 42 accumulates normally in the human brain (Baker-Nigh A, Vahedi S, Davis EG, Weintraub S, Bigio EH, Klein WL, Geula C. 2015. Neuronal amyloid-β accumulation within cholinergic basal forebrain in ageing and Alzheimer's disease. Brain 138:1722-1737. doi:10.1093/brain/awv024), thus, it would be important to determine if it contributes to NUP dysfunction. Unfortunately, the authors tested the Abeta contribution at div14 when wild-type Abeta accumulation was undetected. It would enrich the paper and allow the authors to conclude about normal aging if additional experiments were performed, namely, treating 28Div neurons with DAPT and assessing if NUP is restored.

      Your point is well-noted. We are intrigued at the potential contribution of WT Aβ to the decline in NUPs and NPC but decided to focus on mutant Aβ for this manuscript. We have observed negligible MOAB2-positive Aβ signals in WT neurons across all age groups (data not shown) but acknowledge the potential contributions of aging toward a reduction in NPC function. Instead, we have included a section in the discussion to highlight the aging-related expression of Aβ in WT neurons and a subset of the citations above to indicate a possible link with normal decay of NPCs.

      Reviewer #3 (Public Review):

      Weaknesses:

      (1) It does not consider the relationship of the findings here to other published work on the intraneuronal perinuclear and nuclear accumulation of amyloid in other transgenic mouse models and in humans.

      We have updated the discussion to further elaborate on intraneuronal and perinuclear accumulation of amyloid and how that relates to our NPC phenotype.

      (2) It appears to presume that soluble, secreted Abeta is responsible for the effect rather than the insoluble amyloid fibrils.

      At present, our data cannot fully discount the role of fibrils or other forms of Aβ causing the NPC deficits, but our studies do show that external presence of Aβ (e.g. addition of synthetic oligomeric Aβ or App KI conditioned media) leads to intracellular accumulation and NPC dysfunction. We are aware that endogenous formation of fibrils could also contribute to the NPC dysfunction but refrained from drawing any conclusions without further studies. We have stated this in the discussion.

      (5) It is not clear when the alteration in NUP expression begins in the KI mice as there is no time at which there is no difference between NUP expression in KI and Wt and the earliest time shown is 2 months. If NUP expression is decreased from the earliest times at birth, then this makes the significance of the observation of the association with amyloid pathology less clear.

      The phenotype we observed early in neuronal cultures and in very young animals is subtle and, in all our studies, the severity of the NUP phenotypes consistently correlates with elevated intracellular Aβ. We expect that by looking at earlier/younger neurons, the deficits will not be present. However, neurons before DIV7 are immature, and hence we chose not to include those in our observations. In animals, we observed Aβ expression in neuronal soma in young mice (2 mo.), but it is not clear when the deficits manifest and how early to look. While NUP expression is reduced at an early stage, we speculate in the discussion that cellular homeostatic mechanisms can compensate for any compromised nuclear functions and maintain viability to the point where age-dependent degradation of cellular mechanisms will eventually lead to progression of AD.

      Reviewer #1 (Recommendations For The Authors):

      While the App KI model is suitable for modeling one key aspect of human AD, the use of the term "AD neurons" throughout the manuscript is misleading and should be avoided when describing experiments with "App KI neurons".

      Noted and corrected.

      The claim that Aβ pathology causes NPC dysfunction via reduced nucleoporin protein expression would be stronger if it was better supported by biochemical evidence based on western blots (WBs) to complement the strong microscopy data. The results shown in Figure 2H show a very weak effect compared to microscopy data that does not appear to match the quantification (e.g. Lamin-B1 staining appears reduced after 2 months in WB but not the graph). It is also not clear why nuclear fractionation is required. WB analyses with RL1 and MAB414 (that recognizes multiple FG-Nups in ICCs and WBs) would help identify Nups that are most affected by Aβ pathology.

      The weaker Western blot results are due to the heterogeneity of the nuclei we isolated from the whole brain, which include non-neuronal cells. We reasoned that isolating the nuclear fraction would give us a cleaner Western blot with fewer background bands as the input lysate is more specific. We also decided to use antibodies against specific NUPs as a way to complement the pan-NPC antibodies that detect glycosylation-enriched epitopes in the nucleus. We reasoned that Western blot identification of individual subunits should provide complementary and stronger evidence for the reduction of NUPs at the peptide level. Overall, we used four different nuclear pore antibodies (RL1, Mab414, NUP98, NUP107) to demonstrate the same mutant phenotype in App KI neurons.

      While the observed NCT defects are discussed in detail, the authors do not present any potential mechanisms to be tested, how intracellular Aβ may impact NPCs. Does Aβ pathology affect nucleoporin expression or stability?

      We have observed the presence of Aβ adjacent to the nuclear membrane and also in the cytosol via high resolution confocal microscopy (Supplementary Figure S14). Our primary goal in this paper is to provide convincing evidence – using different assays and in more than one mouse model – for the reduction of NUPs and lower NPC counts. We feel that the mechanistic details of Aβ-driven NPC disruption require more extensive experimentation that is more suitable for subsequent publications.

      The very simple schematic just represents the loss of compartmentalization, without illustrating more complex concepts. It would also be improved by representing the outer and inner nuclear membrane fusing around the NPCs with a much wider perinuclear space between the membranes. As shown now, the nuclear envelope almost looks like a single membrane, while >60kDa proteins are shown at a similar size as the 125MDa NPC.

      We have updated the illustration along with a new schematic for necroptosis (Supplementary Figure S12). We have refrained from giving specific details of the damage to the nuclear pore complex because the nature of these deficits is not yet clear.

      Misspelling of "Hoechst" as "Hochest" in several figures (Fig. 1, 2, S5, S7).

      Noted and corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) Additional data analysis is required concerning the wild-type controls. The figures show clear differences in the wild-type neurons with time in culture (referring to figures 1A, 1B, 1C; 2A, 2B, 2C, 2D, 6E, 6F, 6G, S4) and in different ages (2E, 2F, 2G, 5B, 5C, 5D). The data analysis is shown for knockin vs the time-matched wild-type condition. The effect of time in wild-type neurons/mice should also be analyzed. All the data is suggested to be normalized to 7 DIV/2-month wild-type neurons/mice. Were these experiments done with different time points of the same culture? This would be the best to conclude on the effect of time.

      We have noted a decline of NUPs in WT neurons over time in primary cultures and in animal sections. This is not surprising since the NPC and nuclear signaling pathways deteriorate with age (Liu and Hetzer, 2022; Mertens et al., 2015). However, we are unable to do a direct comparison across age groups in cultured neurons because the time-matched WT and App KI neuronal samples for each time point were processed and imaged separately as neurons matured over time (Fig. 1B-C). Hence, we performed statistical analysis on each time-matched pair of WT and App KI neurons. To be clear, multiple independent experiments across different cultures were performed at each time point. Given the inherent challenges of studying aging in culture systems, we opted to be more conservative in our interpretation of the results: we were careful to describe the accelerated nuclear pore deficits in App KI neurons relative to WT levels without inferring the effect of time, and to speculate on their relationship to normal brain aging only in the discussion section. That said, we are able to capture the decline of the nuclear pore complex across different age groups in histology of brain sections, where we observed a drop in WT NUP levels when we quantified and compared the raw nuclear intensities from brain sections that were processed and imaged simultaneously across independent experiments (Fig. 1D-E).

      Similarly, in Figure 2H, why aren't 2 months compared with 14 months? Why were these ages chosen? 2 months is a young adult, and 14 months is a middle-aged adult. To conclude, aging should have included an age between 18 and 24 months old.

      As with cultures, we isolated age-matched WT and App KI animals separately. We chose 2 to 14 months because these ages represent young and middle-aged adults, and we wanted to showcase the nuclear pore deficits induced by the presence of Aβ without drawing a conclusion on the effects of age or time. That said, we do show histology of brain sections at 18 months of age with individual NUPs. We agree that the temporal aspects of NPC loss in WT neurons are interesting; however, given our experimental parameters, we cannot draw conclusions across different age groups at the moment.

      In Figure 3, statistics between wild type should have been included.

      Similar to the above comment, samples were processed and imaged independently across different groups, hence we cannot compare the datapoints across time.

      (4) Additional quantification: The intensity of MOAB2 at 2 and 13 months should be measured as in Figure 3C.

      Intracellular Aβ signal in 2-mo. old App KI mice is diffuse throughout the soma, but in older animals it is punctate. This observation was similarly described by Lord et al. for tgAPPArcSwe mice (Lord et al., 2006). We have included a confocal micrograph of MOAB-2 immunocytochemistry of a 13-mo. App KI brain section in supplemental figures (Supplementary Figure S13). We found it challenging to differentiate whether the signal is localized intracellularly or as an extracellular aggregate. Regardless, the differences in the quality and uneven distribution of Aβ signal make any direct comparison of soma intensity across the different age groups harder to interpret in the context of the mutant phenotype.

      (5) Additional experiments: Because primary neurons differentiate, mature, and age with time in culture, they are required to control for the developmental stage of your cultures. Analyzing neuronal markers such as doublecortin for neuronal precursors, MAP2 (or Tau) for dendritic/axonal maturation, synapsin for synaptic maturation, and accumulation of senescence-associated beta-galactosidase (SA-Beta-Gal) as an aging marker.

      As part of the maintenance of cultures, we stain cultures for axodendritic markers (e.g. MAP2), glial cell distribution (e.g. GFAP), excitatory vs. inhibitory neuronal subpopulations (e.g. Gad65) and synaptic markers (e.g. PSD95) to ensure that growth, survival and viability of neurons are not compromised (data not shown). These markers for maturity are routinely tracked to ensure proper development. We also test the health of the cultures (e.g. apoptosis, necrosis) and look for cytoskeletal disruption or fragmentation of neuronal processes.

      (6) Additional methods: The quantification of Abeta intensity in Figure 3 is not clearly explained in the methods. Was the intensity measured per field, per cell body?

      The quantifications for Aβ are done for each MAP2-positive cell body, and we have included that statement in the methods.

      (7) Missing in discussion integration and references to these papers:

      a. Mertens J, Paquola ACM, Ku M, Hatch E, Böhnke L, Ladjevardi S, McGrath S, Campbell B, Lee H, Herdy JR, Gonçalves JT, Toda T, Kim Y, Winkler J, Yao J, Hetzer MW, Gage FH. 2015. Directly Reprogrammed Human Neurons Retain Aging-Associated Transcriptomic Signatures and Reveal Age-Related Nucleocytoplasmic Defects. Cell Stem Cell 17:705-718. doi:10.1016/j.stem.2015.09.001

      b. Guix FX, Wahle T, Vennekens K, Snellinx A, Chávez-Gutiérrez L, Ill-Raga G, Ramos-Fernandez E, Guardia-Laguarta C, Lleó A, Arimon M, Berezovska O, Muñoz FJ, Dotti CG, De Strooper B. 2012. Modification of γ-secretase by nitrosative stress links neuronal ageing to sporadic Alzheimer's disease. EMBO Mol Med 4:660-673. doi:10.1002/emmm.201200243

      c. Burrinha T, Martinsson I, Gomes R, Terrasso AP, Gouras GK, Almeida CG. 2021. Upregulation of APP endocytosis by neuronal aging drives amyloid-dependent synapse loss. J Cell Sci 134. doi:10.1242/jcs.255752

      d. Baker-Nigh A, Vahedi S, Davis EG, Weintraub S, Bigio EH, Klein WL, Geula C. 2015. Neuronal amyloid-β accumulation within cholinergic basal forebrain in ageing and Alzheimer's disease. Brain 138:1722-1737. doi:10.1093/brain/awv024

      We have cited a subset of the papers in the discussion section and also expanded the discussion to include the possibility of time-dependent changes for Aβ expression in WT neurons.

      Reviewer #3 (Recommendations For The Authors):

      Specific comments:

      (1) Fig. 1D,E. Fig. 2E, F. This shows the change in NUP IR with time for the APP-KI, but there is also a difference between Wt and KI from the earliest time shown. How early is this difference apparent? From birth? The study should go back to the earliest time possible as the timing of the staining for NUP is important to correlate this with other events of intraneuronal Abeta and amyloid IR. Is the difference between 4 and 7-month ko mice in Figures 2G and 2F statistically significant? If not, perhaps we need a larger N to determine the timing accurately.

      The point is well taken. We have not examined the WT and App KI brains before 2 mo. of age. At this early time point, the extracellular amyloid deposits are very low but intracellular Aβ can be readily detected in neuronal soma. We expect that as the animal ages, the Aβ inside cells will directly impact the NPC mutant phenotype, but it is unclear how early this phenotype manifests in animals and when we should look. To be clear, in less mature neurons (DIV7), the phenotype is very subtle and can only be observed via high resolution microscopy. The differences between 4- and 7-mo. old animals (Fig. 2F and G) in terms of severity of the reduction cannot be assessed because the age-matched animals for each time point were processed separately, but at each time point, we observed a significant reduction of NPC relative to WT. Nevertheless, in Figure 1E, we performed immunohistochemistry experiments with pan-NPC antibodies and quantified raw intensities to show a difference between 4/7-mo. and 13-mo. old animals.

      (2) Similarly, the increase in Abeta IR is only shown for cultured neurons and only a single time point of 2 months is shown for CA1 in KI brain. Since a major point is that the decrease in NUP IR is correlated with an increase in Abeta IR, a more convincing approach would be to stain for both simultaneously in KI brain, especially since Abeta IR is quite sensitive to conformational variation between APP, Abeta, and aggregated forms and whether they are treated with denaturants for "antigen retrieval". The entire brain hemisphere should be shown as the pathology is not limited to CA1. There are many different Abeta antibodies that are specific to the amyloid state so it should be possible to come up with a set of antibodies and conditions that work for both Abeta and NUP staining.

      The intracellular Aβ signal in 2-mo. old App KI mice is diffuse throughout the soma, but in older animals it is punctate. We have included a confocal micrograph of MOAB-2 immunocytochemistry of a 13-mo. App KI brain section (Supplementary Figure S13). We did not quantify Aβ as it was challenging to differentiate whether the signal is intracellular Aβ or amyloid β plaques. Regardless, the differences in the quality and uneven distribution of Aβ signal make any direct comparison of soma intensity across the different age groups much harder to interpret.

      (3) Figure 3A. The staining with MOAB 2 and 82E1 appears qualitatively different with 82E1 exhibiting larger perinuclear puncta. Both antibodies appear to stain puncta inside the nucleus consistent with previously published reports of intranuclear amyloid IR. If these are flattened images, then 3D Z stacks should be shown to clarify this. Figure 3H shows what appears to be Abeta immunofluorescence quantitation in DAPT-treated cells, but the actual images are apparently not shown. The details of this experiment aren't clear or what antibody is used, but this may not be Abeta as many APP fragments that are not Abeta also react with antibodies like MOAB2.

      Since 82E1 detects a larger epitope (aa1-16 as compared to 1-4 in MOAB-2), it is possible some forms of Aβ are differentially detected inside the cell. MOAB-2 is shown to detect the different forms of Aβ40 and 42, with a stronger selectivity for the latter. However, it is not known to react with APP or APP/CTFs (Youmans et al., 2012). DAPT-treated cells were processed and imaged as with other experiments in figure 3 using MOAB-2 antibodies to detect Aβ. We have included that information in the figure legends.

      To image the cell, we collect LSM800 confocal stacks and use IMARIS software to render the nucleus as a 3D object prior to quantifying the intensity or coverage. In this way, we capture and quantify the entire volume of the nucleus and not just a single plane. The majority of the MOAB-2 positive Aβ signal is punctate and located in the cytosol, with a subset adjacent to the nucleus (Supplementary Figure S14; Airyscan; single plane). We also detected MOAB-2 signals coming from within the nucleus. The nature of this interaction between Aβ and the nuclear membrane/perinuclear space/nucleoplasm remains unclear.

      (4) P20 L12. "We demonstrate an Aβ-driven loss of NUP expression in hippocampal neurons both in primary cocultures and in AD mouse models" It isn't clear that exogenous or extracellular Abeta drives this in living animals. All the data that demonstrate this is derived from cell culture and things may be very different (eg. Soluble Abeta concentration) in vivo. It is OK to speculate that the same thing happens in vivo, but to say it has been demonstrated in vivo is not correct.

      We have rewritten the opening statement in the paragraph to narrowly define our observations in the context of App KI. We understand the caveats of our studies in primary cultures, but we have done our due diligence to study the phenomenon in different assays, using at least four different nuclear pore antibodies, and in more than one mouse model to show the deficits. We mentioned Aβ-driven loss but did not conclude which Aβ peptide (e.g. 40 vs. 42) or form (e.g. fibrillar) drives the deficits. However, we have shown some data indicating that oligomers, but not monomers, as well as extracellular Aβ, can accumulate in the soma and trigger NPC deficits. We also state in the discussion that other possible mechanisms of action, mainly via indirect interactions of Aβ with the cell, could result in the deficits.

      (5) P21, L21 "Inhibition of γ-secretase activity prevented cleavage of mutant APP and generation of Aβ, which led to the partial restoration of NUP levels". What the data actually shows is that treatment of the cells with DAPT led to partial restoration of NUP levels. Other studies have shown that DAPT is a gamma secretase inhibitor, so it is reasonable to suspect that the effect is due to gamma secretase activity, but the substrates and products are assumed rather than measured, so a little caution is a good idea here. For example, CTF alpha is also a substrate, producing P3, which is not considered Abeta. The products Abeta and P3 also typically are secreted, where they can be further degraded. Abeta and P3 can also aggregate into amyloid, so whether the effect is really due to Abeta per se as a monomer or Abeta-containing aggregates isn't clear.

      The point is noted. DAPT inhibition of γ-secretase can impact more than one substrate, as the complex can cleave multiple substrates. However, we have measured Aβ intensity which increases with DAPT, and while a singular experiment is insufficient to show direct Aβ involvement, we have performed other experiments that show a correlation between Aβ levels inside the soma and the degree of NPC reduction. This includes the direct application of synthetic Aβ42 oligomers. We agree the data cannot fully exclude the involvement of other γ-secretase cleavage products, but we feel there is strong enough evidence that Aβ – in whatever form – is at least partially, if not mainly, the driver that promotes these deficits.

      (6) Discussion. The authors point to "intracellular Abeta" as a potential causative agent for decreased NUP expression and function and cite a number of papers reporting intracellular Abeta. (D'Andrea et al., 2001; Iulita et al., 2014; Kimura et al., 2003; LaFerla et al., 1997; Oddo et al., 2003b; Takahashi et al., 2004; Wirths et al., 2001). Most of these papers report immunoreactivity with Abeta antibodies and argue about whether this is really Abeta40 or 42 and not APP or APP-CTF immunoreactivity. What is missing from these papers and the discussion in this manuscript is that this is not just soluble Abeta, but Abeta amyloid of the same type that ends up in plaques because it has the same immunoreactivity with Abeta amyloid fibril-specific antibodies and even the classical anti-Abeta antibodies 6E10 and 4G8 after antigen retrieval as shown in papers by Pensalfini, et al., 2014 and Lee, et al., 2022 (1,2) who describe the evolution of neuritic plaques and their amyloid core beginning inside neurons. The term "dystrophic neurite" is a misnomer because the structures that resemble "neurites" morphologically are actually autophagic vesicles packed with Abeta and APP immunoreactive material which has the detergent insolubility properties of amyloid plaques. See (1,2). The apparent intranuclear IR of MOAB2 and 82E1 mentioned in comment 3 is relevant here. In Lee et al., the 3D serial section EM reconstruction of one of these neurons with perinuclear and nuclear amyloid shows abundant amyloid fibrils in the remnant of the nucleus. The nuclear envelope appears to break down as evidenced by the redistribution of NeuN immunoreactivity (Pensalfini et al.,) and other nuclear markers and the EM evidence (Lee et al.,). These papers are also improperly cited as evidence for a hypothetical intracellular source for soluble Abeta.

      We have devoted a section of the discussion to highlighting some of these findings in the context of Pensalfini et al. 2014 and Lee et al. 2022. Lee et al. tested multiple animal strains to observe the Panthos structures but did not use the App KI mouse model. Since none of our experiments directly tested their observations (e.g. perinuclear fibrils or acidity of autophagic vesicles) in App KI, we decided to take a more conservative approach in our interpretations by framing the NPC deficits without specifying the nature of the intracellular Aβ. We note in the discussion that it is entirely possible that App KI animals also show the same Panthos phenotypes and the perinuclear accumulation of Aβ, which results in damaged NUPs. To test that, the Panthos phenotype must first be established in App KI mice.

      (7) The authors also cite the work of Ditaranto et al., 2001 and Ji et al., 2002 for Aβ-induced lysosomal leakage from these vesicular structures but overlook the original publications on Abeta-induced lysosomal leakage by Yang et al., (3) who further show that this is correlated with aggregation of Abeta42 upon internalization which also leads to the co-aggregation of APP and APP-CTFs in a detergent-insoluble form (4) and pulse-chase studies demonstrate that metabolically-labeled APP ultimately ends up as insoluble Abeta that have "ragged" N-termini (5). This work seems relevant to the results reported here as the perinuclear amyloid that the authors report here is likely to be the same insoluble, aggregated APP and APP-CTF-containing amyloid as that reported in references 1 and 2.

      We have included the literature references in the discussion, highlighting the possibility of lysosomal leakage contributing to the NPC damage.

      Minor points.

      (1) P2, L28 "permeability barrier facilities passive" should be 'facilitates'.

      (2) P7, L24 "homogenate and grounded for 5 additional strokes" One of the peculiarities of English is that the past tense of grind is ground. Grounded means something else.

      (3) P8, L9 "For synthetic Aβ experiments," Abeta what? 42? 40? It makes a difference and if it is Abeta42, you should be specific in the rest of the text where it is used.

      (4) P11, L14. "To determine if Aβ can trigger changes in nuclear structure and function" It seems a little early to start by presupposing that it is Abeta that triggers changes in nuclear structure and function. It sounds like you are starting out with a bias.

      (5) P11, L16,17 "While Aβ pathology is robustly detected in App KIs" At some point in the manuscript, either here or in the introduction, it would be useful to include a couple of sentences about what the pathology is in these mice along with the timing of the development of the pathology to compare with the results presented here. There are several types of amyloid deposits, "neuritic" plaques, diffuse plaques, and cerebrovascular amyloid. This is important because the early "neuritic" plaques are intraneuronal at least early on before the neuron dies. See (1,2).

      (6) P19, L10. "LMB is an inhibitor or CRM-1 mediated" should be of

      All minor points have been addressed in the manuscript and figures.

      References

      (1) Pensalfini, A., Albay, R., 3rd, Rasool, S., Wu, J. W., Hatami, A., Arai, H., Margol, L., Milton, S., Poon, W. W., Corrada, M. M., Kawas, C. H., and Glabe, C. G. (2014) Intracellular amyloid and the neuronal origin of Alzheimer neuritic plaques. Neurobiol Dis 71C, 53-61

      (2) Lee, J. H., Yang, D. S., Goulbourne, C. N., Im, E., Stavrides, P., Pensalfini, A., Chan, H., Bouchet-Marquis, C., Bleiwas, C., Berg, M. J., Huo, C., Peddy, J., Pawlik, M., Levy, E., Rao, M., Staufenbiel, M., and Nixon, R. A. (2022) Faulty autolysosome acidification in Alzheimer’s disease mouse models induces autophagic build-up of Abeta in neurons, yielding senile plaques. Nat Neurosci 25, 688-701

      (3) Yang, A. J., Chandswangbhuvana, D., Margol, L., and Glabe, C. G. (1998) Loss of endosomal/lysosomal membrane impermeability is an early event in amyloid Aβ1-42 pathogenesis. J. Neurosci. Res. 52, 691-698

      (4) Yang, A. J., Knauer, M., Burdick, D. A., and Glabe, C. (1995) Intracellular A beta 1-42 aggregates stimulate the accumulation of stable, insoluble amyloidogenic fragments of the amyloid precursor protein in transfected cells. J Biol Chem 270, 14786-14792

      (5) Yang, A., Chandswangbhuvana, D., Shu, T., Henschen, A., and Glabe, C. G. (1999) Intracellular accumulation of insoluble, newly synthesized Aβn-42 in APP transfected cells that have been treated with Aβ1-42. J. Biol. Chem. 274, 20650-20656

      References

      Boehmer, T., Enninga, J., Dales, S., Blobel, G., and Zhong, H. (2003). Depletion of a single nucleoporin, Nup107, prevents the assembly of a subset of nucleoporins into the nuclear pore complex. Proc Natl Acad Sci U S A 100, 981-985.

      D'Angelo, M.A., Raices, M., Panowski, S.H., and Hetzer, M.W. (2009). Age-dependent deterioration of nuclear pore complexes causes a loss of nuclear integrity in postmitotic cells. Cell 136, 284-295.

      Eftekharzadeh, B., Daigle, J.G., Kapinos, L.E., Coyne, A., Schiantarelli, J., Carlomagno, Y., Cook, C., Miller, S.J., Dujardin, S., Amaral, A.S., et al. (2018). Tau Protein Disrupts Nucleocytoplasmic Transport in Alzheimer's Disease. Neuron 99, 925-940 e927.

      Liu, J., and Hetzer, M.W. (2022). Nuclear pore complex maintenance and implications for age-related diseases. Trends Cell Biol 32, 216-227.

      Lord, A., Kalimo, H., Eckman, C., Zhang, X.Q., Lannfelt, L., and Nilsson, L.N. (2006). The Arctic Alzheimer mutation facilitates early intraneuronal Abeta aggregation and senile plaque formation in transgenic mice. Neurobiol Aging 27, 67-77.

      Mertens, J., Paquola, A.C., Ku, M., Hatch, E., Bohnke, L., Ladjevardi, S., McGrath, S., Campbell, B., Lee, H., Herdy, J.R., et al. (2015). Directly Reprogrammed Human Neurons Retain Aging-Associated Transcriptomic Signatures and Reveal Age-Related Nucleocytoplasmic Defects. Cell stem cell 17, 705-718.

      Wu, X., Kasper, L.H., Mantcheva, R.T., Mantchev, G.T., Springett, M.J., and van Deursen, J.M. (2001). Disruption of the FG nucleoporin NUP98 causes selective changes in nuclear pore complex stoichiometry and function. Proc Natl Acad Sci U S A 98, 3191-3196.

      Youmans, K.L., Tai, L.M., Kanekiyo, T., Stine, W.B., Jr., Michon, S.C., Nwabuisi-Heath, E., Manelli, A.M., Fu, Y., Riordan, S., Eimer, W.A., et al. (2012). Intraneuronal Abeta detection in 5xFAD mice by a new Abeta-specific antibody. Molecular neurodegeneration 7, 8.

  4. Apr 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to express our gratitude to the reviewers for their suggestions and critiques as we continually strive to enhance the quality of the manuscript. We improved it by incorporating the reviewers’ suggestions, changing the content and numbering of figures (Figs 1, 3S1 were edited; 4 figures were moved to supplemental materials), and adding several analyses suggested by the reviewers along with accompanying figures (1S2, 1S3) and tables (1 and 2). These analyses include investigating the link between freezing behavior and 44-kHz calls as well as their sound mean power and duration. We have also introduced detailed information regarding the experiments performed and expanded the description and discussion of the results. Finally, we added information about 44-kHz calls reported by another group, a report that was inspired by our findings.

      Below is the point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      Olszyński and colleagues present data showing variability from canonical "aversive calls", typically described as long 22 kHz calls rodents emit in aversive situations. Similarly long but higher-frequency (44 kHz) calls are presented as a distinct call type, including analyses both of their acoustic properties and animals' responses to hearing playback of these calls. While this work adds an intriguing and important reminder, namely that animal behavior is often more variable and complex than perhaps we would like it to be, there is some caution warranted in the interpretation of these data. The authors also do not provide adequate justification for the use of solely male rodents. With several reported sex differences in rat vocal behaviors this means caution should be exercised when generalizing from these findings.

      We fully agree that our data should be interpreted with caution and we followed the Reviewer’s suggestions along these lines (see below). Also, we appreciate the suggestion to explore the prevalence of 44-kHz calls in female subjects, which would indeed represent an important and intriguing extension of our research. However, due to present financial constraints, we can only plan such experiments. To address the comment, we have added the sentence: “Here we are showing introductory evidence that 44-kHz vocalizations are a separate and behaviorally-relevant group of rat ultrasonic calls. These results require further confirmations and additional experiments, also in form of repetition, including research on female rat subjects.”

      It is important to note that the data presented in the current manuscript originate primarily from previously conducted experiments. These earlier experiments employed male subjects only, because of established evidence that the female estrus cycle significantly influences ultrasonic vocalization (Matochik et al., 1992). Controlling for the estrus cycle would require a greater number of female subjects than males, which would not only increase animal suffering but also escalate the demands on human labor and financial costs.

      Firstly, the authors argue that the shift to higher-frequency aversive calls is due to an increase in arousal (caused by the animals having received multiple aversive foot shocks towards the end of the protocols). However, it cannot be ruled out that this shift would be due to factors such as the passage of time and increase in fatigue of the animals as they make vocalizations (and other responses) for extended periods of time. In fact the gradual frequency increase reported for 22 kHz calls and the drop in 44 kHz calls the next day in testing is in line with this.

      Answer: We would like to point out that the “increased-arousal” hypothesis, declared in the manuscript, is only a hypothesis – as reflected by the wording used. However, we changed the beginning of the sentence in question from “It could be argued” to “We would like to propose a hypothesis” to emphasize the speculative aspect of the proposed explanation behind the increase of 44-kHz ultrasonic emissions.

      Also, we do agree that other factors could contribute to the increased emission of 44-kHz calls. These factors could include: heightened fear, stress/anxiety, annoyance/anger, disgust/boredom, grief/sadness, despair/helplessness, and weariness/fatigue. We list these potential factors in the discussion. Also, we added: “It is not possible, at this stage, to determine which factors played a decisive role. Please note that the potential contribution of these factors is not mutually exclusive”. However, we propose a list of arguments supporting the idea that 44-kHz vocalizations communicate an increased negative emotional state. Among these arguments are the conclusions drawn from additional analyses, mostly inspired by the fatigue hypothesis proposed by Reviewer #1. In particular, we investigated changes in the sound mean power and duration of 22-kHz and 44-kHz calls. Specifically, we showed that the mean power of 44-kHz vocalizations did not change, and was higher than that of 22-kHz vocalizations (Fig. 1S2EF).

      Finally, Reviewer #1 listed “the gradual frequency increase reported for 22 kHz calls and the drop in 44 kHz calls the next day” as arguments for the fatigue hypothesis. We do not agree that the “increase” should be interpreted as a sign of fatigue (producing and maintaining higher-frequency calls requires greater effort from the vocalizer, as we elaborated in the manuscript). We are also not sure which “drop in 44 kHz calls” the Reviewer is referring to (we assume it refers to fewer 44-kHz calls during testing vs. training; we suppose that the levels of arousal are lower in the test due to the shorter session time and the lack of shocks, which additionally contributes to fear extinction).

      Secondly, regarding the analysis where calls were sorted using DBSCAN based on peak frequency and duration, it is not surprising that the calls cluster based on frequency and duration, i.e. the features that are used to define the 44 kHz calls in the first place. Thus presenting this clustering as evidence of them being truly distinct call types comes across as a circular argument.

      Answer: The DBSCAN sorting results were meant to convey that, as the clustering ε value (which sets the degree of cluster separation) was changed, the 44-kHz vocalizations remained distinct from the 22-kHz and various short-call clusters, which merged. In other words: 44-kHz calls remained separate from long 22-kHz, short 22-kHz and 50-kHz vocalizations, which all consolidated into one common cluster. As a result, in this mathematical analysis, 44-kHz vocalizations remained distinct without applying human biases. Additionally, frequency and duration are the two most common features used to define all types of calls (Barker et al., 2010; Silkstone & Brudzynski, 2019a, 2019b; Willey & Spear, 2013). In summary, we did not expect the analysis to isolate the 44-kHz calls, and we were surprised by this result.
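
      The ε-sweep logic described in this answer can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the points are invented, dimensionless placeholders (not data from the study), the `min_pts` and ε values are arbitrary, and the minimal DBSCAN is a generic textbook version rather than the authors' pipeline. It only shows how two nearby clusters can merge as ε grows while a distant cluster stays separate.

```python
from math import hypot


def dbscan(points, eps, min_pts):
    """Minimal textbook DBSCAN; returns one label per point (-1 means noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels


def n_clusters(labels):
    return len({lab for lab in labels if lab != -1})


# Invented toy points in arbitrary units (hypothetical, not study data).
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]          # one compact cluster
group_b = [(3, 0), (3, 1), (4, 0), (4, 1)]          # a nearby cluster
group_c = [(10, 10), (10, 11), (11, 10), (11, 11)]  # a distant cluster
points = group_a + group_b + group_c

for eps in (1.5, 2.5):
    labels = dbscan(points, eps, min_pts=3)
    print(f"eps={eps}: {n_clusters(labels)} clusters, labels={labels}")
```

      With the smaller ε the three groups are separate; with the larger ε the two nearby groups fuse while the distant one keeps its own label, which is the qualitative pattern the answer describes.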

      The sparsity of calls in the 30-40 kHz range (shown in the individual animal panels in Figure 2C) could in theory be explained by some bioacoustics properties of rat vocal cords, without necessarily the calls below and above that range being ethologically distinct.

      Answer: We respectfully disagree with the argument regarding sparsity. It is important to note that, during prolonged fear conditioning experiments, we observed an increased incidence of 44-kHz calls (Fig. 1E-G) of up to >19% (Fig. 1S2AB) of the total ultrasonic vocalizations during specific inter-trial intervals. In principle, it is possible that, in the observed experimental circumstances, almost every fifth call could be attributed to the vocal apparatus as an artifact of its functioning (assuming we are interpreting the Reviewer’s argument correctly). While we do not believe this to be the case, we acknowledge the importance of considering such a hypothesis.

      The behavioral response to call playback is intriguing, although again more in line with the hypothesis that these are not a distinct type of call but merely represent expected variation in vocalization parameters. Across the board animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls. This does raise interesting questions about how, ethologically, animals may interpret such variation and integrate this interpretation in their responses. However, the categorical approach employed here does not address these questions fully.

      Answer: We are unsure of the Reviewer’s critique in this paragraph and will attempt to address it to the best of our understanding. Our finding of up to >19% of long, seemingly aversive 44-kHz calls, at a frequency in the defined appetitive ultrasonic range (usually >32 kHz), is unexpected rather than “expected”. We would agree that aversive call variation is expected, but not in the appetitive frequency range.

      Kindly note the findings by Saito et al. (2019), which claim that frequency band plays the main role in rat ultrasonic perception. It is possible that the higher peak frequency of 44-kHz calls may be a strong factor in their perception by rats, which is, however, modified by the longer duration and the lack of modulation.

      Also, in our experience, it is quite challenging to demonstrate different behavioral responses of naïve rats to pre-recorded 22-kHz (aversive) vs. 50-kHz (appetitive) vocalizations. Therefore, we would expect demonstrating a difference in response to two distinct, potentially aversive calls, i.e., 22-kHz vs. 44-kHz calls, to be even more difficult (to our knowledge, a comparable experiment on short vs. long 22-kHz ultrasonic vocalizations has not been done before).

      Therefore, we do not take lightly the surprising and interesting finding that “animals respond rather similarly to hearing 22 kHz calls as they do to hearing 44 kHz calls, with occasional shifts of 44 kHz call responses to an intermediate between appetitive and aversive calls”. We would rather put this description in analogous words: “the rats responded similarly to hearing 44-kHz calls as they did to hearing aversive 22-kHz calls, especially regarding heartrate change, despite the 44-kHz calls occupying the frequency band of appetitive 50-kHz vocalizations” and “other responses to 44-kHz calls were intermediate, they fell between response levels to appetitive vs. aversive playback” – which we added to the Discussion.

      Finally, we acknowledge that our findings do not present a finite and complete picture of the discussed aspects of behavioral responses to the presented ultrasonic stimuli (44-kHz vocalizations). Therefore, we have incorporated the Reviewer’s suggestion in the discussion. The added sentence reads: “Overall, these initial results raise further questions about how, ethologically, animals may interpret the variation in hearing 22-kHz vs. 44-kHz calls and integrate this interpretation in their responses.”

      In sum, rather than describing the 44kHz long calls as a new call type, it may be more accurate to say that sometimes aversive calls can occur at frequencies above 22 kHz. Individual and situational variability in vocalization parameters seems to be expected, much more so than all members of a species strictly adhering to extremely non-variable behavioral outputs.

      Answer: The surprising fact that there are presumably aversive calls that are beyond the commonly applied thresholds, i.e. >32 kHz, while sharing some characteristics with 22-kHz calls, is the main finding of the current publication. Whether they are ultimately classed as a new type or subtype (i.e., a separate category) or form a supergroup of aversive calls together with 22-kHz vocalizations is of secondary importance and remains to be discussed with other researchers in the field.

      However, we would argue – by showing a comparison – that 22-kHz calls occur at durations of <300 ms and also >300 ms, and are usually referred to in the literature as short and long 22-kHz vocalizations, respectively (rather than being introduced with a description that “sometimes 22-kHz calls can occur at durations below 300 ms”). These are then regarded and investigated as separate groups or classes, usually referred to as two different “types” (e.g., Barker et al., 2010) or “subtypes” (e.g., Brudzynski, 2015). Analogously, 44-kHz vocalizations can also be regarded as a separate type or as a subtype of 22-kHz calls. The problem with the latter is that 22-kHz vocalizations are traditionally and predominantly defined by the 18–32 kHz frequency band (Araya et al., 2020; Barroso et al., 2019; Browning et al., 2011; Brudzynski et al., 1993; Hinchcliffe et al., 2022; Willey & Spear, 2013).

      Reviewer #2 (Public Review):

      Olszyński et al. claim that they identified a "new-type" ultrasonic vocalization around 44 kHz that occurs in response to prolonged fear conditioning (using foot-shocks of relatively high intensity, i.e. 1 mA) in rats. Typically, negative 22-kHz calls and positive 50-kHz calls are distinguished in rats, commonly by using a frequency threshold of 30 or 32 kHz. Olszyński et al. now observed so-called "44-kHz" calls in a substantial number of subjects exposed to 10 tone-shock pairings, yet call emission rate was low (according to Fig. 1G around 15%, according to the result text around 7.5%).

      Answer: We thank the Reviewer for acknowledging the strengths. Please note that Figure 1G referred to 10-trial Wistar rats during the delay fear conditioning session, in which 44-kHz calls constituted 14.1% of ultrasonic vocalizations. The 7.5% figure in the results refers to the total of vocalizations analyzed across all animal groups used in fear conditioning experiments. These values have been updated in the current version of the manuscript. Also, please note that 44-kHz calls constituted up to 19.4% of calls, on average, in one of the ITIs during the fear conditioning session. However, the prevalence of aversive calls, and of 44-kHz vocalizations in particular, varied between individual rats; we added the text: “for n = 3 rats, 44-kHz vocalizations accounted for >95% of all calls during at least one ITI (e.g., 140 of total 142, 222 of 231, and 263 of 265 tallied 44-kHz calls), and in n = 9 rats, 44-kHz vocalizations constituted >50% of calls in more than one ITI.” See also further for the description of the array of experiments analyzed and the prevalence/percentage of 44-kHz calls encountered (Tab. 1, Fig. 1S3).

      Weaknesses: I see a number of major weaknesses.

      While the descriptive approach applied is useful, the findings have only focused importance and scope, given the low prevalence of "44 kHz" calls and limited attempts made to systematically manipulate factors that lead to their emission. In fact, the data presented appear to be derived from reanalyses of previously conducted studies in most cases and the main claims are only partially supported. While reading the manuscript, I got the impression that the data presented here are linked to two or three previously published studies (Olszyński et al., 2020, 2021, 2023). This is important to emphasize for two reasons:

      (1) It is often difficult (if not impossible) to link the reported data to the different experiments conducted before (and the individual experimental conditions therein). While reanalyzing previously collected data can lead to important insight, it is important to describe in a clear and transparent manner what data were obtained in what experiment (and more specifically, in what exact experimental condition) to allow appropriate interpretation of the data. For example, it is said that in the "trace fear conditioning experiment" both single- and group-housed rats were included, yet I was not able to tell what data were obtained in single- versus group-housed rats. This may sound like a side aspect, however, in my view this is not a side aspect given the fact that ultrasonic vocalizations are used for communication and communication is affected by the social housing conditions.

      Answer: In preparing the current manuscript, we indeed used data collected during fear conditioning experiments which were described previously (Olszyński et al., 2021; Olszyński et al., 2022). Please note, however, that vocalization behavior during the fear conditioning itself was not the main subject of these publications. Our previous publications (Olszyński et al., 2020; Olszyński et al., 2021; Olszyński et al., 2022) primarily present ultrasonic-vocalization data from the playback part of the experiments, whereas here we analyze recordings obtained during the fear conditioning part; thus, we are analyzing new, i.e., not yet analyzed, parts of previously published studies. Also, we have performed additional experiments.

      In the first version of the current manuscript, we did not attempt to demonstrate exactly which calls were recorded in which conditions as the focus was to demonstrate that 44-kHz calls were emitted in several different fear-conditioning experiments. Also, as the experiments were not performed simultaneously and are results from different experimental situations, we would prefer to not compare these results directly.

      However, in the current version of the manuscript, we have introduced an additional reference system, based on Tab. 1, to more clearly indicate which rats have been employed in each analysis, e.g. the group of “Wistar rats that undergone 10 trials of fear conditioning” are described as “Tab. 1/Exp. 1-3/#2,4,8,13; n = 46”, i.e., these are the rats listed in rows 2, 4, 8, and 13 of Tab. 1.

      We have also tried to unify the analyses, in terms of rats used, as much as possible. Finally, we have also introduced Fig. 1S3 to demonstrate the prevalence of 44-kHz calls in all experiments analyzed with the note that “the experiments were not performed in parallel”.

      Regarding the Reviewer’s concerns about analyzing single- and pair-housed rats together, we have examined the ultrasonic vocalizations emitted and the freezing behavior in these two groups.

      • Ultrasonic vocalizations: when comparing the number of vocalizations, their duration, peak frequency, and latency to first occurrence, both for all call types pooled and separately by type (short 22-kHz, long 22-kHz, 44-kHz, 50-kHz), the only difference observed was in the peak frequency of 50-kHz vocalizations (50.7 ± 2.8 kHz for paired vs. 61.8 ± 3.1 kHz for single rats; p = 0.0280, Mann-Whitney). Since 50-kHz calls are not the subject of the current publication, we did not investigate this difference further. Also, this difference was not observed during playback experiments (Olszyński et al., 2020, Tab. 1).

      • Freezing. There were no differences between single- and pair-housed groups in freezing behavior, both in the time before first shock presentation and during fear conditioning training (Mann-Whitney).

      In summary, since the two groups did not differ in relevant ultrasonic features and freezing, we decided to present the results obtained from these rats together. However, we agree with the Reviewer, and it is possible that social housing conditions may in fact affect the emission of 44-kHz vocalizations, which could be a subject of another project – involving, e.g., larger experimental groups observed under hypothesis-oriented and defined conditions.

      (2) In at least two of the previously published manuscripts (Olszyński et al., 2021, 2023), emission of ultrasonic vocalizations was analyzed (Figure S1 in Olszyński et al., 2021, and Fig. 1 in Olszyński et al., 2023). This includes detailed spectrographic analyses covering the frequency range between 20 and 100 kHz, i.e. including the frequency range, where the "new-type" ultrasonic vocalization, now named "44 kHz" call, occurs, as reflected in the examples provided in Fig. 1 of Olszyński et al. (2023). In the materials and methods there, it was said: "USV were assigned to one of three categories: 50-kHz (mean peak frequency, MPF >32 kHz), short 22-kHz (MPF of 18-32 kHz, <0.3 s duration), long 22-kHz (MPF of 18-32 kHz, >0.3 s duration)". Does that mean that the "44 kHz" calls were previously included in the count for 50-kHz calls? Or were 44 kHz calls (intentionally?) left out? What does that mean for the interpretation of the previously published data? What does that mean for the current data set? In my view, there is a lack of transparency here.

      Answer: As mentioned above, we indeed used data collected during fear conditioning experiments which were described previously (Olszyński et al., 2021; Olszyński et al., 2022). However, in these publications, ultrasonic vocalizations emitted during playback experiments were the main subject, while the ultrasonic calls emitted during fear conditioning (performed before the playback) were only analyzed in a preliminary way. As a result, the 44-kHz vocalizations analyzed in the current manuscript were not included in the previous analyses. In particular, in Olszyński et al. (2021), we counted the overall number of ultrasonic vocalizations before the fear conditioning session to determine the basal ultrasonic emissions (Fig. S1). Then, in our next article (Olszyński et al., 2022), we again analyzed the number of all ultrasonic vocalizations before fear conditioning (Fig. S1) and restricted the analysis of vocalizations during fear conditioning to 22-kHz calls (Tab. S1 and S2).

      Also, we re-reviewed all the data used in our previous playback publications. Overall, 44-kHz calls were extremely rare in the playback parts of the experiments. There were no 44-kHz calls in the playback data used in Olszyński et al. (2022) and Olszyński et al. (2020). In Olszyński et al. (2021), one rat produced eight 44-kHz calls. These 44-kHz calls constituted 0.03% of all vocalizations analyzed in the experiment (8/24888) and were included in the total number of calls analyzed (but not in the 50-kHz group); they were not described in further detail in that publication.
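
      The classification question raised by the Reviewer, and the share quoted in this answer, can be made concrete with a short sketch. It encodes the three categories exactly as quoted by Reviewer #2 from the earlier methods; the function name `classify_usv`, the "unassigned" fallback, and the treatment of a duration of exactly 0.3 s are assumptions introduced here, and the 44 kHz, 1 s call is a hypothetical example rather than a measured one. Under a literal application of those thresholds such a call would be labeled "50-kHz", which is why the answer has to state explicitly how these calls were handled.

```python
def classify_usv(mpf_khz, duration_s):
    """Assign a call to a category using the thresholds quoted by Reviewer #2.

    mpf_khz: mean peak frequency in kHz; duration_s: call duration in seconds.
    """
    if mpf_khz > 32:
        return "50-kHz"
    if 18 <= mpf_khz <= 32:
        # The quoted wording leaves exactly 0.3 s undefined; treated as long here (assumption).
        return "short 22-kHz" if duration_s < 0.3 else "long 22-kHz"
    return "unassigned"  # hypothetical fallback for calls below 18 kHz


# Hypothetical long, high-frequency call (not a measured value).
hypothetical_call = classify_usv(mpf_khz=44.0, duration_s=1.0)

# Share of 44-kHz calls in the playback data of Olszyński et al. (2021), as reported above.
share_percent = 8 / 24888 * 100

print(hypothetical_call)
print(f"{share_percent:.2f}%")
```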

      Moreover, whether the newly identified call type is indeed novel is questionable, as also mentioned by the authors in their discussion section. While they wrote in the introduction that "high-pitch (>32 kHz), long and monotonous ultrasonic vocalizations have not yet been described", they wrote in the discussion that "long (or not that long (Biały et al., 2019)), frequency-stable high-pitch vocalizations have been reported before (e.g. Sales, 1979; Shimoju et al., 2020), notably as caused by intense cholinergic stimulation (Brudzynski and Bihari, 1990) or higher shock-dose fear conditioning (Wöhr et al., 2005)" (and I wish to add that to my knowledge this list provided by the authors is incomplete). Therefore, I believe, the strong claims made in abstract ("we are the first to describe a new-type..."), introduction ("have not yet been described"), and results ("new calls") are not justified.

      Answer: We would argue that 44-kHz vocalizations were indeed reported but not described. As far as we are aware, an in-depth analysis of the properties and experimental circumstances of emission of long, high-frequency calls has not yet been performed. These researchers have observed, at least to a degree, calls similar to the ones we observed, as we mentioned in the discussion section. However, since these reported 44-kHz vocalizations were not fully described, we can only guess that they may be similar to ours. We speculate that, perhaps like us, these researchers unknowingly recorded 44-kHz calls in their experiments and may also be able to describe them more extensively when re-analyzing their data, as we have done here.

      Possibly, it was difficult to find reports on vocalizations, similar to the 44-kHz calls that we observed, because of the canonical and accepted definitions of ultrasonic vocalization types. Biały et al. (2019) allocated them as a part of 22-kHz group, perhaps because their calls were often of a step variation having both low and high components. Shimoju et al. (2020) grouped them along with 50-kHz vocalizations because they appeared during stroking rats held vertically; this procedure was compared to tickling which usually elicits appetitive calls.

      Reviewer #2 states that there are other publications that would complete the list. We are aware of other articles authored by the same team as Shimoju et al. (2020) with different first authors. However, they report findings similar to those in the cited article. Otherwise, we would gladly cite a more complete list of publications showing atypical, long, monotonous high-frequency vocalizations similar to those observed in our experiments. Therefore, we would argue that ultrasonic vocalizations which were long, flat, high in frequency, and repeatedly occurring in a defined behavioral situation have not been reported before. However, concerning the strong claims of novelty of our finding, we toned them down where we found this was warranted.

      In general, the manuscript is not well written/ not well organized, the description of the methods is insufficient, and it is often difficult (if not impossible) to link the reported data to the experiments/ experimental conditions described in the materials and methods section.

      Answer: The description of the methods has been adjusted and expanded. We added the requested link to each particular experiment as a formula “Tab. 1/Exp. nos./# nos.” which shows, each time, which experiments and experimental groups were analyzed. The list of the experiments and groups is found in Tab. 1.

      For example, I miss a clear presentation of basic information: 1) How many rats emitted "44 kHz" calls (in total, per experiment, and importantly, also per experimental condition, i.e. single- versus group-housed)?

      Answer: We now clearly show which experiments were performed and how many animals were tested in each condition (Tab. 1), while the prevalence of 44-kHz calls amongst experimental conditions and animal groups is shown in Fig. 1S3. Also, we included information regarding the number of animals and treatment of each group of rats when reporting results. For example, we are stating that:

      (1a) “53 of all 84 conditioned Wistar rats (Tab. 1/Exp. 1-3/#2,4,6-8,13, Figs 1B, 1E, 1S1BC) displayed” 44-kHz vocalizations – as a general assessment; these numbers are different from those in the first version of the Ms, when we are mentioning Wistar rats conditioned 6 or 10 times only.

      (1b) “From this group of rats (n = 46), n = 41 (89.1%) emitted long 22-kHz calls, and 32 of them (69.6%) emitted 44-kHz calls” – this time referring only to 10-times conditioned Wistar rats as the biggest group that could be analyzed together (Figs 1F, 1G, 1S2A).

      (1c) “for n = 3 rats, 44-kHz vocalizations accounted for >95% of all calls during at least one ITI (e.g., 140 of total 142, 222 of 231, and 263 of 265 tallied 44-kHz calls), and in n = 9 rats, 44-kHz vocalizations constituted >50% of calls in more than one ITI.”
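      The proportions quoted in (1a) to (1c) can be re-derived from the counts given in the text. The following is a minimal sketch for checking that arithmetic; the helper name `percent` is an illustrative assumption, and only the counts themselves come from the quoted passages.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Counts quoted in (1b): 46 Wistar rats conditioned 10 times.
ten_trial_rats = 46
print(percent(41, ten_trial_rats))  # share emitting long 22-kHz calls
print(percent(32, ten_trial_rats))  # share emitting 44-kHz calls

# Counts quoted in (1c): (44-kHz calls, all calls) in a single ITI.
iti_counts = [(140, 142), (222, 231), (263, 265)]
iti_shares = [percent(calls_44, calls_all) for calls_44, calls_all in iti_counts]
print(iti_shares)

# Each of the three examples should exceed the 95% threshold named in (1c).
print(all(share > 95.0 for share in iti_shares))
```

      The first two values reproduce the 89.1% and 69.6% stated in (1b), and all three example ITIs fall above 95%.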

      (2) Out of the ones emitting "44 kHz" calls, what was the prevalence of "44 kHz" calls (relative to 22- and 50-kHz calls, e.g. shown as percentage)?

      Answer: The prevalence of 44-kHz vocalizations in all investigated experiments and groups is shown in Fig. 1S3CD. Also, more information regarding the percentage of 44-kHz calls was demonstrated in Fig. 1S2AB where we calculated the distribution of 44-kHz calls to 22-kHz calls in Wistar rats, in 10-trial fear conditioning, across the length of the session.

      Additionally, the values are listed in the sentence regarding all Wistar rats which underwent 10 trials of fear conditioning: “these vocalizations were less frequent following the first trial (1.2 ± 0.4% of all calls), and increased in subsequent trials, particularly after the 5th (8.8 ± 2.8%), through the 9th (19.4 ± 5.5%, the highest value), and the 10th (15.5 ± 4.9%) trials, where 44-kHz calls gradually replaced 22-kHz vocalizations in some rats (Fig. 1F, 1S2B, Video 1; comp Fig. 1D vs. 1E).”

      (3) How did this ratio differ between experiments and experimental conditions?

      Answer: The prevalence of 44-kHz vocalizations in all experimental conditions is shown in Fig. 1S3. However, the direct comparison of results obtained in different conditions was not the goal of the present work. Also, we would argue that such direct comparisons of results of different experiments would not be valid. These experiments were done with different groups of animals, at different times, with different timetables of experimental manipulations.

      However, we are comfortable stating that:

      • There were more 44-kHz vocalizations during fear conditioning training than testing in all fear-conditioned Wistar rats;

      • We observed more 44-kHz vocalizations in Wistar rats compared to SHR.

      (4) Was there a link to freezing? Freezing was apparently analyzed before (Olszyński et al., 2021, 2023) and it would be important to see whether there is a correlation between "44-kHz" calls and freezing. Moreover, it would be important to know what behavior the rats are displaying while such "44-kHz" calls are emitted? (Note: Even not all 22-kHz calls are synced to freezing.) All this could help to substantiate the currently highly speculative claims made in the discussion section ("frequency increases with an increase in arousal" and "it could be argued that our prolonged fear conditioning increased the arousal of the rats with no change in the valence of the aversive stimuli"). Such more detailed analyses are also important to rule out the possibility that the "new-type" ultrasonic vocalization, the so-called "44 kHz" call, is simply associated with movement/ thorax compression.

      Answer: We analyzed freezing behavior and its association with ultrasonic emissions. The emission of 44-kHz vocalizations was associated with freezing. The results are now described and presented in the manuscript, i.e., Tab. 2, its legend and the description in Results: “Freezing during the bins of 22-kHz calls only (p < 0.0001, for both groups) and during 44-kHz calls only bins (p = 0.0003) was higher than during the first 5 min baseline freezing levels of the session. Also, the freezing associated with emissions of 44-kHz calls only was higher than during bins with no ultrasonic vocalizations (p = 0.0353), and it was also 9.9 percentage points higher than during time bins with only long 22-kHz vocalizations, but the difference was not significant (p = 0.1907; all Wilcoxon)” and “To further investigate this potential difference, we measured freezing during the emission of randomly selected single 44-kHz and 22-kHz vocalizations. The minimal freezing behavior detection window was reduced to compensate for the higher resolution of the measurements (3, 5, 10, or 15 video frames were used). There was no difference in freezing during the emission of 44-kHz vs. 22-kHz vocalizations for ≥150-ms-long calls (3 frames, p = 0.2054) and for ≥500-ms-long calls (5 frames, p = 0.2404; 10 frames, p = 0.4498; 15 frames, p = 0.7776; all Wilcoxon, Tab. 2B).”

      Please note that the general observation that "frequency increases with an increase in arousal" is not our claim but a general rule derived from a large body of observations and proposed by others (Briefer et al., 2012); we changed the wording of this statement to: “frequency usually increases with an increase in arousal (Briefer et al., 2012)”.

      The figures currently included are purely descriptive in most cases - and many of them are just examples of individual rats (e.g. majority of Fig. 1, all of Fig. 2 to my understanding, with the exception of the time course, which in case of D is only a subset of rats ("only rats that emitted 44-kHz calls in at least seven ITI are plotted" - is there any rationale for this criterion?)), or, in fact, just representative spectrograms of calls (all of Fig. 3, with the exception of G, all of Fig. 4).

      Answer: Please note that the former figures 2, 4, 6, and 8 have now been moved to supplementary figures 1S1, 2S1, 3S1, and 4S1 to better organize the presentation of data. Figures 1, 3, 5, 7 are now 1, 2, 3, 4, respectively. With regard to presenting data from individual rats, this was to show the general patterns of ultrasonic call distributions observed. Showing the full data set as seen in Fig. 5A (now Fig. 3A) would obscure the readability of the graph without using mathematical clustering techniques such as DBSCAN.

      Concerning Reviewer #2’s question regarding the criterion of “minimum seven ITI”, we selected the highest vocalizers by taking animals above the 75th percentile of the number of ITI with 44-kHz calls. However, in the current version of the manuscript, we decided to omit this part of the analysis and the accompanying part of the figure, since it did not provide any additional informative value (apart from employing a questionable criterion).

      Moreover, the differences between Fig. 5 and Fig. 6 are not clear to me. It seems Fig. 5B is included three times - what is the benefit of including the same figure three times?

      Answer: We hope that designating Fig. 6 as supplementary to Fig. 5 (now Figs 3S1 and 3, respectively) will make interpreting them more streamlined. Fig. 6A (now Fig. 3S1A) is a more detailed look at the information presented in Fig. 5B (now Fig. 3B), with spectrogram images of ultrasonic vocalizations from different areas of the plot. Also, Fig. 3B (former Fig. 5B) was removed from Fig. 3S1B (former Fig. 6B).

      A systematic comparison of experimental conditions is limited to Fig. 7 and Fig. 8, the figures depicting the playback results (which led to the conclusion that "the responses to 44-kHz aversive calls presented from the speaker were either similar to 22-kHz vocalizations or in between responses to 22-kHz and 50-kHz playbacks", although it remains unclear to me why differences were seen b e f o r e the experimental manipulation, i.e. the different playback types in Fig. 8B).

      Answer: There were indeed instances of such before-differences. Such differences were observed in our previous studies (Olszyński et al., 2020, Tabs S9-12; Olszyński et al., 2021, Tabs S7; Olszyński et al., 2022, Tabs S4, S9, S13, S17, S18) and were most likely due to analyzing multiple comparisons. However, we think that the carry-over effect, mentioned by Reviewer #2 (see below), also played a role.

      Related to that, I miss a clear presentation of relevant methodological aspects: 1) Why were some rats single-housed but not the others?

      Answer: As stated before, data were collected from our previous experiments and the observation of 44-kHz vocalizations in fear conditioning was an emergent discovery as we decided to analyze ultrasonic recordings from fear conditioning procedures. Single-housed animals were part of our experiment comparing fear conditioning and social situation on the perception of ultrasonic playback as described in Olszyński et al. (2020). Aside from this experiment, all other rats were housed in pairs.

      (2) Is the experimental design of the playback study not confounded? It is said that "one group (n = 13) heard 50-kHz appetitive vocalization playback while the other (n = 16) 22-kHz and 44-kHz aversive calls". How can one compare "44 kHz" calls to 22- and 50-kHz calls when "44 kHz" calls are presented together with 22-kHz calls but not 50-kHz calls? What about carry-over effects? Hearing one type of call most likely affects the response to the other type of call. It appears likely that rats are a bit more anxious after hearing aversive 22-kHz calls, for example. Therefore, it would not be very surprising to see that the response to "44 kHz" calls is more similar to 22-kHz calls than 50-kHz calls.

      Of note, in case of the other playback experiment it is just said that rats "received appetitive and aversive ultrasonic vocalization playback" but it remains unclear whether "44 kHz" calls are seen as appetitive or aversive. Later it says that "rats were presented with two 10-s-long playback sets of either 22-kHz or 44-kHz calls, followed by one 50-kHz modulated call 10-s set and another two playback sets of either 44-kHz or 22-kHz calls not previously heard" (and wonder what data set was included in the figures and how - pooled?). Again, I am worried about carry-over effects here. This does not seem to be an experimental design that allows to compare the response to the three main call types in an unbiased manner.

      Answer: We apologize for the confusing and brief original description of the playback experiments. We wanted to avoid confusion associated with including several additional playback signals (please note that some are not related to the current comparisons and include different 50-kHz ultrasonic subtypes and two different subtypes of short 22-kHz calls). We lengthened the description of these playback experiments in the current version.

      In general, including more than one type of ultrasonic call as playback carries a risk of a carry-over effect as well as a habituation effect (the responses become weak). However, it greatly reduces the number of required animals. Finally, regarding the first experiment, we chose 3 playbacks to compare the rats’ reactions, as this was the most conservative choice we thought of.

      We would like to highlight that we wanted to compare specifically the rats’ responses to 22-kHz vs. 44-kHz playback (as well as the effects of playback of different subtypes of 50-kHz calls, which is not the subject of the current work). Therefore, we would argue that the design of both experiments is actually unbiased regarding this key comparison (responses to 22-kHz vs. 44-kHz playback). In both experiments, 22-kHz and 44-kHz playbacks were included in the same sequences of stimuli and counterbalanced regarding their order (i.e., taking into account possible carry-over effects), and presented to the same rats. We regarded the group of rats that heard 50-kHz recordings as a baseline/control, since we know from previous playback studies what reactions to expect from rats exposed to these vocalizations (and 22-kHz playback), while in the second experiment, we reduced the 50-kHz playback to one set in order to minimize possible habituation to multiple playbacks.

      We agree that the design of both experiments does not allow for full comparison of the effects of aversive playbacks to 50-kHz playback. Also, we agree that some carry-over effects could play a role. It was mentioned in the discussion: “Please factor in potential carryover effects (resulting from hearing playbacks of the same valence in a row) in the differences between responses to 50-kHz vs. 22/44-kHz playbacks, especially, those observed before the signal (Fig. 4AB).” However, we would still argue that the observed lack of difference in heart-rate response (Fig. 4A) and the differences regarding the number of 50-kHz calls emitted (e.g., Fig. 4S1F) are free of the constraints raised by Reviewer #2.

      We acknowledge that our studies do not give a complete picture of 44-kHz ultrasonic perception in relation to other ultrasonic bands and, given the possibility, we would like to perform more in-depth and focused experiments to study this aspect of 44-kHz calls in the future.

      Finally, regarding the second experiment, the description of the rats now includes that they “received 22-kHz, 44-kHz, and 50-kHz ultrasonic vocalization playback”, while the description of the experiment itself includes: “Responses to the pairs of playback sets were averaged”.

      Of note, what exactly is meant by "control rats" in the context of fear conditioning is also not clear to me. One can think of many different controls in a fear conditioning experiment.

      More concrete information is needed.

      Answer: This information was included in our previous publications. However, it is now provided in the methods section of the current version of the manuscript. In general, control rats were subjected to the same procedures but did not receive electric shocks.

      Literature included in the answers

      Araya, E. I., Baggio, D. F., Koren, L. O., Andreatini, R., Schwarting, R. K. W., Zamponi, G. W., & Chichorro, J. G. (2020). Acute orofacial pain leads to prolonged changes in behavioral and affective pain components. Pain, 161(12), 2830-2840. https://doi.org/10.1097/j.pain.0000000000001970

      Barker, D. J., Root, D. H., Ma, S., Jha, S., Megehee, L., Pawlak, A. P., & West, M. O. (2010). Dose-dependent differences in short ultrasonic vocalizations emitted by rats during cocaine self-administration. Psychopharmacology (Berl), 211(4), 435-442. https://doi.org/10.1007/s00213-010-1913-9

      Barroso, A. R., Araya, E. I., de Souza, C. P., Andreatini, R., & Chichorro, J. G. (2019). Characterization of rat ultrasonic vocalization in the orofacial formalin test: Influence of the social context. Eur Neuropsychopharmacol, 29(11), 1213-1226. https://doi.org/10.1016/j.euroneuro.2019.08.298

      Biały, M., Podobinska, M., Barski, J., Bogacki-Rychlik, W., & Sajdel-Sulkowska, E. M. (2019). Distinct classes of low frequency ultrasonic vocalizations in rats during sexual interactions relate to different emotional states. Acta Neurobiol Exp (Wars), 79(1), 1-12. https://www.ncbi.nlm.nih.gov/pubmed/31038481

      Briefer, E. F., Padilla de la Torre, M., & McElligott, A. G. (2012). Mother goats do not forget their kids' calls. Proc Biol Sci, 279(1743), 3749-3755. https://doi.org/10.1098/rspb.2012.0986

      Browning, J. R., Browning, D. A., Maxwell, A. O., Dong, Y., Jansen, H. T., Panksepp, J., & Sorg, B. A. (2011). Positive affective vocalizations during cocaine and sucrose self administration: a model for spontaneous drug desire in rats. Neuropharmacology, 61(1-2), 268-275. https://doi.org/10.1016/j.neuropharm.2011.04.012

      Brudzynski, S. M. (2015). Pharmacology of Ultrasonic Vocalizations in adult Rats: Significance, Call Classification and Neural Substrate. Curr Neuropharmacol, 13(2), 180-192. https://doi.org/10.2174/1570159x13999150210141444

      Brudzynski, S. M., & Bihari, F. (1990). Ultrasonic vocalization in rats produced by cholinergic stimulation of the brain. Neurosci Lett, 109(1-2), 222-226. https://doi.org/10.1016/0304-3940(90)90567-s

      Brudzynski, S. M., Bihari, F., Ociepa, D., & Fu, X. W. (1993). Analysis of 22 kHz ultrasonic vocalization in laboratory rats: long and short calls. Physiol Behav, 54(2), 215-221. https://doi.org/10.1016/0031-9384(93)90102-l

      Hinchcliffe, J. K., Jackson, M. G., & Robinson, E. S. (2022). The use of ball pits and playpens in laboratory Lister Hooded male rats induces ultrasonic vocalisations indicating a more positive affective state and can reduce the welfare impacts of aversive procedures. Lab Anim, 56(4), 370-379. https://doi.org/10.1177/00236772211065920

      Matochik, J. A., White, N. R., & Barfield, R. J. (1992). Variations in scent marking and ultrasonic vocalizations by Long-Evans rats across the estrous cycle. Physiol Behav, 51(4), 783-786. https://doi.org/10.1016/0031-9384(92)90116-j

      Olszyński, K. H., Polowy, R., Małż, M., Boguszewski, P. M., & Filipkowski, R. K. (2020). Playback of Alarm and Appetitive Calls Differentially Impacts Vocal, Heart-Rate, and Motor Response in Rats. iScience, 23(10), 101577. https://doi.org/10.1016/j.isci.2020.101577

      Olszyński, K. H., Polowy, R., Wardak, A. D., Grymanowska, A. W., & Filipkowski, R. K. (2021). Increased Vocalization of Rats in Response to Ultrasonic Playback as a Sign of Hypervigilance Following Fear Conditioning. Brain Sci, 11(8). https://doi.org/10.3390/brainsci11080970

      Olszyński, K. H., Polowy, R., Wardak, A. D., Grymanowska, A. W., Zieliński, J., & Filipkowski, R. K. (2022). Spontaneously hypertensive rats manifest deficits in emotional response to 22-kHz and 50-kHz ultrasonic playback. Prog Neuropsychopharmacol Biol Psychiatry, 120, 110615. https://doi.org/10.1016/j.pnpbp.2022.110615

      Saito, Y., Tachibana, R. O., & Okanoya, K. (2019). Acoustical cues for perception of emotional vocalizations in rats. Scientific Reports, 9(1), 10539.

      Sales, G. D. (1979). Strain Differences in the Ultrasonic Behavior of Rats (Rattus norvegicus). Am Zool, 19(2), 513-527. https://www.jstor.org/stable/3882331

      Shimoju, R., Shibata, H., Hori, M., & Kurosawa, M. (2020). Stroking stimulation of the skin elicits 50-kHz ultrasonic vocalizations in young adult rats. J Physiol Sci, 70(1), 41. https://doi.org/10.1186/s12576-020-00770-1

      Silkstone, M., & Brudzynski, S. M. (2019a). The antagonistic relationship between aversive and appetitive emotional states in rats as studied by pharmacologically-induced ultrasonic vocalization from the nucleus accumbens and lateral septum. Pharmacology Biochemistry and Behavior, 181, 77-85. https://doi.org/10.1016/j.pbb.2019.04.009

      Silkstone, M., & Brudzynski, S. M. (2019b). Intracerebral injection of R-(-)-Apomorphine into the nucleus accumbens decreased carbachol-induced 22-kHz ultrasonic vocalizations in rats. Behavioural Brain Research, 364, 264-273. https://doi.org/10.1016/j.bbr.2019.01.044

      Willey, A. R., & Spear, L. P. (2013). The effects of pre-test social deprivation on a natural reward incentive test and concomitant 50 kHz ultrasonic vocalization production in adolescent and adult male Sprague-Dawley rats. Behav Brain Res, 245, 107-112. https://doi.org/10.1016/j.bbr.2013.02.020

      Wöhr, M., Borta, A., & Schwarting, R. K. (2005). Overt behavior and ultrasonic vocalization in a fear conditioning paradigm: a dose-response study in the rat. Neurobiol Learn Mem, 84(3), 228-240. https://doi.org/10.1016/j.nlm.2005.07.004

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Additional considerations:

      The discussion of the "perfect fifth" and the proposition that this observation could be evidence of an evolutionary mechanism underlying it is rather far-fetched, especially for being presented in the Results section (with no supporting non-anecdotal evidence).

      Answer: We agree with Reviewer #1. The text was modified and the word “evolutionary” was deleted. Instead, we expanded on the possible reason for the prevalence of the perfect fifth in the current version of the manuscript; we added that the prevalence of the perfect fifth: “could be explained by the observation that all physical objects capable of producing tonal sounds generate harmonic vibrations, the most prominent being the octave, perfect fifth, and major third (Christensen, 1993, discussed in Bowling and Purves, 2015).”

      It is not clear why Sprague-Dawleys were used as "receivers" in the playback experiment, when presumably the calls were recorded from Wistars and SHRs. While this does not critically impact the conclusions, within the species rats should be able to respond appropriately to calls made by rats of different genetic backgrounds, it adds an unnecessary source of variance.

      Answer: Sprague-Dawley rats were used to test another normotensive strain of rats. Regarding the Reviewer’s main point, we beg to differ, as we think that it is worth testing playback stimuli in different strains. Using different stimuli for different rat strains would add unnecessary variance, and it seemed logical to use the same recordings to test effects in different strains. Please note that, in spite of this additional variance, the results of both playback experiments are, in general, similar, which may point to a universal effect of 44-kHz playback across rat strains.

      It is pertinent to note that for the trace fear conditioning experiment, the rats had previously been exposed to a vocalization playback experiment. While such a pre-exposure is unlikely to be a very strong stressor, the possibility for it to influence the vocal behaviors of these rats in later experiments cannot be ruled out. It is also not clear what the control rats in this experiment experienced (home cage only?), nor what they were used for in analyses.

      Answer: In the current version of the manuscript, we have described in greater detail all the experiments performed and analyzed. We would like to emphasize that both delay and trace fear conditioning experiments with radiotelemetric transmitters were not performed specifically to elicit any particular response during fear conditioning; rather, our observation of 44-kHz vocalizations emerged as a result of re-examining the audio recordings. As a result, this work summarizes our observations of 44-kHz calls from several different experiments. It is relevant to note that 44-kHz vocalizations were observed “in rats which were exposed to vocalization playback experiment”, in rats before the playback experiments, as well as in naïve rats, without implanted transmitters, trained in fear conditioning (Tab. 1/Exp. 1-3).

      Our main message is that 44-kHz vocalizations were present in several experiments, with different conditions and subjects, while we are not attempting to compare in detail the results across the different experiments. In other words, we agree that pre-exposure to playback (and, even more likely, transmitter implantation) could influence 44-kHz ultrasonic emissions by the rats, but neither is necessary for them. To demonstrate this, we added a prolonged fear conditioning group with naïve Wistar rats (Exp. 3) to verify the emission of 44-kHz calls in the absence of those experimental factors.

      We modified the methods section to clarify the circumstances under which these discoveries were made, such as including the information regarding the control rats in trace fear conditioning. In particular we mention that: “Control rats were subjected to the exact same procedures but did not receive the electric shock at the end of trace periods”.

      For Figure 1A-E, only example call distributions from individual rats are shown. It would perhaps be more informative to see the full data set displayed in this manner, with color/shape codes distinguishing individuals if desired.

      Answer: Please note the Fig. 1S1 shows more examples of ultrasonic call distribution. Showing all the data would make it more difficult to read and interpret. The problem is partly amended in Fig. 3A.

      It is not clear what is presented in Figure 2D vs. E, i.e. panel D is shown only for "selected rats" but the legend does not clarify how and why these rats were selected. It is also not clear why the legend reports p-values for both Friedman and Wilcoxon tests; the latter is appropriate for paired data which seems to be the case when the question is whether the call peak frequency alters across time, but the Friedman assumes non-paired input data.

      Answer: The question refers to the current Fig. 1S2C panel (former Fig. 2E panel) and the former Fig. 2D panel. The latter was not included in the current version of the manuscript, since both reviewers opposed the presentation of “selected rats” only (see above). The full description of the Fig. 1S2C panel is now in the results section together with p-values for the Friedman and Wilcoxon tests. We used the latter to investigate the difference between the first and the last ITI (selected paired data), and the Friedman test to investigate the presence of change within the chain of ten ITI, since it is a suitable test for a difference between two or more paired samples.
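      To illustrate why the Friedman test suits repeated measurements on the same animals, the sketch below computes the Friedman chi-square statistic by ranking values within each subject (row) and comparing rank sums across conditions (columns). It is a minimal pure-Python illustration, not the analysis code used for the manuscript: the function names and the data table are hypothetical, and no tie correction is applied.

```python
def average_ranks(values):
    """Rank values 1..k within one subject; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for n subjects (rows) by k paired conditions (columns).

    Uses the textbook formula without tie correction.
    """
    n = len(table)
    k = len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)


# Hypothetical data: 3 rats (rows) measured in 3 consecutive ITIs (columns).
hypothetical = [[1, 2, 3], [2, 3, 4], [1, 5, 9]]
print(friedman_statistic(hypothetical))
```

      Because ranking is done within each row, every subject serves as its own control, which is what makes the test a paired one. The statistic is conventionally compared against a chi-square distribution with k − 1 degrees of freedom; for real analyses a tested library routine with tie handling would be preferable to this sketch.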

      Reviewer #2 (Recommendations For The Authors):

      The weaknesses listed in the public review need to be addressed.

      Answer: We have done our best to address the weaknesses.

      Notes: 1) Page and line numbers would have been useful.

      Answer: We are including a separate manuscript version with page and line numbers.

      (2) English language needs to be improved.

      Answer: The text has been checked by two native English speakers (one with a scientific background). Both only identified minor changes to improve the text which we applied.

      (3) I am a bit unsure whether the comment about the Star Wars movie (1997) and the Game of Thrones series (2011) is supposed to be a joke.

      Answer: These are indeed two genuine examples of the perfect fifth in human music that we hope are easily recognizable and familiar to readers. Parts of the same examples of the perfect fifth can also be heard in the rat voice files provided.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      During the last decades, extensive studies (mostly neglected by the authors), using in vitro and in vivo models, have elucidated the five-step mechanism of intoxication of botulinum neurotoxins (BoNTs). The binding domain (H chain) of all serotypes of BoNTs binds polysialogangliosides and the luminal domain of a synaptic vesicle protein (which varies among serotypes). When bound to the synaptic membrane of neurons, BoNTs are rapidly internalized by synaptic vesicles (SVs) via endocytosis. Subsequently, the catalytic domain (L chain) translocates, a process triggered by the acidification of these organelles. Following translocation, the disulfide bridge connecting the H chain with the L chain is reduced by the thioredoxin reductase/thioredoxin system, and it is refolded by the chaperone Hsp90 on SV's surface. Once released into the cytosol, the L chains of different serotypes cleave distinct peptide bonds of specific SNARE proteins, thereby disrupting neurotransmission. In this study, Yeo et al. extensively revise the neuronal intoxication model, suggesting that BoNT/A follows a more complex intracellular route than previously thought. The authors propose that upon internalization, BoNT/A-containing endosomes are retro-axonally trafficked to the soma. At the level of the neuronal soma, this serotype then traffics to the endoplasmic reticulum (ER) via the Golgi apparatus. The ER SEC61 translocon complex facilitates the translocation of BoNT/A's LC from the ER lumen into the cytosol, where the thioredoxin reductase/thioredoxin system and HSP complexes release and refold the catalytic L chain. Subsequently, the L chain diffuses and cleaves SNAP25 first in the soma before reaching neurites and synapses.

      Strengths:

      I appreciate the authors' efforts to confirm that the newly established methods somehow recapitulate aspects of the BoNTs mechanism of action, such as toxin binding and uptake occurring at the level of active synapses. Furthermore, even though I consider the SNAPR approach inadequate, the genome-wide RNAi screen has been well executed and thoroughly analyzed. It includes well-established positive and negative controls, making it a comprehensive resource not only for scientists working in the field of botulinum neurotoxins but also for cell biologists studying endocytosis more broadly.

      Weaknesses:

      I have several concerns about the authors' main conclusions, primarily due to the lack of essential controls and validation for the newly developed methods used to assess toxin cleavage and trafficking into neurons. Furthermore, there is a significant discrepancy between the proposed intoxication model and existing studies conducted in more physiological settings. In my opinion, the authors have omitted over 20 years of work done in several labs worldwide (Montecucco, Montal, Schiavo, Rummel, Binz, etc.). I want to emphasize that I support changes in biological dogma only when these changes are supported by compelling experimental evidence, which I could not find in the present manuscript.

      We thank the reviewer for their reading and comments and for pointing out the discrepancy between our proposed model and the existing model. However, we respectfully disagree with the phrase “extensive studies have elucidated the five-step mechanism of intoxication…”. This sentence and the following ones imply that the model is well established and demonstrated. It also highlights how convinced the reviewer is of this previous model.

      We contest this model for theoretical reasons and contest the strength of the evidence that supports it. We previously included references to earlier work showing that the model is also being challenged by others. In light of the reviewer’s comments, we included more references in the introduction, where we also state our main theoretical concern explicitly:

      “Arguably, the main problem of the model is its failure to propose a thermodynamically consistent explanation for the directional translocation of a polypeptidic chain across a biological membrane. Other known instances of polypeptide membrane translocation such as the co-translational translocation into the ER indicate that it is an unfavorable process, which consumes significant energy (Alder and Theg 2003).”

      We also added the following text in the Discussion to address the reviewer’s concerns: “Our study contradicts the long-established model of BoNT intoxication, which is described in several reviews specifically dedicated to the subject 1–4. In short, these reviews support the notion that BoNTs are molecular machines able to mediate their own translocation across membranes; this notion has convinced some cell biologists interested in toxins and retrograde traffic, who describe the BoNT mode of translocation in their reviews 5,6.

      But is this notion well supported by data? A careful examination of the primary literature reveals that early studies indeed report that BonTs form ion channels at low pH values 7,8. These studies have been extended by the use of patch-clamp 9,10. These works and others lead to various suppositions on how the toxin forms a channel and translocate the LC 1,11 .

      However, only a single study claims to reconstitute in vitro the translocation of BonT LC across membranes 12. In this paper, the authors report using a system of artificial membranes separating two aqueous compartments. They load the toxin in the cis compartment and measure the protease activity in the trans compartment after incubation. However, when the experimental conditions described are actually converted in terms of molarity, it appears that the cis compartment was loaded at 10e-8M BonT and that the reported translocated protease activity is equivalent to 10e-17 M (Figure 3D, 12). Thus, in this experiment, about 1 LC molecule in 100 millions has crossed the membrane. Such extremely low transfert rate does not tally with the extreme efficiency of intoxication in vivo, even while taking into account the difference between artificial and biological membranes.

      In sum, a careful analysis of the primary literature indicate that while there is ample evidence that BoNTs have the ability to affect membranes and possibly create ion channels, there is actually no credible evidence that these channels mediate translocation of the LC. As mentioned earlier, it is not clear how such a self-translocation mechanism would function thermodynamically. By contrast, our model proposes a mechanism without a thermodynamic problem, is consistent with current knowledge about other protein toxins, such as PE, Shiga and Ricin, and can help explain previously puzzling features of BonT effects. It is worth noting that a similar self-translocation model was proposed for other protein toxins such as Pseudomonas exotoxin, which have similar molecular organisation as BonT (68). However, it has since been demonstrated that the PE toxins require cellular machinery, in particular in the ER, for intoxication (21,69,70).”

      Reviewer #2 (Public Review):

      Summary:

      The study by Yeo and co-authors addresses a long-lasting issue about botulinum neurotoxin (BoNT) intoxication. The current view is that the toxin binds to its receptors at the axon terminus by its HCc domain and is internalized in recycled neuromediator vesicles just after the release of the neuromediators. Then, the HCn domain assists the translocation of the catalytic light chain (LC) of the toxin through the membrane of these endocytic vesicles into the cytosol of the axon terminus. There, the LC cleaves its SNARE substrate and blocks neurosecretion. However, other views involving kinetic aspects of intoxication suggest that the toxin follows the retrograde axonal transport up to the nerve cell body and then back to the nerve terminus before cleaving its substrate.

      In the current study, the authors claim that the BoNT/A (isotype A of BoNT) not only progresses to the cell body but once there, follows the retrograde transport trafficking pathway in a retromer-dependent fashion, through the Golgi apparatus, until reaching the endoplasmic reticulum. Next, the LC dissociates from the HC (a process not studied here) and uses the translocon Sec61 machinery to retro-translocate into the cytosol. Only then does the LC traffic back to the nerve terminus following the anterograde axonal transport. Once there, LC cleaves its SNARE substrate (SNAP25 in the case of BoNT/A) and blocks neurosecretion.

      To reach their conclusion, Yeo and co-authors use a combination of engineered tools: a cell line able to differentiate into neurons (ReNcell VM), a dual-fluorescent reporter protein derived from SNAP25, the substrate of BoNT/A (called SNAPR), either native BoNT/A or a toxin in which three copies of fragment 11 of the reporter fluorescent protein Neon Green (mNG) are fused to the N-terminus of the LC (BoNT/A-mNG11x3), and finally ReNcell VM cells transfected with mNG1-10 (a protein consisting of the first 10 beta strands of mNG).

      SNAPR is stably expressed throughout the ReNcell VM cells. SNAPR is yellow (red and green) when intact and becomes red only when cleaved by BoNT/A LC, the green tip being degraded by the cell. When the LC of BoNT/A-mNG11x3 reaches the cytosol in ReNcell VM cells transfected with mNG1-10, the complete mNG is reconstituted and emits a green fluorescence.

      In the first experiment, the authors show that the catalytic activity of the LC appears first in the cell body of neurons where SNAPR is cleaved first. This phenomenon starts 24 hours after intoxication and progresses along the axon towards the nerve terminus during an additional 24 hours. In a second experiment, the authors intoxicate the ReNcell VM cells transfected with mNG1-10 using the BoNT/A-mNG11x3. The fluorescence appears also first in the soma of neurons, then diffuses in the neurites in 48 hours. The conclusion of these two experiments is that translocation occurs first in the cell body and that the LC diffuses in the cytosol of the axon in an anterograde fashion.

      In the second part of the study, the authors perform a siRNA screen to identify regulators of BoNT/A intoxication. Their aim is to identify genes involved in intracellular trafficking of the toxin and translocation of the LC. Interestingly, they found positive and negative regulators of intoxication. Regulators could be regrouped according to the sequential events of intoxication.

      Genes affecting binding to the cell-surface receptor (SV2) and internalization. Genes involved in intracellular trafficking. Genes involved in translocation such as reduction of the disulfide bond linking the LC to the HC and refolding in the cytosol. Genes involved in signaling such as tyrosine kinases and phosphatases. All these groups of genes may be consistent with the current view of BoNT intoxication within the nerve terminus. However, two sets of genes were particularly significant to reach the main conclusion of the work and definitely constitute an original finding important to the field. One set of genes consists of those of the retromer, and the other relates to the Sec61 translocon. This should indicate that once endocytosed, the BoNT traffics from the endosomes to the Golgi apparatus, and then to the ER. Ultimately, the LC should translocate from the ER lumen to the cytosol using the Sec61 translocon. The authors further control that the SV2 receptor for the BoNT/A traffics along the axon in a retromer-dependent fashion and that BoNT/A-mNG11x3 traverses the Golgi apparatus by fusing the mNG1-10 to a Golgi resident protein.

      Strengths:

      The findings in this work are convincing. The experiments are carefully done and are properly controlled. In the first part of the study, the activity of the LC is monitored together with the physical presence of the toxin. In the second part of the work, the most relevant genes that came out of the siRNA screen are checked individually in the ReNcell VM / BoNT/A reporter system to confirm their role in BoNT/A trafficking and retro-translocation.

      These findings are important to the fields of toxinology and medical treatment of neuromuscular diseases by BoNTs. They may explain some aspects of intoxication such as slow symptom onset, aggravation, and appearance of central effects.

      Weaknesses:

      The findings antagonize the current view of the intoxication pathway that is sustained by a vast amount of observations. The findings are certainly valid, but their generalization as the sole mechanism of BoNT intoxication should be tempered. These observations are restricted to one particular neuronal model and engineered protein tools. Other models such as isolated nerve/muscle preparations display nerve terminus paralysis within minutes rather than days. Also, the tetanus neurotoxin (TeNT), whose mechanism of action involving axonal transport to the posterior ganglia in the spinal cord is well described, takes between 5 and 15 days. It is thus possible that different intoxication mechanisms co-exist for BoNTs or even vary depending on the type of neurons.

      Although the siRNA experiments are convincing, it would be nice to reach the same observations with drugs affecting the endocytic to Golgi to ER transport (such as Retro-2, golgicide or brefeldin A) and the Sec61 retrotranslocation (such as mycolactone). Then, it would be nice to check other neuronal systems for the same observations.

      We thank the reviewer for the careful reading of, and comments on, our manuscript. The reference to “a vast amount of observations” is an argument similar to that of Reviewer 1 and is used to suggest that our study may not be applicable as a general mechanism.

      We respectfully disagree as described above and posit on the contrary that the model we propose is much more likely to be general than the model presented in current reviews for the several reasons cited (see added text in Introduction and Discussion). While we agree that more work is needed to confirm the proposed mechanisms of BoNT translocation in other models, these experiments fall outside the perimeter of our study.

      The fact that BoNT activity in nerve/muscle preparations has relatively fast kinetics does not pose a contradiction to our model. Our model reveals primarily the requirement for trafficking to the ER membranes. This ER targeting requires trafficking through the Golgi complex, in turn explaining the requirement for trafficking to the soma of neurons in the experimental system we used. However, in neuronal cells in vivo, Golgi bodies can be found along the length of the axon, thus BoNT may not always require trafficking to the soma of the affected cells. The time required for intoxication could thus vary greatly depending on the neuronal structural organisation.

      TeNT is proposed to transfer from excitatory neurons into inhibitory neurons before exerting its action. While the details of this fascinating mechanism remain to be explored, it clearly falls beyond the purview of this manuscript.

      Regarding the use of drugs, we agree that it would be a nice addition; unfortunately we are unable to perform such experiments at this stage. Setting up a large scale siRNA screen for BoNT mechanism of action is challenging as it requires a special facility with controlled access and police authorisation (in Singapore) given the high toxicity of this molecule. Unfortunately, the authorisations have now lapsed.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Yeo et al. investigates the intracellular trafficking of Botulinum neurotoxin A (BoNT/A), a potent toxin used in clinical and cosmetic applications. Contrary to the prevailing understanding of BoNT/A translocation into the cytosol, the study suggests a retrograde migration from the synapse to the soma-localized Golgi in neurons. Using a genome-wide siRNA screen in genetically engineered neurons, the researchers identified over three hundred genes involved in this process. The study employs organelle-specific split-mNG complementation, revealing that BoNT/A traffics through the Golgi in a retromer-dependent manner before moving to the endoplasmic reticulum (ER). The Sec61 complex is implicated in the retro-translocation of BoNT/A from the ER to the cytosol. Overall, the research challenges the conventional model of BoNT/A translocation, uncovering a complex route from synapse to cytosol for efficient intoxication. The findings are based on a comprehensive approach, including the introduction of a fluorescent reporter for BoNT/A catalytic activity and genetic manipulations in neuronal cell lines. The conclusions highlight the importance of retrograde trafficking and the involvement of specific genes and cellular processes in BoNT/A intoxication.

      Strengths:

      The major part of the experiments are convincing. They are well-controlled and the interpretation of their results is balanced and sensitive.

      Weaknesses:

      In my opinion, the main weakness of the paper is in the interpretation of the data equating loss of tGFP signal (when using the Red SNAPR assay) with proteolytic cleavage by the toxin. Indeed, the first step for loss of tGFP signal by degradation of the cleaved part is the actual cleavage. However, this needs to be degraded (by the proteasome, I presume), a process that could in principle be affected (in speed or extent) by the toxin.

      We thank the reviewer for his comments and careful reading of our manuscript.

      Regarding the read-out of the assay, we agree that the assay could be sensitive to alteration in the protein degradation pathway. We have added the following sentence in the Discussion to take it into account:

      “As noted by one reviewer, the assay may be sensitive to perturbation in the general rate of protein degradation, a consideration to keep in mind when evaluating the results of large scale screens.”

      While this may be valid for some hits in the general list, it is important to note that the main hits have been shown to affect toxin trafficking by an independent, orthogonal assay based on the split GFP reconstitution.

      Recommendations to authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To assess the activity of BoNT/A in neurons, Yeo et al. have generated a neuronal stem line referred to as SNAPR. This cell line stably expresses a chimeric reporter protein that consists of SNAP25 flanked at its N-terminus with a tagRFPT and at its C-terminus with a tagGFP. After exposure to BoNT/A, SNAP25 is cleaved and, the C-terminal tGFP-containing moiety is rapidly degraded. I have many doubts about the validity of the described method. Indeed, BoNT/A activity is analysed in an indirect way by quantifying the degradation of the GFP moiety generated after toxin cleavage (Fig. 2). In this regard, the authors should consider that their approach is dependent, not only on the toxin's metalloprotease activity but also on the functionality of the proteasome in neurons. Therefore, considering the current dataset, it is impossible to rule out the possibility that the progression of GFP signal loss from the soma to the neurite terminals may be attributed to the different proteasome activity in these compartments. Is it conceivable that the GFP fragment generated upon toxin cleavage degrades more rapidly in the soma in comparison to axonal terminals? This alternative explanation could challenge the conclusion drawn in Fig. 2.

      The reviewer’s alternative explanation disregards the experiments performed with the split-GFP complementation approach, which indicate translocation in the soma first. The split GFP reporter is not dependent on the proteasome activity. It also disregards the genetic data implicating many genes involved in membrane retrograde traffic, which are also not consistent with the hypothesis of the reviewer. These gene depletions not only affect SNAPR degradation but also BoNT/A-mNG11 trafficking: thus, their effect cannot be attributed to a completely hypothetical spatially heterogeneous distribution of the proteasome.

      For this reason, I strongly suggest using a more physiological approach that does not depend on proteasomal degradation or on the expression of the sensor in neurons. The authors should consider performing a time course experiment following intoxication and staining BoNT/A-cleaved SNAP25 by using specific antibodies (see Antonucci F. et al., Journal of Neuroscience, 2008 or Rheaume C. et al., Toxins 2015).

      For the above reason, we do not agree with the pressing importance of confirming by a third method using specific antibodies, especially considering that BoNT is very difficult to detect in cells when incubated at physiological levels. Incidentally, the cited paper by Antonucci F. et al. documents long-distance retrograde traffic of BoNT/A, which is in line with our data.

      An alternative approach could involve the use of microfluidic devices that physically separate axons from cell bodies. Such a separation will allow us to test the authors' primary conclusion that SNAP25 is initially cleaved in the soma. The suggested experiments will also rule out potential overexpression artifacts that could influence the authors' conclusions when using the newly developed SNAPR approach. Without these additional experiments, the authors' main conclusion that SNAP25 is cleaved first in the neuronal soma rather than at the nerve terminal is inadequate.

      As discussed above, we disagree with the doubts raised by the reviewer: we present three types of evidence (SNAPR, split GFP and genetic hits) and they all point in the same direction. Thus, we respectfully doubt that a fourth approach would convince this reviewer. To note, we have attempted to use microfluidic devices as suggested by the reviewer; however, the ReNcell VM neurons were not able to extend axons long enough across the device.

      (2) To detect BoNT/A translocation into the cytosol, the authors have used a complementation assay by intoxicating ReNcell VM cell expressing a cytosolic HA-tagged split monomeric NeonGreen (Cyt-mNG1-10) with an engineered BoNT/A, where the catalytic domain (LC) was fused to mNG1-11. When drawing conclusions regarding the detection of cytosolic LC in the neuronal soma, the authors should highlight the limitations of this assay and explicitly describe them to the readers. Firstly, the authors need to investigate whether the addition of mNG1-11 to the LC affects the translocation process itself (by comparing with a WT, not tagged, LC).

      Additionally, from the data shown in Fig. 2C, it is evident that the Cyt-mNG1-10 is predominantly expressed in the cytosol and less detected in neurites. This raises the question of whether there might be a bias for the cell soma in this assay. To address this important concern, I suggest quantifying MFI per cell (Fig. 2D) taking into consideration the amount of HA-tagged Cyt-mNG1-10. Furthermore, I strongly suggest targeting mNG1-10 to synapses and performing a similar time course experiment to observe when LC translocation occurs at nerve terminals. Alternative experiments, to prove that BoNT/A requires retrograde trafficking before it can translocate, may be done to repeat the experiments shown in Fig. 2D in the presence of inhibitors (or by KD some of the hits identified as microtubule stabilizers) that should interfere with BoNT/A trafficking to the neuronal somata. Without these additional experiments, the authors' main conclusion that the BoNT/A catalytic domain is first detected in the neuronal soma rather than at the nerve terminal is very preliminary.

      As with the SNAPR assay, the reviewer is raising the level of doubt to very high levels. We respect his thoroughness and eagerness to question the new model. However, we note that a similar level of scrutiny is not applied to the prevailing competing model. Indeed, the data supporting the self-translocation model are based on a single in vitro experiment published in one panel, as we have explained in the Discussion (see above).

      (3) In the genome-wide RNAi screening, rather than solely assessing SV2 surface levels, it would have been beneficial to directly investigate BoNT/A binding to the neuronal membrane. For instance, this could have been achieved by using a GFP-tagged HC domain of BoNT/A. At present, the authors cannot exclude the possibility that among the 135 hits that did not affect SV2 levels, some might still inhibit BoNT/A binding to the neuronal surface. These concerns, already exemplified by B4CALT4 (which is known to be involved in the synthesis of GT1b), should be explicitly addressed in the main text.

      We agree with the reviewer that perturbation of binding of BoNT is possible. We added the following text:

      “Network analysis reveals regulators of signaling, membrane trafficking and thioreductase redox state involved in BoNT/A intoxication

      Among the positive regulators of the screen, 135 hits did not influence significantly surface SV2 levels and are thus likely to function in post-endocytic processes (Supplementary Table 2). However, we cannot formally exclude that they could affect binding of BoNT to the cell surface independently of SV2.”

      (4) The authors should clearly state which reagents they have tried to use in order to explain the challenges they faced when directly testing the trafficking of BoNT/A. The accumulation of Dendra-SV2 bulbous structures at the neurite tips in VPS35-depleted cells could be interpreted as a sign of neuronal stress/death. Have the authors investigated other proteins that do not undergo retro-axonal trafficking in a retromer-dependent manner? This control is essential. In this regard, the use of a GFP-tagged HC domain of BoNT/A could prove to be quite helpful.

      We tried multiple commercially available antibodies against BoNT but we could not get a very good signal. The postdoc in charge of this project has now gone to greener pastures and we are unable to provide the details corresponding to these antibodies. We did not observe significant cell death after VPS35 knockdown at the time of the experiment; however, longer-term treatment might indeed result in toxicity.

      (5) Considering my concerns related to the SNAPR system and the complementation assay to study SNAP25 cleavage and BoNT/A trafficking, I suggest validating some of their major hits (ex. VPS34 and Sec61) by performing WB or IF analysis to examine the cleavage of endogenous SNAP25. Furthermore, the authors should test VPS35 depletion in the context of the experiments performed in Fig. 6G-H, by validating that this protein is essential for BoNT/A retrograde trafficking.

      The reviewer’s concerns are well noted but, as discussed above, the two systems we used are completely orthogonal. Thus, for the reviewer’s concerns to be valid, two completely independent artefacts would have to give rise to the same result. The alternative explanation is that BoNT/A translocates in the soma. The principle of Ockham’s razor dictates that the simplest explanation is the likeliest.

      (6) The introduction and the discussion section of this paper completely disregard more than 20 years of research conducted by several labs worldwide (Montecucco, Montal, Schiavo, Rummel, Binz, etc). The authors should make an effort to contextualize their data within the framework of these studies and address the significant discrepancies between their proposed intoxication model and existing research that clearly demonstrates BoNTs translocating upon the endocytic retrieval of SVs at presynaptic sites. Nevertheless, even assuming that the model proposed by the authors is accurate, numerous questions emerge. One such question is: How can the authors explain the exceptional toxicity of botulinum neurotoxin in an ex vivo neuromuscular junction preparation devoid of neuronal cell bodies (see Cesare Montecucco and Andreas Rummel's seminal studies)?

      Please see above in the answer to public reviews.

      (7) Scale bars should be added to all representative pictures.

      This has been done. Thank you for the thorough reading of our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) The title overstates the results. It may be indicated "in differentiated ReNcell VM".

      Title changed to: “Botulinum toxin intoxication requires retrograde transport and membrane translocation at the ER in RenVM neurons”

      (2) In the provided manuscript there are two Figure 2 and no Figure 3. This made the reading and understanding extremely difficult and should be corrected. As a result, the Figure legends do not fit the numbering. There are also discrepancies between some Figure panels (A, B, C, etc), the text, and the Legends. All this needs to be carefully checked.

      We apologize for the confusion, as the manuscript has gone through multiple rounds of revisions. We have carefully verified labels and legends.

      (3) The BoNT/A-mNG11x3 may introduce some bias that could be discussed. Would these additional peptides block LC translocation from synaptic vesicles in the nerve termini? In addition, the mNG peptides that are unfolded before complementation may direct LC towards Sec61. These aspects should be discussed.

      The comment would be valid if BoNT/A-mNG11x3 were the only approach used in the paper; however, the SNAPR reporter is used with native BoNT and shows data consistent with the split GFP approach.

      (4) In the Figure about SV2 (Fig 3 or 4): The authors did not locate SV2. The cells seem not to have the same differentiated phenotype as in Figure 1 and Figure 2/3A.

      We apologized above for the mislabeling. It is not clear what the question is here.

      (5) The authors should check whether BoNT/A wt cleaves the endogenous SNAP25 by western blot for instance in the original ReNcell VM before SNAPR engineering. This should be compared with wt SNAP25 cleavage by the BoNT/A-LC-mNG.

      It is likely that BoNT/A-LC-mNG11 has similar activity, as it only adds a small peptide at the end of the LC. At any rate, it is not clear why this is so important since both molecules translocate into the cytosol, with the same kinetics and in the same subcellular locale.

      (6) Perhaps I did not understand. How can the authors exclude that what is observed is the kinetic overproduction of the reporter substrate SNAPR?

      The authors could use SLO toxin (PNAS 98, 3185-3190, 2001) to permeabilize the cells all along their body and axon to introduce BoNT/A or LC (wt) and observe synchronized SNAPR cleavage throughout the cells.

      The concept mentioned here is not very clear to us. Is the reviewer proposing that the SNAPR is produced much more efficiently at the tips of the neurites and thus its cleavage takes longer to be detected and is apparent first in the soma? With all due respect, this is a strange hypothesis, at odds with what we know of protein dynamics in neurons (i.e. most proteins are largely made in the soma and transported or diffuse into the neurites).

      Again, the two orthogonal approaches, split GFP and the SNAPR reporter, use different constructs and methods, yet converge on similar results. Perhaps the incredulity of the reviewer might be more productively directed at the current data “demonstrating” the translocation of LC in the synaptic bouton?

      (7) The authors could also use an assay of neurotransmitter release monitored by electrophysiology measurements to check the functional consequences of the kinetic diffusion of LC activity along the axon. Can the authors exclude that some toxin molecules translocate from the endocytic vesicles and block neurotransmission within minutes or a few hours?

      It is well established that inhibition of neurotransmission does not occur within minutes in vivo and in vitro, but rather within hours or even days. This kinetic delay is experienced by many patients and is one of the key arguments against the current model of self-translocation at the synaptic vesicle level.

      Minor remarks

      Thank you for pointing out all of these.

      (1) Please check typos. There are many. Check space before the parenthesis, between numbers and h (hours), reference style etc.

      Thank you. We have reviewed the text and tried to eliminate all these instances.

      (2) Line 90: The C of HC should be capitalized.

      Fixed

      (3) Line 107: add space between "neurons(Donato".

      Fixed

      (4) Line 109: space "72 h".

      Fixed

      (5) Line 115: a word is missing ? ...to show retro-axonal... ? Please clarify this sentence.

      Fixed

      (6) Figure 1E: does nm refer to nM (nanomolar)? Please correct. No mention of panel F.

      Fixed

      (7) Line 161: do you mean ~16 µm/h? Please correct.

      Fixed

      (8) Line 168, words are missing.

      Fixed, thank you

      We verified that Cyt-mNG1-10 was expressed using the HA tag; the expression was homogeneously distributed in differentiated neurons and we observed no GFP signal (Figure 2C).

      (9) Line 171: Isn't mNG 11 the eleventh beta strand of the neon green fluorescent protein, not alpha helix? Otherwise, can the authors confirm it acquires the shape of an alpha helix? Same at line 326.

      We have corrected the mistake; thanks for pointing it out.

      (10) Figure 2 is doubled. The legend of Fig 2 refers to Figure 3. There is no legend for Figure 2. Then, some figures are shifted in their numbering.

      Fixed

      (11) The fluorescence in the cell body must appear before the fluorescence in the axon due to higher volume. Please discuss.

      The fluorescence progresses in the neurite extensions in a centripetal fashion. The volume of the neurite near the cell body is not significantly different from that at the end of the neurite. Thus the fluorescence data are consistent with translocation in the soma and not with an effect due to higher volume in the soma.

      (12) Figure 2D, right: the term intoxication is improper for this experiment. Rather, it is the presence of the BoNT/A-mNG11 that is detected. I believe the authors should be particularly careful about the use of terms: intoxication means blockade of neurosecretion, SNAPR cleavage means activity etc.

      While the reviewer is correct that it is the presence of BoNT/A-mNG11 that is detected, it remains that it is an active toxin, so the neurons are effectively intoxicated; as they are when we use the wild type toxin. We do not imply that we are measuring intoxication, but simply that the neurons are put into contact with a toxin.

      (13) Line 196: Should we read TXNRD1 is required for BoNT/A LC translocation? TXNRD1 in the current model of translocation is located in the cytoplasm and is supposed to play a role in the cleavage of the disulfide bond linking LC to HC. In the model proposed by this study, LC is translocated through the Sec61 translocon. In this case, I would assume that the protein disulfide isomerase (PDI) in the endoplasmic reticulum would reduce the LC-HC disulfide bond. In that case, TXNRD1 would not be required anymore. Please discuss.

      Why should we assume that a PDI is involved in the reduction of the LC-HC disulfide bond? In our previous studies on A-B toxins (PE and Ricin), different reduction systems seemed to be at play. There is no conceptual imperative to assume reduction in the ER because the Sec61 translocon is implicated. Reduction might occur on the cytosolic side by TXNRD1 or the effect of this reductase could be indirect.

      (14) The legend of Figure 4 (in principle Figure 5?) is not matching with the panels and panel entries are missing (Figure 4F in particular).

      Fixed

      (15) Figure 6 panels E and H, please match colors with legend (grey and another color).

      Not clear

      (16) Please indicate BoNT/A construct concentrations in all Figure legends.

      Done

      (17) Line 416: isn't SV2 also involved in epilepsy?

      Yes it is.

      (18) Line 433: as above, shouldn't the disulfide bond linking LC to HC be cleaved by PDI in the ER in this model (as for other translocating bacterial toxins) rather than by thioredoxin reductases in the cytoplasm? Please discuss.

      See above

      (19) Identification of vATPase in the screen could be consistent with the endocytic vesicle acidification model of translocation.

      Yes

      (20) Did the authors add KCl in screening controls without toxins? This should be detailed in the Materials and Methods. Could there be a KCl effect on the cells? KCl exposure for 48 hours may be highly stressful for cells. The KCl exposure should last only several minutes for toxin entry.

      We did not observe significant cell death with the cell culture conditions used. Cell viability was checked at multiple stages, for instance using nuclei number.

      Reviewer #3 (Recommendations For The Authors):

      Main comments: (1) In Figure 1B: could you devise a means to prevent proteosomal degradation of the tGFP cleaved part to assess whether this is formed?

      We have also used a FRET assay after intoxication and obtained similar results.

      (2) Line 152: Where it reads "was not surprising", maybe I missed something, but to me, this is indeed surprising. If the toxin is rapidly internalized and translocated (therefore, it is able to cleave SNAP25), the fact that tGFP requires 48 hours to be degraded seems surprising to me. Or does it mean that the toxin also slows down the degradation of the tGFP fragment? So, how can you differentiate between the effect being on cleavage of the fragment or in tGFP degradation?

      The reviewer is correct: the “not” was a typo introduced during rewriting, and the long delay between adding the toxin and observing cleavage was indeed surprising. Our interpretation is that trafficking takes time; indeed, the split-GFP kinetics indicate that the toxin takes about 48 h to fill the entire cytosol (Fig. 2D).

      (3) Regarding the effect of Sec61G knockdown, is it possible that the observed effects are indirect and not due to the translocon being directly responsible for translocating the protein?

      As discussed in the last part of the Results, Sec61 knock-down blocks intoxication but does not prevent BoNT from reaching the lumen of the ER (Figure 6G,H). Thus, Sec61 “is instrumental to the translocation of BoNT/A LC into the neuronal cytosol at the soma.”

      Minor comments:

      (1) Fig. 3E: in the legend I think one of the NT3+ should be NT3-.

      Yes, thanks for spotting it

      (2) Would you consider adding Figure S4 as a main figure?

      Thanks for the suggestion

      (3) Please, check that all microscopy image panels have scale bars.

      Done

      (4) Figure 6B (bottom panels): why does it seem that there is a lot of mNeonGreen positive signal in regions that are not positive for HA? Shouldn't complementation keep HA in the complemented protein?

      Our assumption is that there is an excess of receptor protein (HA tag) over reconstituted protein (GFP protein), given the relatively low concentration of toxin being internalized and translocated.

      Refs:

      (1) Pirazzini M, Azarnia Tehran D, Leka O, Zanetti G, Rossetto O, Montecucco C. On the translocation of botulinum and tetanus neurotoxins across the membrane of acidic intracellular compartments. Biochim Biophys Acta. 2016 Mar;1858(3):467–474. PMID: 26307528

      (2) Pirazzini M, Rossetto O, Eleopra R, Montecucco C. Botulinum Neurotoxins: Biology, Pharmacology, and Toxicology. Pharmacol Rev. 2017 Apr;69(2):200–235. PMCID: PMC5394922

      (3) Dong M, Masuyer G, Stenmark P. Botulinum and Tetanus Neurotoxins. Annu Rev Biochem. Annual Reviews; 2019 Jun 20;88(1):811–837.

      (4) Rossetto O, Pirazzini M, Fabris F, Montecucco C. Botulinum Neurotoxins: Mechanism of Action. Handb Exp Pharmacol. 2021;263:35–47. PMCID: 6671090

      (5) Williams JM, Tsai B. Intracellular trafficking of bacterial toxins. Curr Opin Cell Biol. 2016 Aug;41:51–56. PMCID: PMC4983527

      (6) Mesquita FS, van der Goot FG, Sergeeva OA. Mammalian membrane trafficking as seen through the lens of bacterial toxins. Cell Microbiol. 2020 Apr;22(4):e13167. PMCID: PMC7154709

      (7) Hoch DH, Romero-Mira M, Ehrlich BE, Finkelstein A, DasGupta BR, Simpson LL. Channels formed by botulinum, tetanus, and diphtheria toxins in planar lipid bilayers: relevance to translocation of proteins across membranes. Proc Natl Acad Sci U S A. 1985 Mar;82(6):1692–1696. PMCID: PMC397338

      (8) Donovan JJ, Middlebrook JL. Ion-conducting channels produced by botulinum toxin in planar lipid membranes. Biochemistry. 1986 May 20;25(10):2872–2876. PMID: 2424493

      (9) Fischer A, Montal M. Single molecule detection of intermediates during botulinum neurotoxin translocation across membranes. Proc Natl Acad Sci U S A. 2007 Jun 19;104(25):10447–10452. PMCID: PMC1965533

      (10) Fischer A, Nakai Y, Eubanks LM, Clancy CM, Tepp WH, Pellett S, Dickerson TJ, Johnson EA, Janda KD, Montal M. Bimodal modulation of the botulinum neurotoxin protein-conducting channel. Proc Natl Acad Sci U S A. 2009 Feb 3;106(5):1330–1335. PMCID: PMC2635780

      (11) Fischer A, Montal M. Crucial role of the disulfide bridge between botulinum neurotoxin light and heavy chains in protease translocation across membranes. J Biol Chem. 2007 Oct 5;282(40):29604–29611. PMID: 17666397

      (12) Koriazova LK, Montal M. Translocation of botulinum neurotoxin light chain protease through the heavy chain channel. Nature structural biology. 2003. p. 13–18. PMID: 12459720

      (13) Moreau D, Kumar P, Wang SC, Chaumet A, Chew SY, Chevalley H, Bard F. Genome-wide RNAi screens identify genes required for Ricin and PE intoxications. Dev Cell. 2011 Aug 16;21(2):231–244. PMID: 21782526

      (14) Bassik MC, Kampmann M, Lebbink RJ, Wang S, Hein MY, Poser I, Weibezahn J, Horlbeck MA, Chen S, Mann M, Hyman AA, Leproust EM, McManus MT, Weissman JS. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell. 2013 Feb 14;152(4):909–922. PMCID: PMC3652613

      (15) Tian S, Muneeruddin K, Choi MY, Tao L, Bhuiyan RH, Ohmi Y, Furukawa K, Furukawa K, Boland S, Shaffer SA, Adam RM, Dong M. Genome-wide CRISPR screens for Shiga toxins and ricin reveal Golgi proteins critical for glycosylation. PLoS Biol. 2018 Nov;16(11):e2006951. PMCID: PMC6258472

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, and uniformly enhances PN responses to odors. Overall, I had no technical concerns.

      Weaknesses:

      While there are several interesting observations, the conclusions that serotonin enhanced sensitivity specifically and that serotonin had feeding-state-specific effects, were not supported by the evidence provided. Furthermore, there were other instances in which much more clarification was needed for me to follow the assumptions being made and inadequate statistical testing was reported.

      Major concerns.

      • To enhance olfactory sensitivity, the expected results would be that serotonin causes locusts to perceive each odor as being at a relatively higher concentration. The authors recapitulate a classic olfactory behavioral phenomenon where higher odor concentrations evoke weaker responses which is indicative of the odors becoming aversive. If serotonin enhanced the sensitivity to odors, then the dose-response curve should have shifted to the left, resulting in a more pronounced aversion to high odor concentrations. However, the authors show an increase in response magnitude across all odor concentrations. I don't think the authors can claim that serotonin enhances the behavioral sensitivity to odors because the locusts no longer show concentration-dependent aversion. Instead, I think the authors can claim that serotonin induces increased olfactory arousal.

      The reviewer makes a valid point. Bath application of serotonin increased POR behavioral responses across all odor concentrations, and concentration-dependent aversion was not observed. Furthermore, the monotonic relationship between projection neuron responses and the intensity of current injection is altered when serotonin is exogenously introduced (see Author response image 1; see below for more explanation). Hence, our data suggest that serotonin alters the dose-response relationship between neural/behavioral responses and odor intensity. As recommended, we have revised our claim to state that serotonin induces an increase in olfactory arousal. The new physiology data have been added as Supplementary Figure 3 to the revised manuscript.

      • The authors report that 5-HT causes PNs to change from tonic to bursting and conclude that this stems from a change in excitability. However, excitability tests (such as I/V plots) were not included, so it's difficult to disambiguate excitability changes from changes in synaptic input from other network components.

      To confirm that PN excitability did indeed change after serotonin application, we performed a new set of current-clamp recordings. In these experiments, we monitored the spiking activity of individual PNs as we injected different levels of current (200–1000 pA). Note that locust LNs that provide recurrent inhibition arborize and integrate inputs from a large number of sensory neurons and projection neurons. Therefore, activating a single PN should not activate the local neurons and therefore the antennal lobe network.

      We found that the total spiking activity monotonically increased with the magnitude of the current injection in all four PNs recorded (Author response image 1). However, after serotonin injection, we found that the spiking activity remained relatively stable and did not systematically vary with the magnitude of the current injection. While the changes in odor-evoked responses may incorporate both excitability changes in individual PNs and recurrent feedback inhibition through GABAergic LNs, these results from our current injection experiments unambiguously indicate that there are changes in excitability at the level of individual PNs. We have added this result to the revised manuscript.

      Author response image 1.

      Current-injection-induced spiking activity in individual PNs is altered after serotonin application. (A) Representative intracellular recordings showing membrane potential fluctuations as a function of time for one projection neuron (PN) in the locust antennal lobe. A two-second window when a positive 200–1000 pA current was applied is shown. Firing patterns before (left) and after (right) serotonin application are shown for comparison. Note that the spiking activity changes after the 5HT application. The black bar represents the 20 mV scale. (B) Dose-response curves showing the average number of action potentials (across 5 trials) during the 2-second current pulse before (green) and after (purple) serotonin for each recorded PN. Note that the current intensity was systematically increased from 200 pA to 1000 pA. (C) The mean number of spikes across the four recorded cells during current injection is shown. The color progression represents the intensity of applied current, ranging from 200 pA (leftmost bar) to 1000 pA (rightmost bar). The dose-response trends before (green) and after (purple) 5HT application are shown for comparison. The error bars represent SEM across the four cells.

      • There is another explanation for the theoretical discrepancy between physiology and behavior, which is that odor coding is further processed in higher brain regions (i.e., other than the antennal lobe) not studied in the physiological component of this study. This should at least be discussed.

      This is a valid argument. For our model of neural mapping onto behavior to work, we only need the odorant that evokes or suppresses PORs to activate a distinct set of neurons. Having said that, our extracellular recording results (Fig. 6E) indicate that hexanol (high POR) and linalool (low POR) do activate highly non-overlapping sets of PNs in the antennal lobe. Hence, our results suggest that the segregation of neural activity based on behavioral relevance already begins in the antennal lobe. We have added this clarification to the discussion section.

      • The authors cannot claim that serotonin underlies a hunger state-dependent modulation, only that serotonin impacts responses to appetitive odors. Serotonin enhanced PORs for starved and fed locusts, so the conclusion would be that serotonin enhances responses regardless of the hunger state. If the authors had antagonized 5-HT receptors and shown that feeding no longer impacts POR, then they could make the claim that serotonin underlies this effect. As it stands, these appear to be two independent phenomena.

      This is also a valid point. We have clarified this in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that antennal lobe neurons generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior. The authors finally suggest that serotonin injection can mimic a change in a hunger state.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of antennal lobe neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I have several concerns regarding missing control experiments, unclear data analysis, and interpretation of results.

      A detailed description of the behavioral experiments is lacking. Did the authors also provide a mineral oil control and did they analyze the baseline POR response? Is there an increase in baseline response after serotonin exposure already at the behavioral output level? It is generally unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium).

      POR protocol: Sixth instar locusts (Schistocerca americana) of either sex were starved for 24-48 hours before the experiment or taken straight from the colony and fed blades of grass for the satiated condition. Locusts were immobilized by placing them in a plastic tube and securing their body with black electrical tape (see Author response image 2). Locusts were given 20-30 minutes to acclimatize after placement in the immobilization tube. As can be noted, the head of the locust, along with the antennae and maxillary palps, protruded out of this immobilization tube so that they could be moved freely by the locust. Note that the maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process.

      It is worth noting that our earlier studies had shown that the presentation of ‘appetitive odorants’ triggers the locust to open their maxillary palps even when no food is presented (Saha et al., 2017; Nizampatnam et al., 2018; Nizampatnam et al., 2022; Chandak and Raman, 2023). Furthermore, our earlier results indicate that the probability of palp opening varies across different odorants (Chandak and Raman, 2023). We chose four odorants that had a diverse range of palp-opening: supra-median (hexanol), median (benzaldehyde), and sub-median (linalool). Therefore, each locust in our experiments was presented with one concentration of four odorants (hexanol, benzaldehyde, linalool, and ammonium) in a pseudorandomized order. The odorants were chosen based on our physiology results such that they evoked different levels of spiking activities.

      The odor pulse was 4 s in duration and the inter-pulse interval was set to 60 s. The experiments were recorded using a web camera (Microsoft) placed right in front of the locusts. The camera was fully automated with a custom MATLAB script to start recording 2 seconds before the odor pulse and end recording at odor termination. An LED was used to track the stimulus onset/offset. The POR responses were manually scored offline. Responses to each odorant were scored as 0 or 1 depending on whether the palps remained closed or opened. A positive POR was defined as a movement of the maxillary palps during the odor presentation time window as shown on the locust schematic (Main Paper Figure 1).
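      The binary scoring described above can be summarized per odorant as a response probability. The following is a minimal sketch only, not the scoring pipeline used in the study: the function name `por_probability` and the example scores are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

def por_probability(scores):
    """Return the fraction of trials with an open-palp response per odorant.

    `scores` is a list of (odorant, response) pairs, where response is
    1 if the palps opened during the odor window and 0 if they stayed closed.
    """
    counts = defaultdict(lambda: [0, 0])  # odorant -> [responses, trials]
    for odorant, opened in scores:
        counts[odorant][0] += opened
        counts[odorant][1] += 1
    return {odorant: hits / n for odorant, (hits, n) in counts.items()}

# Hypothetical manual scores (not data from the study).
scores = [
    ("hexanol", 1), ("hexanol", 1), ("hexanol", 0), ("hexanol", 1),
    ("linalool", 0), ("linalool", 0), ("linalool", 1), ("linalool", 0),
]
rates = por_probability(scores)
```

      With these made-up scores, hexanol yields a POR probability of 0.75 and linalool 0.25.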

      Author response image 2.

      Pictures showing the behavior experiment setup and representative palp-opening responses in a locust.

      As the reviewer inquired, we performed a new series of POR experiments, where we explored POR responses to mineral oil and hexanol, before and after serotonin injection. For this study, we used 10 locusts that were starved 24-48 hours before the experiment. Note that hexanol was diluted at 1% (v/v) concentration in mineral oil. Our results reveal that locust PORs to hexanol (~50% PORs) were significantly higher than those triggered by mineral oil (~10% PORs). Injection of serotonin increased the POR response rate to hexanol but did not alter the PORs evoked by mineral oil (Author response image 3).

      Author response image 3.

      Serotonin does not alter the palp-opening responses evoked by paraffin oil. The PORs before and after serotonin (5HT) injection are summarized and shown as a bar plot for hexanol and paraffin oil. Striped bars signify the data collected after 5HT injection. Significant differences are identified in the plot (one-tailed paired-sample t-test; *p<0.05).

      Regarding recordings of potential PNs - the authors do not provide evidence that they did record from projection neurons and not other types of antennal lobe neurons. Thus, these claims should be phrased more carefully.

      In the locust antennal lobe, only the cholinergic projection neurons fire full-blown sodium spikes. The GABAergic local neurons only fire calcium ‘spikelets’ (Laurent, TINS, 1996; Stopfer et al., 2003; see Author response image 4 for an example). Hence, we are pretty confident that we are only recording from PNs. Furthermore, due to the physiological properties of the LNs, their signals being too small, they are also not detected in the extracellular recordings from the locust antennal lobe. Hence, we are confident with our claims and conclusion.

      Author response image 4.

      PN vs LN physiological differences. Left: A representative raw voltage trace recorded from a local neuron before, during, and after a 4-second odor pulse is shown. Note that the local neurons in the locust antennal lobe do not fire full-blown sodium spikes but only fire small calcium spikelets. Right: A raw voltage trace recorded from a representative projection neuron is shown for comparison. Sodium spikes are clearly visible during spontaneous and odor-evoked periods. The gray bar represents the 4-second odor pulse. The vertical black bar represents 40 mV.

      The presented model suggests labeled lines in the antennal lobe output of locusts. Could the presented model also explain a shift in behavior from aversion to attraction - such as seen in locusts when they switch from a solitarious to a gregarious state? The authors might want to discuss other possible scenarios, such as that odor evaluation and decision-making take place in higher brain regions, or that other neuromodulators might affect behavioral output. Serotonin injections could affect behavior via modulation of other cell types than antennal lobe neurons. This should also be discussed - the same is true for potential PNs - serotonin might not directly affect this cell type, but might rather shut down local inhibitory neurons.

      There are multiple questions here. First, regarding solitary vs. gregarious states, we are currently repeating these experiments on solitary locusts. Our preliminary results (not included in the manuscript) indicate that the solitary animals have increased olfactory arousal and respond with a higher POR but are less selective and respond similarly to multiple odorants. We are examining the physiology to determine whether the model for mapping neural responses onto behavior could also explain observations in solitary animals.

      Second, this reviewer makes the same point raised by Reviewer 1. We agree that odor evaluation and decision-making might take place in higher brain regions. All we could conclude based on our data is that a segregation of neural activity based on behavioral relevance might provide the simplest approach to map a non-specific increase in stimulus-evoked neural responses onto odor-specific changes in behavioral outcome. Furthermore, our results indicate that hexanol and linalool, two odorants that had an increase and decrease in PORs after serotonin injection, had only minimal neural response overlap in the antennal lobe. These results suggest that the formatting of neural activity to support varying behavioral outcomes might already begin in the antennal lobe. We have added this to our discussion.

      Third, regarding serotonin impacting PNs, we performed a new set of current-clamp experiments to examine this issue (Author response image 1). Our results clearly show that projection neuron activity in response to current injections (that should not incorporate feedback inhibition through local neurons) was altered after serotonin injection. Therefore, the observed changes in the odor-evoked neural ensemble activity should incorporate modulation at both individual PN level and at the network level. We have added this to our discussion as well.

      Finally, the authors claim that serotonin injection can mimic the starved state behavioral response. However, this is only shown for one of the four odors that are tested for behavior (HEX), thus the data does not support this claim.

      We note that Hex is the only appetitive odorant in the panel. But, as reviewer 1 has also brought up a similar point, we have toned down our claims and will investigate this carefully in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Was the POR of the locusts towards linalool and ammonium higher than towards a blank odor cartridge? I ask because the locusts appear to be less likely to respond to these odors and so I am concerned that this assay is not relevant to the ecological context of these odors. In other words, perhaps serotonin did not enhance the responses to these odors in this assay, because this is not a context in which locusts would normally respond to these odors.

      The POR response to linalool and ammonium is lower and comparable to that of paraffin oil. Serotonin does not increase POR responses to paraffin oil but does increase response to hexanol (an appetitive odorant). We have clarified this using new data (Author response image 5).

      • It seems to me that Figure 5C is the crux for understanding the potential impact of 5-HT on odor coding, but it is somewhat confusing and underutilized. Is the implication that 5-HT decorrelates spontaneous activity such that when an odor stimulus arrives, the odor-evoked activity deviates to a greater degree? The authors make claims about this figure that require the reader to guess as to the aspect of the figure to which they are referring.

      The reviewer makes an astute observation. Yes, the spontaneous activity in the antennal lobe network before serotonin introduction is not correlated with the ensemble spontaneous activity after serotonin bath application. Remarkably, the odor-evoked responses were highly similar, both in the reduced PCA space and when assayed using high-dimensional ensemble neural activity vectors. Whether the changes in network spontaneous activity have a function in odor detection and recognition is not fully understood and cannot be convincingly answered using our data. But this is something that we had pondered.

      • The modeling component summarized in Figure 6 needs clarification and more detail. Perhaps example traces associated with positive weighting within neural ensemble 1 relative to neural ensemble 2? I struggled to understand conceptually how the model resolved the theoretical discrepancy between physiology and behavior.

      As recommended, here is a plot showing the responses of four PNs that had positive weights to hexanol and linalool. As can be expected, each PN in this group had higher responses to hexanol and no response to linalool. Further, the four PNs that received negative weights had response only to linalool.

      Author response image 5.

      Odor-evoked responses of four PNs that received positive weights in the model (top panel), and four PNs that were assigned negative weights in the model (bottom).
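      To make the logic concrete, the following is a minimal sketch of a linear readout, assuming one ensemble of PNs with positive weights and one with negative weights. The weights, firing rates, and gain factor are hypothetical values chosen for illustration; this is not the fitted model from the manuscript.

```python
def predicted_por(responses, weights, bias=0.0):
    """Linear readout: weighted sum of PN responses (arbitrary units)."""
    return bias + sum(w * r for w, r in zip(weights, responses))

def apply_gain(responses, gain):
    """Scale every PN response by the same factor (non-specific increase)."""
    return [gain * r for r in responses]

# Hypothetical weights: PNs 0-3 are weighted positively, PNs 4-7 negatively.
weights = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]

# Hypothetical responses: hexanol drives only the positively weighted PNs,
# linalool drives only the negatively weighted PNs.
hexanol = [5.0, 4.0, 6.0, 5.0, 0.0, 0.0, 0.0, 0.0]
linalool = [0.0, 0.0, 0.0, 0.0, 4.0, 5.0, 3.0, 4.0]

gain = 1.5  # assumed uniform increase in odor-evoked responses

hex_before = predicted_por(hexanol, weights)
hex_after = predicted_por(apply_gain(hexanol, gain), weights)
lin_before = predicted_por(linalool, weights)
lin_after = predicted_por(apply_gain(linalool, gain), weights)
```

      In this sketch the same uniform gain raises the readout for hexanol and lowers it for linalool, because the two odorants activate non-overlapping ensembles with opposite weights.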

      • Was there a significant difference between the PORs of hungry vs. fed locusts? The authors state that they differ and provide statistics for the comparisons to locusts injected with 5-HT, but then don't provide any statistical analyses of hungry vs. fed animals.

      The POR responses to HEX (an appetitive odorant) were significantly different between the hungry and fed locusts.

      Author response image 6.

      A bar plot summarizing PORs to all four odors for satiated locusts (highlighted with stripes), before (dark shade) and after 5HT injection (lighter shade). To allow comparison, PORs of starved locusts before 5HT injection are plotted as well (without stripes). The significance was determined using a one-tailed paired-sample t-test (*p<0.05).

      • Were any of the effects of 5-HT on odor-evoked PN responses significant? No statistics are provided.

      We examined the distribution of odor-evoked responses in PNs before and after 5HT introduction. We found that the overall distribution was not significantly different between the two (one-tailed paired-sample t-test; p = 0.93).

      Author response image 7.

      Comparison of the distribution of odor-evoked PN responses before (green) and after (purple) 5HT introduction. One-tailed paired sample t-test was used to compare the two distributions.

      • The authors interchangeably use "serotonin", "5HT" and "5-HT" throughout the manuscript, but this should be consistent.

      This has been fixed in the revised manuscript.

      • On page 2 the authors provide an ecological relevance for linalool as being an additive in pesticides, however, linalool is a common floral volatile chemical. Is the implication that locusts have learned to associate linalool with pesticides?

      Linalool is a terpenoid alcohol that has a floral odor but has also been used as a pesticide and insect repellent [Beier et al., 2014]. As shown in Author response image 2, it evoked the least POR responses amongst a diverse panel of 22 odorants that were tested. We have clarified how we chose odorants based on the prior dataset in the Methods section.

      • In Figure 1, there should be a legend in the figure itself indicating that the black box indicates the absence of POR and the white box indicates presence, rather than just having it in the legend text.

      Done.

      • In Figure 2, the raw data from each animal can be moved to the supplements. The way it is presented is overwhelming and the order of comparisons is difficult to follow.

      Done.

      • For the induction of bursting in PNs by the application of 5-HT, were there any other metrics observed such as period, duration of bursts, or peak burst frequency? The authors rely on ISI, but there are other bursting metrics that could also be included to understand the nature of this observation. In particular, whether the bursts are likely due to changes in intrinsic biophysical properties of the PNs or polysynaptic effects.

      We could use other metrics as the reviewer suggests. Our main point is that the spontaneous activity of individual PNs changed. We have added new current-injection experiments to show that the PN output in response to square pulses of current changes after serotonin application (Author response image 1).

      • Were 4-vinyl anisole, 1-nonanol, and octanoic acid selected as additional odors because they had particular ecological relevance, or was it for the diversity of chemical structure?

      These odorants were selected based on both chemical structure and ecological relevance. The logic behind this was to have a very diverse odor panel that consisted of a food odorant (hexanol), an aggregation pheromone (4-vinyl anisole), a sex pheromone (benzaldehyde), an acid (octanoic acid), a base (ammonium), and an alcohol (1-nonanol). Additionally, we selected these odors based on previous neural and behavioral data on these odorants (Chandak and Raman, 2023; Traner and Raman, 2023; Nizampatnam et al., 2022 & 2018; Saha et al., 2017 & 2013).

      Reviewer #2 (Recommendations For The Authors):

      The electrophysiology dataset combines all performed experiments across all tested different PN-odor pairs. How many odors have been tested in a single PN and how many PNs have been tested for a single odor? This information is not present in the current manuscript. Can the authors exclude that there are odor-specific modulations?

      In total, our dataset includes recordings from 19 PNs. Seven PNs were tested on a panel of seven odorants (4-vinyl anisole, 1-nonanol, octanoic acid, Hex, Bza, Lool, and Amn), and the remaining twelve were tested with the four main odorants used in the study (Hex, Bza, Lool, and Amn). This information has been added to the Methods section.

      How did the authors choose the concentrations of serotonin injections and bath applications - is this a naturalistic amount?

      The serotonin concentration for ephys experiments was chosen by trial and error: 0.01 mM was the highest concentration that did not cause cell death. For the behavioral experiments, we increased the concentration (0.1 M) because anatomical structures in the locust's head, such as air sacs and sheath, as well as hemolymph, cause some degree of dilution that we cannot control.

      Behavior experiments were performed 3 hours after injection - ephys experiments 5-10 minutes following bath application. Can the authors exclude that serotonin affects neural processing differently on these different timescales?

      We cannot exclude this possibility. We did ePhys experiments 5-10 minutes after bath application as it would be extremely hard to hold cells for that long.

      A longer delay was required for our behavioral experiments as the locusts tended to be a bit more agitated, with larger spontaneous movements of the palps, and exhibited unprompted vomiting. A 3-hour period allowed the locust to regain its baseline level of movement after 5HT introduction. [This information has been added to the Methods section of the revised manuscript.]

      Concerning the analysis of electrophysiological data. The authors should correct for changes in the baseline before performing PCA analysis. And how much of the variance is explained by PC1 and PC2?

      We did not correct for baseline changes or subtract baseline as we wanted to show that the odor-evoked neural responses still robustly encoded information about the identity of the odorant.

      The authors should perform dye injections after recordings to visualize the cell type they recorded from. Serotonin might affect also other cell types in the antennal lobe.

      As mentioned above, in the locust antennal lobe only PNs fire full-blown sodium spikes, and LNs only fire calcium spikelets (Author response image 4). Since the LN signals are small, they will be buried under the noise floor when using extracellular recording electrodes to monitor responses in the antennal lobe (AL).

      Hence we are pretty certain what type of cells we are recording from.

      There were several typos in the manuscript, please check again.

      We have fixed many of the grammatical errors and typos in the revised version.

    1. Author response:

      We would like to thank the reviewers for their helpful comments. We note that both reviews are strongly supportive with comments including, “a biophysical tour de force” (rev #1), “the study is exemplary” (rev #2), and “represents a roadmap for future work” (rev #2). Below we respond to each reviewer comment.

      Reviewer #1

      This study provides a detailed and quantitative description of the allosteric mechanisms resulting in the paradoxical activation of BRAF kinase dimers by certain kinase inhibitors. The findings provide a much-needed quantitative basis for this phenomenon and may lay the foundation for future drug development efforts aimed at the important cancer target BRAF. The study builds on very evidence obtained by multiple independent biophysical methods.

      Summary:

      The authors quantitatively describe the complex binding equilibria of BRAF and its inhibitors resulting in some cases in the paradoxical activation of BRAF dimer when bound to ATP competitive inhibitors. The authors use a biophysical tour de force involving FRET binding assays, NMR, kinase activity assays and DEER spectroscopy.

      We are gratified by the reviewer’s supportive summary.

      Strengths:

      The strengths of the study are the beautifully conducted assays that allow for a thorough characterization of the allostery in this complex system. Additionally, the use of 19F-NMR and DEER spectroscopy provide important insights into the details of the process. The resulting model for binding of inhibitors and dimerization (Fig. 4) is very helpful.

      Weaknesses:

      This is a complex system and its communication is inherently challenging. It might be of interest to the broader readership to understand the implications of the model for drug development and therapy.

      We agree with the reviewer that this is a complicated system. With regard to inhibitor development, a key insight is that designing aC-in state inhibitors that avoid paradoxical activation may be non-trivial because these molecules not only induce dimers but also tend to bind the second dimer subunit more weakly than the first, due to allosteric asymmetry and/or inherently different affinities for each RAF isoform. We feel the full implications for future therapeutic development are an extensive topic that is beyond the scope of our work, which is focused on the properties of current inhibitors.

      Recommendations for the author:

      The experimental work, analysis and resulting model are excellent. I had some difficulty following the complex model in some instances and it may be useful to review the description of the model and see whether it can be made more palatable to the broader readership. I think it would be useful to discuss the model presented in reference 40 (Kholodenko) and to compare it to the presented model here.

      We regret any confusion regarding the nature of the model. Our analysis was built upon the model developed by Boris Kholodenko, as reported in his 2015 Cell Reports paper. This model formed the theoretical framework that, combined with our experimental data, allowed us to obtain experimental values for its equilibrium constants and allosteric coupling factors.

      Reviewer #2

      This manuscript combines elegant biophysical solution measurements to address paradoxical kinase activation by Type II BRAF inhibitors. The novel findings challenge prevailing models, through experiments that are rigorous and carefully controlled. The study is exemplary in the breadth of strategies it uses to address protein kinase dynamics and inhibitor allostery.

      Summary:

      This manuscript uses FRET, 19F-NMR and DEER/EPR solution measurements to examine the allosteric effects of a panel of BRAF inhibitors (BRAFi). These include first-generation aC-out BRAFi, and more recent Type I and Type II aC-in inhibitors. Intermolecular FRET measurements quantify Kd for BRAF dimerization and inhibitor binding to the first and second subunits. Distinct patterns are found between aC-in BRAFi, where Type I BRAFi bind equally well to the first and second subunits within dimeric BRAF. In contrast, Type II BRAFi show stronger affinity for the first subunit and weaker affinity for the second subunit, an effect named "allosteric asymmetry". Allosteric asymmetry has the potential for Type II inhibitors to promote dimerization while favoring occupancy of only one subunit (BBD form), leading to enrichment of an active dimer.

      Measurements of in vitro BRAF kinase activity correlate amazingly well with the calculated amounts of the half site-inhibited BBD forms with Type II inhibitors. This suggests that the allosteric asymmetry mechanism explains paradoxical activation by this class of inhibitors. DEER/EPR measurements further examine the positioning of helix aC. They show systematic outward movement of aC with Type II inhibitors, relative to the aC-in state with Type I inhibitors, and further show that helix aC adopts multiple states and is therefore dynamic in apo BRAF. This makes a strong case that negative cooperativity between sites in the BRAF dimer can account for paradoxical kinase activation by Type II inhibitors by creating a half site-occupied homodimer, BBD. In contrast, Type I inhibitors and aC-out inhibitors do not fit this model, and are therefore proposed to be explained by previously proposed models involving negative allostery between subunits in BRAF-CRAF heterodimers, RAS priming, and transactivation.

      Strengths:

      This study integrates orthogonal spectroscopic and kinetic strategies to characterize BRAF dynamics and determine how it impacts inhibitor allostery. The unique combination of approaches presented in this study represents a road map for future work in the important area of protein kinase dynamics. The work represents a worthy contribution not only to the field of BRAF regulation but protein kinases in general.

      Weaknesses:

      Some questions remain regarding the proposed model for Type II inhibitors and its comparison to Type I and aC-out inhibitors that would be useful to clarify. Specifically, it would be helpful to address whether the activation of BRAF by Type II inhibitors, while strongly correlated with BBD model predictions in vitro, also depends on CRAF via BRAF-CRAF in cells and therefore overlaps with the mechanisms of paradoxical activation by Type I and aC-out inhibitors.

      We agree with the reviewer that this is a worthy question to be pursued. However, given the substantial experimental effort required for such an endeavor, and the highly supportive nature of the reviewer comments, including that “This is a strong manuscript that I feel is well above the bar for publication”, we believe this effort is more appropriate for a future study.

      This is a strong manuscript that I feel is well above the bar for publication. Nevertheless, it is recommended that the authors consider addressing the following points in order to support their major conclusions.

      (1) Fig 3D shows similar effects of Type II and Type I inhibitors in the biphasic increase of cellular pMEK/pERK. From this, the authors argue that Type II inhibitors are explained by negative allostery in the BRAF homodimer (based on Fig 2E), while Type I inhibitors are not. But it seems possible that despite the terrific correlation between BBD and BRAF kinase activities measured in vitro, CRAF is still important to explain pathway activation in cells. It also seems conceivable that the calculated %BBD between different Type II inhibitors may not correlate as well with their effects on pathway activation in cells. These possibilities should be addressed.

      We agree with the reviewer that it is likely that CRAF contributes to paradoxical activation by type II inhibitors in cells. It is also likely that other cellular factors such as RAS-priming and membrane recruitment play a role in activation. However, we note that for the type II inhibitors there is good agreement between the biophysical predictions and the concentration regimes in which activation is observed in cells, suggesting that these predictions are capturing a key part of the activation process that occurs in cells.

      (2) In Fig 2A, is it possible to report the activity of dimeric BRAF-WT in the absence of inhibitor? This would help confirm that the maximal activity measured after titrating inhibitor is indeed consistent with the predicted %BBD population, which would be expected to have half of the specific activity of BB.

      In principle, it is possible to determine the catalytic activity of apo dimers (BB) by combining our model predictions for the concentration of BB dimers and our activity measurements. However, because the activity assays are performed at nanomolar kinase concentrations, whereas the baseline dimerization affinity of BRAF is in the micromolar range, the observed activity of apo BRAF arises from a small subpopulation of dimers (on the order of 4 percent under the conditions of our experiments) and is therefore difficult to define accurately. As a result, we deemed it more suitable to compare our results to published activity measurements derived from 14-3-3-activated dimers which should represent fully dimerized BRAF. This analysis, as reported in Figure 2E, suggests that the BBD activity is approximately half of that of BB.
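
As a rough illustration of why the apo dimer population is small when the kinase concentration lies far below the dimerization affinity, the sketch below solves a bare monomer-dimer equilibrium (2 B ⇌ BB). It is not the full model used in the study, which also includes inhibitor binding. The function name, the 50 nM total concentration, and the 2 µM dissociation constant are hypothetical values chosen only to show the order of magnitude.

```python
import math

def dimer_fraction(total_conc, kd_dimer):
    """Fraction of protomers found in dimers for the equilibrium 2 M <-> D.

    Assumes Kd = [M]^2 / [D] and the mass balance total = [M] + 2 [D];
    both arguments must be given in the same concentration units.
    """
    # Positive root of 2 [M]^2 / Kd + [M] - total = 0
    monomer = kd_dimer * (math.sqrt(1.0 + 8.0 * total_conc / kd_dimer) - 1.0) / 4.0
    return 1.0 - monomer / total_conc

# Hypothetical values: 50 nM kinase and a 2 uM dimerization Kd (both in uM)
fraction = dimer_fraction(0.05, 2.0)
print(f"{100 * fraction:.1f}% of protomers are in dimers")
```

With these assumed numbers only a few percent of the protein is dimeric, which is the regime described above; the fraction rises toward one as the total concentration exceeds the dissociation constant.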

      (3) The 19F-NMR experiments make a good case for broadening of the helix aC signal in the BRAF dimer. From this, the study proposes that after inhibitor binds one subunit, the second unoccupied subunit retains dynamics. It would be useful to address this experimentally, if possible. For example, can the 19F-NMR signal be measured in the presence of inhibitor, to support the prediction that the unoccupied subunit is indeed dynamic and samples multiple conformations as in apo BRAF?

      We agree with the reviewer that it would be interesting to determine the dynamic response of BRAF to inhibitor binding. However, this is a challenging undertaking due to the biochemical heterogeneity that occurs at subsaturating inhibitor concentrations. For example, at any given inhibitor concentration, BRAF exists as a mixture of monomers, apo dimers, dimers with one inhibitor molecule, and dimers with two inhibitor molecules bound. This makes it challenging to relate the 19F NMR signal to a single biochemical state. Addressing this would require a substantial experimental effort that we feel is beyond the scope of this study.

    1. Author response:

      Reviewer 1:

      The paper “Quantifying gliding forces of filamentous cyanobacteria by self-buckling” combines experiments on freely gliding cyanobacteria, buckling experiments using two-dimensional V-shaped corners, and micropipette force measurements with theoretical models to study gliding forces in these organisms. The aim is to quantify these forces and use the results to perhaps discriminate between competing mechanisms by which these cells move. A large data set of possible collision events is analyzed, buckling events evaluated, and critical buckling lengths estimated. A line elasticity model is used to analyze the onset of buckling and estimate the effective (viscous type) friction/drag that controls the dynamics of the rotation that ensues post-buckling. This value of the friction/drag is compared to a second estimate obtained by consideration of the active forces and speeds in freely gliding filaments. The authors find that these two independent estimates of friction/drag correlate with each other and are comparable in magnitude. The experiments are conducted carefully, the device fabrication is novel, the data set is interesting, and the analysis is solid. The authors conclude that the experiments are consistent with the propulsion being generated by adhesion forces rather than slime extrusion. While consistent with the data, this conclusion is inferred.

      We thank the reviewer for the positive evaluation of our work.

      Summary:

      The paper addresses important questions on the mechanisms driving the gliding motility of filamentous cyanobacteria. The authors aim to understand these by estimating the elastic properties of the filaments, and by comparing the resistance to gliding under a) freely gliding conditions, and b) in post-buckled rotational states. Experiments are used to estimate the propulsion force density on freely gliding filaments (assuming over-damped conditions). Experiments are combined with a theoretical model based on Euler beam theory to extract friction (viscous) coefficients for filaments that buckle and begin to rotate about the pinned end. The main results are estimates for the bending stiffness of the bacteria, the propulsive tangential force density, the buckling threshold in terms of the length, and estimates of the resistive friction (viscous drag) providing the dissipation in the system and balancing the active force. It is found that experiments on the two bacterial species yield nearly identical values of f (albeit with rather large variations). The authors conclude that the experiments are consistent with the propulsion being generated by adhesion forces rather than slime extrusion.

      We appreciate this comprehensive summary of our work.

      Strengths of the paper:

      The strengths of the paper lie in the novel experimental setup and measurements that allow for the estimation of the propulsive force density, critical buckling length, and effective viscous drag forces for movement of the filament along its contour – the axial (parallel) drag coefficient, and the normal (perpendicular) drag coefficient (I assume this is the case, since the post-buckling analysis assumes the bent filament rotates at a constant frequency). These direct measurements are important for serious analysis and discrimination between motility mechanisms.

      We thank the reviewer for this positive assessment of our work.

      Weaknesses:

      There are aspects of the analysis and discussion that may be improved. I suggest that the authors take the following comments into consideration while revising their manuscript.

      The conclusion that adhesion via focal adhesions is the cause for propulsion rather than slime protrusion is consistent with the experimental results that the frictional drag correlates with propulsion force. At the same time, it is hard to rule out other factors that may result in this (friction) viscous drag - (active) force relationship while still being consistent with slime production. More detailed analysis aiming to discriminate between adhesion vs slime protrusion may be outside the scope of the study, but the authors may still want to elaborate on their inference. It would help if there was a detailed discussion on the differences in terms of the active force term for the focal adhesion-based motility vs the slime motility.

      We appreciate this critical assessment of our conclusions. Of course we are aware that many different mechanisms may lead to similar force/friction characteristics, and that a definitive conclusion on the mechanism would require the combination of various techniques, which is beyond the scope of this work. Therefore, we were very careful in formulating the discussion of our findings, refraining, in particular, from a singular conclusion on the mechanism but instead indicating “support” for one hypothesis over another, and emphasizing “that many other possibilities exist”.

      The most common competing hypotheses for bacterial gliding suggest that the propulsion forces are provided by slime extrusion at the junctional pore complex [A1], rhythmic contraction of fibrillar arrays at the cell wall [A2], focal adhesion sites connected to intracellular motor-microtubule complexes [A3], or a modified type-IV pilus apparatus [A4]. For the slime extrusion hypothesis, which is still widespread today, one would rather expect an anticorrelation of force and friction: more slime extrusion would generate more force, but also enhance lubrication. The other hypotheses are more consistent with the trend we observed in our experiments, because both pili and focal adhesion require direct contact with a substrate. How contraction of fibrillar arrays would micromechanically couple to the environment is not clear to us, but direct contact might still facilitate force transduction. Please note that these hypotheses were all postulated without any mechanical measurements, solely based on ultra-structural electron microscopy and/or genetic or proteomic experiments. We see our work as complementary to that, providing a mechanical basis for evaluating these hypotheses.

      We agree with the referee that narrowing down this discussion to focal adhesion should have been avoided. We rewrote the concluding paragraph (page 8):

      “…it indicates that friction and propulsion forces, despite being quite variable, correlate strongly. Thus, generating more force comes, inevitably, at the expense of added friction. For lubricated contacts, the friction coefficient is proportional to the thickness of the lubricating layer (Snoeijer et al., 2013), and we conjecture active force and drag both increase due to a more intimate contact with the substrate. This supports mechanisms like focal adhesion (Mignot et al., 2007) or a modified type-IV pilus (Khayatan et al., 2015), which generate forces through contact with extracellular surfaces, as the underlying mechanism of the gliding apparatus of filamentous cyanobacteria: more contacts generate more force, but also closer contact with the substrate, thereby increasing friction to the same extent. Force generation by slime extrusion (Hoiczyk and Baumeister, 1998), in contrast, would lead to the opposite behavior: More slime generates more propulsion, but also reduces friction. Besides fundamental fluid-mechanical considerations (Snoeijer et al., 2013), this is rationalized by two experimental observations: i. gliding velocity correlates positively with slime layer thickness (Dhahri et al., 2013) and ii. motility in slime-secretion deficient mutants is restored upon exogenous addition of polysaccharide slime. Still we emphasize that many other possibilities exist. One could, for instance, postulate a regulation of the generated forces to the experienced friction, to maintain some preferred or saturated velocity.”

      Can the authors comment on possible mechanisms (perhaps from the literature) that indicate how isotropic friction may be generated in settings where focal adhesions drive motility? A key aspect here would probably be estimating the extent of this adhesion patch and comparing it to a characteristic contact area. Can lubrication theory be used to estimate characteristic areas of contact (knowing the radius of the filament, and assuming a height above the substrate)? If the focal adhesions typically cover areas smaller than this lubrication area, it may suggest the possibility that bacteria essentially present a flat surface insofar as adhesion is concerned, leading to a transversely isotropic response in terms of the drag. Of course, we will still require the effective propulsive force to act along the tangent.

      We thank the referee for suggesting to estimate the dimensions of the contact region. Both pili and focal adhesion sites would be of sizes below one micron [A3, A4], much smaller than the typical contact region in the lubricated contact, which is on the order of the filament radius (few microns). So indeed, isotropic friction may be expected in this situation [A5] and is assumed frequently in theoretical work [A6–A8]. Anisotropy may then indeed be induced by active forces [A9], but we are not aware of measurements of the anisotropy of friction in bacterial gliding.

      For a more precise estimate using lubrication theory, rheology and extrusion rate of the secreted polysaccharides would have to be known, but we are not aware of detailed experimental characterizations.

      We extended the paragraph in the buckling theory on page 5 regarding the assumption of isotropic friction:

      “We use classical Kirchhoff theory for a uniform beam of length L and bending modulus B, subject to a force density b⃗ = −f t⃗ − η v⃗, with an effective active force density f along the tangent t⃗, and an effective friction proportional to the local velocity v⃗, analog to existing literature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). Presumably, this friction is dominated by the lubrication drag from the contact with the substrate, filled by a thin layer of secreted polysaccharide slime which is much more viscous than the surrounding bulk fluid. Speculatively, the motility mechanism might also comprise adhering elements like pili (Khayatan et al., 2015) or foci (Mignot et al., 2007) that increase the overall friction (Pompe et al., 2015). Thus, the drag due to the surrounding bulk fluid can be neglected (Man and Kanso, 2019), and friction is assumed to be isotropic, a common assumption in motility models (Fei et al., 2020; Tchoufag et al., 2019; Wada et al., 2013). We assume…”

      We also extended the discussion regarding the outcome of isotropic friction (page 7):

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      I am not sure why the authors mention that the power of the gliding apparatus is not rate-limiting. The only way to verify this would be to put these in highly viscous fluids where the drag of the external fluid comes into the picture as well (if focal adhesions are on the substrate-facing side, and the upper side is subject to ambient fluid drag). Also, the friction referred to here has the form of a viscous drag (no memory effect, and thus not viscoelastic or gel-like), and it is not clear if forces generated by adhesion involve other forms of drag such as chemical friction via temporary bonds forming and breaking. In quasi-static settings and under certain conditions such as the separation of chemical and elastic time scales, bond friction may yield overall force proportional to local sliding velocities.

      We agree with the referee that the origin of the friction is not easily resolved. Lubrication yields an isotropic force density that is proportional to the velocity, and the same could be generated by bond friction. Importantly, both types of friction would be assumed to be predominantly isotropic. We explicitly referred to lubrication drag because it has been shown that mutations deficient of slime extrusion do not glide [A4].

      Assuming, in contrast, that in free gliding, friction with the environment is not rate limiting, but rather the internal friction of the gliding apparatus, i.e., the available power, we would expect a rather different behavior during early-buckling evolution. During early buckling, the tangential motion is stalled, and the dynamics is dominated by the growing buckling amplitude of filament regions near the front end, which move mainly transversely. For geometric reasons, in this stage the (transverse) buckling amplitude grows much faster than the rear part of the filament advances longitudinally. Thus that motion should not be impeded much by the internal friction of the gliding apparatus, but by external friction between the buckling parts of the filament and the ambient. The rate at which the buckling amplitude initially grows should be limited by the accumulated compressive stress in the filament and the transverse friction with the substrate. If the latter were much smaller than the (longitudinal) internal friction of the gliding apparatus, we would expect a snapping-like transition into the buckled state, which we did not observe.

      In our paper, we do not intend to evaluate the exact origin of the friction; quantifying the gliding force is the main objective. A linear force-velocity relation agrees with our observations. A detailed analysis of friction in cyanobacterial gliding would be an interesting direction for future work.

      To make these considerations more clear, we rephrased the corresponding paragraph on page 7 & 8:

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      For readers from a non-fluids background, some additional discussion of the drag forces, and the forms of friction would help. For a freely gliding filament if f is the force density (per unit length), then steady gliding with a viscous frictional drag would suggest (as mentioned in the paper) f ∼ v! L η||. The critical buckling length is then dependent on f and on B the bending modulus. Here the effective drag is defined per length. I can see from this that if the active force is fixed, and the viscous component resulting from the frictional mechanism is fixed, the critical buckling length will not depend on the velocity (unless I am missing something in their argument), since the velocity is not a primitive variable, and is itself an emergent quantity.

      We are not sure what “f ∼ v! L η||” means, possibly the spelling was corrupted in the forwarding of the comments.

      We assumed an overdamped motion in which the friction force density f_f (per unit length of the filament) is proportional to the velocity v0, i.e. f_f ∼ η v0, with a friction coefficient η. Overdamped means that the friction force density is equal and opposite to the propulsion force density, so the propulsion force density is f ∼ f_f ∼ η v0. The total friction and propulsion forces can be obtained by multiplication with the filament length L, which is not required here. In this picture, v0 is an emergent quantity and f and η are assumed as given and constant. Thus, by observing v0, f can be inferred up to the friction coefficient η. Therefore, by using two descriptive variables, L and v0, with known B, the primitive variable η can be inferred by logistic regression, and f then follows from the overdamped equation of motion.
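
The chain of relations in this reply can be written compactly as follows. The symbol F for the total force is introduced here only for illustration, and the numerical prefactor is the one quoted in the revised text on page 5.

```latex
\begin{align}
  f_f &= \eta\, v_0
      && \text{(friction force density)} \\
  f &= f_f = \eta\, v_0
      && \text{(overdamped force balance, magnitudes)} \\
  F &= f\, L
      && \text{(total force, not needed here)} \\
  L_c(v_0) &= \left( \frac{\eta\, v_0}{30.5722\, B} \right)^{-1/3}
      && \text{(buckling threshold)}
\end{align}
```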

      To clarify this, we revised the corresponding section on page 5 of the paper:

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an overdamped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed Lc ∼ (f/B)^(−1/3) and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting x = (L−Lc(v0))/∆Lc in Equation 1, with Lc(v0) = (η v0/(30.5722 B))^(−1/3), to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E,F show the buckling behavior…”
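
The regressor described in this passage can be sketched in a few lines of Python. Equation 1 is not reproduced in this response, so the standard logistic function p = 1/(1 + exp(−x)) is assumed here; the function names and all numerical inputs are hypothetical and serve only to show how L, v0, η and B enter.

```python
import math

BUCKLING_PREFACTOR = 30.5722  # numerical constant quoted in the revised text

def critical_length(eta, v0, bending_modulus):
    """Lc(v0) = (eta * v0 / (30.5722 * B))**(-1/3), as quoted above."""
    return (eta * v0 / (BUCKLING_PREFACTOR * bending_modulus)) ** (-1.0 / 3.0)

def buckling_probability(length, v0, eta, bending_modulus, delta_lc):
    """Assumed logistic form p = 1 / (1 + exp(-x)) with x = (L - Lc(v0)) / delta_Lc."""
    x = (length - critical_length(eta, v0, bending_modulus)) / delta_lc
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical inputs in consistent, arbitrary units
B = 8.0 / BUCKLING_PREFACTOR          # chosen so that Lc is close to 2 for eta = v0 = 1
lc = critical_length(eta=1.0, v0=1.0, bending_modulus=B)
p_short = buckling_probability(1.0, 1.0, 1.0, B, delta_lc=0.5)
p_long = buckling_probability(3.0, 1.0, 1.0, B, delta_lc=0.5)
print(f"Lc = {lc:.3f}, p(short) = {p_short:.3f}, p(long) = {p_long:.3f}")
```

In this form a filament whose length equals Lc(v0) buckles with probability one half, and at fixed length a faster filament has a shorter critical length and therefore a higher buckling probability.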

      Reviewer 2:

      In the presented manuscript, the authors first use structured microfluidic devices with gliding filamentous cyanobacteria inside in combination with micropipette force measurements to measure the bending rigidity of the filaments.

      Next, they use triangular structures to trap the bacteria with the front against an obstacle. Depending on the length and rigidity, the filaments buckle under the propulsive force of the cells. The authors use theoretical expressions for the buckling threshold to infer propulsive force, given the measured length and stiffnesses. They find nearly identical values for both species, f ∼ (1.0 ± 0.6) nN/µm, nearly independent of the velocity.

      Finally, they measure the shape of the filament dynamically to infer friction coefficients via Kirchhoff theory. This last part seems a bit inconsistent with the previous inference of propulsive force. Before, they assumed the same propulsive force for all bacteria and showed only a very weak correlation between buckling and propulsive velocity. In this section, they report a strong correlation with velocity, and report propulsive forces that vary over two orders of magnitude. I might be misunderstanding something, but I think this discrepancy should have been discussed or explained.

      We regret the misunderstanding of the reviewer regarding the velocity dependence, which indicates that the manuscript should be improved to convey these relations correctly.

      First, in the Buckling Measurements section, we did not assume the same propulsion force for all bacteria. The logistic regression yields an ensemble median for Lc (and thus an ensemble median for f ), along with the width ∆Lc of the distribution (and thus also the width of the distribution of f ). Our result f ∼ (1.0 ± 0.6) nN/µm indicates the median and the width of the distribution of the propulsion force densities across the ensemble of several hundred filaments used in the buckling measurements. The large variability of the forces found in the second part is consistently reflected by this very wide distribution of active forces detected in the logistic regression in the first part.

      We made small modifications to the buckling theory paragraph to clarify that, in the first part, a distribution of forces rather than a constant value is inferred (page 6):

      “Inserting the population median and quartiles of the distributions of bending modulus and critical length, we can now quantify the distribution of the active force density for the filaments in the ensemble from the buckling measurements. We obtain nearly identical values for both species, f ∼ (1.0±0.6) nN/µm, where the uncertainty represents a wide distribution of f across the ensemble rather than a measurement error.”

      The same holds, of course, when inferring the distribution of the friction coefficients (page 5):

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an over-damped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed Lc ∼ (f/B)^(−1/3) and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting x = (L − Lc(v0))/∆Lc in Equation 1, with Lc(v0) = (η v0/(30.5722 B))^(−1/3), to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E,F show the buckling behavior…”
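      The two-regressor model quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: Equation 1 is not reproduced in this response, so the standard logistic form p = 1/(1 + exp(−x)) is assumed; the function names and all numerical values are hypothetical, and consistent units are assumed throughout.

```python
import math

# Dimensionless buckling threshold f * Lc**3 / B quoted in the response.
BUCKLING_PREFACTOR = 30.5722


def critical_length(eta, v0, bending_modulus):
    """Critical length Lc(v0) = (eta * v0 / (30.5722 * B)) ** (-1/3)."""
    return (eta * v0 / (BUCKLING_PREFACTOR * bending_modulus)) ** (-1.0 / 3.0)


def buckling_probability(length, v0, eta, bending_modulus, delta_lc):
    """Assumed logistic form of Equation 1, with x = (L - Lc(v0)) / delta_Lc."""
    x = (length - critical_length(eta, v0, bending_modulus)) / delta_lc
    # Numerically stable evaluation of the logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical illustrative values, not taken from the paper.
eta, v0, B, delta_lc = 0.5, 2.0, 10.0, 1.0
lc = critical_length(eta, v0, B)
for L in (0.5 * lc, lc, 2.0 * lc):
    p = buckling_probability(L, v0, eta, B, delta_lc)
    print(f"L = {L:6.2f}  p_buckle = {p:.3f}")
```

      In an actual fit, η and ∆Lc would be the free parameters adjusted to the observed buckling outcomes; here they are simply fixed by hand.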

      The (naturally) wide distribution of force (and friction) leads to a distribution of Lc as well. However, due to the small exponent of 1/3 in the buckling threshold Lc ∼ f^(−1/3), the distribution of Lc is not as wide as the distributions of the individually inferred f or η. This is visualized in panel G of Figure 3, plotting Lc as a function of v0 (v0 is equivalent to f, up to a proportionality coefficient η). The natural length distribution, in contrast, is very wide. Therefore, the buckling propensity of a filament is most strongly characterized by its length, while force variability, which alters Lc of the individual, plays a secondary role.
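      The narrowing of the Lc distribution follows from the exponent alone. As a short worked step, take the threshold Lc = (30.5722 B/f)^(1/3) stated in the manuscript and hold B fixed (an assumption made only for this illustration):

```latex
\ln L_c = \tfrac{1}{3}\ln\!\left(30.5722\,B\right) - \tfrac{1}{3}\ln f
\quad\Longrightarrow\quad
\frac{\delta L_c}{L_c} = -\frac{1}{3}\,\frac{\delta f}{f}
```

      A relative spread in f is thus reduced threefold in Lc. For example, a factor of 100 in f changes Lc only by a factor of 100^(−1/3) ≈ 0.22.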

      In order to clarify this, we edited the last paragraph of the Buckling Measurements section on page 5 of the manuscript:

      “…Within the characteristic range of observed velocities (1 − 3 µm/s), the median Lc depends only mildly on v0, as compared to its rather broad distribution, indicated by the bands in Figure 3 G. Thus a possible correlation between f and v0 would only mildly alter Lc. The natural length distribution (cf. Appendix 1—figure 1 ), however, is very broad, and we conclude that growth rather than velocity or force distributions most strongly impacts the buckling propensity of cyanobacterial colonies. Also, we hardly observed short and fast filaments of K. animale, which might be caused by physiological limitations (Burkholder, 1934 ).”

      Second, in the Profile analysis section, we did not report a correlation between force and velocity. As can be seen in Figure 4—figure Supplement 1, neither the active force nor the friction coefficient, as determined from the analysis of individual filaments, show any significant correlation with the velocity. This is also written in the discussion (page 7):

      We see no significant correlation between L or v0 and f or η, but the observed values of f and η cover a wide range (Figure 4 B, C and Figure 4—figure Supplement 1 ).

      Note that this is indeed consistent with the logistic regression: Using v0 as a second regressor did not significantly reduce the width of the distribution of Lc as compared to the simple logistic regression, indicating that force and velocity are not strongly correlated.

      In order to clarify this in the manuscript, we modified that part (page 7):

      “…We see no significant correlation between L or v0 and f or η, but the observed values of f and η cover a wide range (Figure 4 B,C and Figure 4—figure Supplement 1). This is consistent with the logistic regression, where using v0 as a second regressor did not significantly reduce the width of the distribution of critical lengths or active forces. The two estimates of the friction coefficient, from logistic regression and individual profile fits, are measured in (predominantly) orthogonal directions: tangentially for the logistic regression where the free gliding velocity was used, and transversely for the evolution of the buckling profiles. Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic…”

      From a theoretical perspective, not many new results are presented. The authors repeat the well-known calculation for filaments buckling under propulsive load and arrive at the literature result of buckling when the dimensionless number (f L^3/B) is larger than 30.6 as previously derived by Sekimoto et al in 1995 [1] (see [2] for a clamped boundary condition and simulations). Other theoretical predictions for pushed semi-flexible filaments [1–4] are not discussed or compared with the experiments. Finally, the Authors use molecular dynamics type simulations similar to [2–4] to reproduce the buckling dynamics from the experiments. Unfortunately, no systematic comparison is performed.

      [1] Ken Sekimoto, Naoki Mori, Katsuhisa Tawada, and Yoko Y Toyoshima. Symmetry breaking instabilities of an in vitro biological system. Physical review letters, 75(1):172, 1995.

      [2] Raghunath Chelakkot, Arvind Gopinath, Lakshminarayanan Mahadevan, and Michael F Hagan. Flagellar dynamics of a connected chain of active, polar, brownian particles. Journal of The Royal Society Interface, 11(92):20130884, 2014.

      [3] Rolf E Isele-Holder, Jens Elgeti, and Gerhard Gompper. Self-propelled worm-like filaments: spontaneous spiral formation, structure, and dynamics. Soft matter, 11(36):7181–7190, 2015.

      [4] Rolf E Isele-Holder, Julia Jäger, Guglielmo Saggiorato, Jens Elgeti, and Gerhard Gompper. Dynamics of self-propelled filaments pushing a load. Soft Matter, 12(41):8495–8505, 2016.

      We thank the reviewer for pointing us to these publications, in particular the work by Sekimoto et al., of which we were not aware. We agree with the referee that the calculation is straightforward (basically known since Euler, up to modified boundary conditions). Our paper focuses on experimental work; the molecular dynamics simulations were included mainly as a consistency check and were not intended to generate the beautiful post-buckling patterns observed in references [2–4]. However, such shapes do emerge in filamentous cyanobacteria, and with the data provided in our manuscript, simulations can be quantitatively matched to our experiments, which will be covered by future work.

      We included the references in the revision of our manuscript, and a statement that we do not claim priority on these classical theoretical results.

      Introduction, page 2:

      “…Self-Buckling is an important instability for self-propelling rod-like micro-organisms to change the orientation of their motion, enabling aggregation or the escape from traps (Fily et al., 2020; Man and Kanso, 2019; Isele-Holder et al., 2015; Isele-Holder et al., 2016). The notion of self-buckling goes back to work of Leonhard Euler in 1780, who described elastic columns subject to gravity (Elishakoff, 2000). Here, the principle is adapted to the self-propelling, flexible filaments (Fily et al., 2020; Man and Kanso, 2019; Sekimoto et al., 1995) that glide onto an obstacle. Filaments buckle if they exceed a certain critical length Lc ∼ (B/f)^(1/3), where B is the bending modulus and f the propulsion force density…”

      Buckling theory, page 5:

      “…The buckling of gliding filaments differs in two aspects: the propulsion forces are oriented tangentially instead of vertically, and the front end is supported instead of clamped. Therefore, with L < Lc all initial orientations are indifferently stable, while for L > Lc, buckling induces curvature and a resultant torque on the head, leading to rotation (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995 ). Buckling under concentrated tangential end-loads has also been investigated in literature (de Canio et al., 2017; Wolgemuth et al., 2005 ), but leads to substantially different shapes of buckled filaments. We use classical Kirchhoff theory for a uniform beam of length L and bending modulus B, subject to a force density ⃗b = −f ⃗t − η ⃗v, with an effective active force density f along the tangent ⃗t, and an effective friction proportional to the local velocity ⃗v, analog to existing literature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995 )…”

      Further on page 6:

      “To derive the critical self-buckling length, Equation 5 can be linearized for two scenarios that lead to the same Lc: early-time small amplitude buckling and late-time stationary rotation at small and constant curvature (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). […] Thus, in physical units, the critical length is given by Lc = (30.5722 B/f)^(1/3), which is reproduced in particle based simulations (Appendix Figure 2) analogous to those in Isele-Holder et al. (2015, 2016).”
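      The threshold relation quoted above can be inverted to estimate a force density from a measured bending modulus and critical length. The following is a minimal sketch of that arithmetic only; the function names and the numerical inputs are hypothetical and do not reproduce the authors' analysis code or data.

```python
# Dimensionless buckling threshold f * Lc**3 / B quoted in the text.
BUCKLING_PREFACTOR = 30.5722


def critical_length(bending_modulus, force_density):
    """Lc = (30.5722 * B / f) ** (1/3)."""
    return (BUCKLING_PREFACTOR * bending_modulus / force_density) ** (1.0 / 3.0)


def force_density(bending_modulus, crit_length):
    """Inverse relation: f = 30.5722 * B / Lc**3."""
    return BUCKLING_PREFACTOR * bending_modulus / crit_length ** 3


# Hypothetical inputs in consistent units (e.g. B in nN*um^2, Lc in um).
B = 1.0e5
for lc in (100.0, 150.0, 200.0):
    print(f"Lc = {lc:6.1f}  ->  f = {force_density(B, lc):.3f}")
```

      Because f scales as Lc^(−3), a modest uncertainty in the critical length translates into a much larger relative uncertainty in the inferred force density.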

      Discussion, page 7 & 8:

      “…This, in turn, has dramatic consequences on the exploration behavior and the emerging patterns (Isele-Holder et al., 2015, 2016; Abbaspour et al., 2021; Duman et al., 2018; Prathyusha et al., 2018; Jung et al., 2020): (L/Lc)^3 is, up to a numerical prefactor, identical to the flexure number (Isele-Holder et al., 2015, 2016; Duman et al., 2018; Winkler et al., 2017), the ratio of the Peclet number and the persistence length of active polymer melts. Thus, the ample variety of non-equilibrium phases in such materials (Isele-Holder et al., 2015, 2016; Prathyusha et al., 2018; Abbaspour et al., 2021) may well have contributed to the evolutionary success of filamentous cyanobacteria.”

      Reviewer 3:

      Summary:

      This paper presents novel and innovative force measurements of the biophysics of gliding cyanobacteria filaments. These measurements allow for estimates of the resistive force between the cell and substrate and provide potential insight into the motility mechanism of these cells, which remains unknown.

      We thank the reviewer for the positive evaluation of our work. We have revised the manuscript according to their comments and detail our replies and modifications next to the individual points below.

      Strengths:

      The authors used well-designed microfabricated devices to measure the bending modulus of these cells and to determine the critical length at which the cells buckle. I especially appreciated the way the authors constructed an array of pillars and used it to do 3-point bending measurements and the arrangement the authors used to direct cells into a V-shaped corner in order to examine at what length the cells buckled at. By examining the gliding speed of the cells before buckling events, the authors were able to determine how strongly the buckling length depends on the gliding speed, which could be an indicator of how the force exerted by the cells depends on cell length; however, the authors did not comment on this directly.

      We thank the referee for the positive assessment of our work. Importantly, we do not see a significant correlation between buckling length and gliding speeds, and we also do not see a correlation with filament length, consistent with the assumption of a propulsion force density that is more or less homogeneously distributed along the filament. Note that each filament consists of many metabolically independent cells, which renders cyanobacterial gliding a collective effort of many cells, in contrast to gliding of, e.g., myxobacteria.

      In response also to the other referees’ comments, we modified the manuscript to reflect more on the absence of a strong correlation between velocity and force/critical length. We modified the Buckling measurements section on page 5 of the paper:

      “The substrate contact requires lubrication from polysaccharide slime to enable bacteria to glide (Khayatan et al., 2015). Thus we assume an over-damped motion with co-linear friction, for which the propulsion force f and the free gliding velocity v0 of a filament are related by f = η v0, with a friction coefficient η. In this scenario, f can be inferred both from the observed Lc ∼ (f/B)^(−1/3) and, up to the proportionality coefficient η, from the observed free gliding velocity. Thus, by combining the two relations, one may expect also a strong correlation between Lc and v0. In order to test this relation for consistency with our data, we include v0 as a second regressor, by setting x = (L − Lc(v0))/∆Lc in Equation 1, with Lc(v0) = (η v0/(30.5722 B))^(−1/3), to reflect our expectation from theory (see below). Now, η rather than f is the only unknown, and its ensemble distribution will be determined in the regression. Figure 3 E, F show the buckling behavior…”

      Further, we edited the last paragraph of the Buckling measurements section on page 5 of the manuscript:

      “Within the characteristic range of observed velocities (1 − 3 µm/s), the median Lc depends only mildly on v0, as compared to its rather broad distribution, indicated by the bands in Figure 3 G. Thus a possible correlation between f and v0 would only mildly alter Lc. The natural length distribution (cf. Appendix 1—figure 1 ), however, is very broad, and we conclude that growth rather than velocity or force distributions most strongly impacts the buckling propensity of cyanobacterial colonies. Also, we hardly observed short and fast filaments of K. animale, which might be caused by physiological limitations (Burkholder, 1934 ).”

      We also rephrased the corresponding discussion paragraph on page 7:

      “…Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015 ). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019 ), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces…”

      Weaknesses:

      There were two minor weaknesses in the paper.

      First, the authors investigate the buckling of these gliding cells using an Euler beam model. A similar mathematical analysis was used to estimate the bending modulus and gliding force for Myxobacteria (C.W. Wolgemuth, Biophys. J. 89: 945-950 (2005)). A similar mathematical model was also examined in G. De Canio, E. Lauga, and R.E Goldstein, J. Roy. Soc. Interface, 14: 20170491 (2017). The authors should have cited these previous works and pointed out any differences between what they did and what was done before.

      We thank the reviewer for pointing us to these references. The paper by Wolgemuth is theoretical work, describing A-motility in myxobacteria by a concentrated propulsion force at the rear end of the bacterium, possibly stemming from slime extrusion. This model was refuted a little later by [A3], who demonstrated that focal adhesion along the bacterial body, and thus a distributed force, powers A-motility, a mechanism that has by now been investigated in great detail (see [A10]). The paper by De Canio et al. contains a thorough theoretical analysis of a filament that is clamped at one end and subject to a concentrated tangential load on the other. Since both models comprise a concentrated end-load rather than a distributed propulsion force density, they describe a substantially different motility mechanism, leading also to substantially different buckling profiles. Consequently, these models cannot be applied to cyanobacterial gliding.

      We included both citations in the revision and pointed out the differences to our work in the introduction (page 2):

      “…A few species appear to employ a type-IV-pilus related mechanism (Khayatan et al., 2015; Wilde and Mullineaux, 2015), similar to the better-studied myxobacteria (Godwin et al., 1989; Mignot et al., 2007; Nan et al., 2014; Copenhagen et al., 2021; Godwin et al., 1989), which are short, rod-shaped single cells that exhibit two types of motility: S (social) motility based on pilus extension and retraction, and A (adventurous) motility based on focal adhesion (Chen and Nan, 2022) for which also slime extrusion at the trailing cell pole was earlier postulated as mechanism (Wolgemuth et al., 2005). Yet, most gliding filamentous cyanobacteria do not exhibit pili and their gliding mechanism appears to be distinct from myxobacteria (Khayatan et al., 2015).”

      And in Buckling theory, page 5:

      “…The buckling of gliding filaments differs in two aspects: the propulsion forces are oriented tangentially instead of vertically, and the front end is supported instead of clamped. Therefore, with L < Lc all initial orientations are indifferently stable, while for L > Lc, buckling induces curvature and a resultant torque on the head, leading to rotation (Fily et al., 2020; Chelakkot et al., 2014; Sekimoto et al., 1995). Buckling under concentrated tangential end-loads has also been investigated in literature (de Canio et al., 2017; Wolgemuth et al., 2005), but leads to substantially different shapes of buckled filaments.”

      The second weakness is that the authors claim that their results favor a focal adhesion-based mechanism for cyanobacterial gliding motility. This is based on their result that friction and adhesion forces correlate strongly. They then conjecture that this is due to more intimate contact with the surface, with more contacts producing more force and pulling the filaments closer to the substrate, which produces more friction. They then claim that a slime-extrusion mechanism would necessarily involve more force and lower friction. Is it necessarily true that this latter statement is correct? (I admit that it could be, but is it a requirement?)

      We thank the referee for raising this interesting question. Our claim regarding slime extrusion is based on three facts: i. Mutants deficient in slime extrusion do not glide, but start gliding as soon as slime is provided externally [A4]. ii. A positive correlation between speed and slime layer thickness was observed in Nostoc [A11]. iii. The fluid mechanics of lubricated sliding contacts is very well understood and predicts a decreasing resistance with increasing layer thickness.

      We included these considerations in the revision of our manuscript (page 8):

      “…it indicates that friction and propulsion forces, despite being quite variable, correlate strongly. Thus, generating more force comes, inevitably, at the expense of added friction. For lubricated contacts, the friction coefficient is proportional to the thickness of the lubricating layer (Snoeijer et al., 2013 ), and we conjecture active force and drag both increase due to a more intimate contact with the substrate. This supports mechanisms like focal adhesion (Mignot et al., 2007 ) or a modified type-IV pilus (Khayatan et al., 2015 ), which generate forces through contact with extracellular surfaces, as the underlying mechanism of the gliding apparatus of filamentous cyanobacteria: more contacts generate more force, but also closer contact with the substrate, thereby increasing friction to the same extent. Force generation by slime extrusion (Hoiczyk and Baumeister, 1998 ), in contrast, would lead to the opposite behavior: More slime generates more propulsion, but also reduces friction. Besides fundamental fluid-mechanical considerations (Snoeijer et al., 2013 ), this is rationalized by two experimental observations: i. gliding velocity correlates positively with slime layer thickness (Dhahri et al., 2013 ) and ii. motility in slime-secretion deficient mutants is restored upon exogenous addition of polysaccharide slime. Still we emphasize that many other possibilities exist. One could, for instance, postulate a regulation of the generated forces to the experienced friction, to maintain some preferred or saturated velocity.”

      Related to this, the authors use a model with isotropic friction. They claim that this is justified because they are able to fit the cell shapes well with this assumption. How would assuming a non-isotropic drag coefficient affect the shapes? It may be that it does equally well, in which case, the quality of the fits would not be informative about whether or not the drag was isotropic or not.

      The referee raises another very interesting point. Given the typical variability and uncertainty in experimental measurements (cf. error Figure 4 A), a model with a slightly anisotropic friction could be fitted to the observed buckling profiles as well, without significant increase of the mismatch. Yet, strongly anisotropic friction would not be consistent with our observations.

      Importantly, however, we did not infer isotropic friction from the fit quality, but from a comparison between free gliding and early buckling (Figure 4 D). In early buckling, the dominant motion is in the transverse direction, while longitudinal motion is insignificant for geometric reasons. Thus, independent of the underlying model, mostly the transverse friction coefficient is inferred. In contrast, free gliding is a purely longitudinal motion, and thus only the friction coefficient for longitudinal motion can be inferred. These two friction coefficients are compared in Figure 4 D. Still, the scatter of that data would allow fitting a certain anisotropy within the error margins. What we can exclude based on our observations is the case of strongly anisotropic friction. If there is neither an ab-initio reason for anisotropy nor a measurement that indicates it, we prefer to stick with the simplest assumption. We carefully chose our wording in the Discussion as “mainly isotropic” rather than “isotropic” or “fully isotropic”.

      We added a small statement to the Discussion on page 7 & 8:

      “... Thus we plot f/v over η in Figure 4 D, finding nearly identical values over about two decades. Since f and η are not correlated with v0, this is due to a correlation between f and η. This relation is remarkable in two aspects: On the one hand, it indicates that friction is mainly isotropic. This suggests that friction is governed by an isotropic process like bond friction or lubrication from the slime layer in the contact with the substrate, the latter being consistent with the observation that mutations deficient of slime secretion do not glide but exogenous addition of slime restores motility (Khayatan et al., 2015 ). In contrast, hydrodynamic drag from the surrounding bulk fluid (Man and Kanso, 2019 ), or the internal friction of the gliding apparatus would be expected to generate strongly anisotropic friction. If the latter was dominant, a snapping-like transition into the buckling state would be expected, rather than the continuously growing amplitude that is observed in experiments. On the other hand, it indicates that friction and propulsion forces ...”

      Recommendations for the authors

      The discussion regarding how the findings of this paper imply that cyanobacteria filaments are propelled by adhesion forces rather than slime extrusion should be improved, as this conclusion seems questionable. There appears to be an inconsistency with a buckling force said to be only weakly dependent on the gliding velocity, while its ratio with the velocity correlates with a friction coefficient. Finally, data and source code should be made publicly available.

      In the revised version, we have modified the discussion of the force-generating mechanism according to the reviewers' suggestions. The perception of inconsistency in the velocity dependence of the buckling force was based on a misunderstanding, as we detailed in our reply to the referee. We revised the corresponding section to make it clearer. Data and source code have been uploaded to a public data repository.

      Reviewer #2 (recommendations for the authors)

      Despite eLife policy, the authors do not provide a Data Availability Statement. For the presented manuscript, data and source code should be provided “via trusted institutional or third-party repositories that adhere to policies that make data discoverable, accessible and usable.” https://elifesciences.org/inside-elife/51839f0a/for-authors-updates-to-elife-s-data-sharing-policies

      Most of the issues in this reviewer’s public review should be easy to correct, so I would strongly support the authors to provide an amended manuscript.

      We added the Data Availability Statement in the amended manuscript.

      References

      [A1] E. Hoiczyk and W. Baumeister. “The junctional pore complex, a prokaryotic secretion organelle, is the molecular motor underlying gliding motility in cyanobacteria”. In: Curr. Biol. 8.21 (1998), pp. 1161–1168. doi: 10.1016/s0960-9822(07)00487-3.

      [A2] N. Read, S. Connell, and D. G. Adams. “Nanoscale Visualization of a Fibrillar Array in the Cell Wall of Filamentous Cyanobacteria and Its Implications for Gliding Motility”. In: J. Bacteriol. 189.20 (2007), pp. 7361–7366. doi: 10.1128/jb.00706-07.

      [A3] T. Mignot, J. W. Shaevitz, P. L. Hartzell, and D. R. Zusman. “Evidence That Focal Adhesion Complexes Power Bacterial Gliding Motility”. In: Science 315.5813 (2007), pp. 853–856. doi: 10.1126/science.1137223.

      [A4] Behzad Khayatan, John C. Meeks, and Douglas D. Risser. “Evidence that a modified type IV pilus-like system powers gliding motility and polysaccharide secretion in filamentous cyanobacteria”. In: Mol. Microbiol. 98.6 (2015), pp. 1021–1036. doi: 10.1111/mmi.13205.

      [A5] Tilo Pompe, Martin Kaufmann, Maria Kasimir, Stephanie Johne, Stefan Glorius, Lars Renner, Manfred Bobeth, Wolfgang Pompe, and Carsten Werner. “Friction-controlled traction force in cell adhesion”. In: Biophysical journal 101.8 (2011), pp. 1863–1870.

      [A6] Hirofumi Wada, Daisuke Nakane, and Hsuan-Yi Chen. “Bidirectional bacterial gliding motility powered by the collective transport of cell surface proteins”. In: Physical Review Letters 111.24 (2013), p. 248102.

      [A7] Joël Tchoufag, Pushpita Ghosh, Connor B Pogue, Beiyan Nan, and Kranthi K Mandadapu. “Mechanisms for bacterial gliding motility on soft substrates”. In: Proceedings of the National Academy of Sciences 116.50 (2019), pp. 25087–25096.

      [A8] Chenyi Fei, Sheng Mao, Jing Yan, Ricard Alert, Howard A Stone, Bonnie L Bassler, Ned S Wingreen, and Andrej Kosmrlj. “Nonuniform growth and surface friction determine bacterial biofilm morphology on soft substrates”. In: Proceedings of the National Academy of Sciences 117.14 (2020), pp. 7622–7632.

      [A9] Arja Ray, Oscar Lee, Zaw Win, Rachel M Edwards, Patrick W Alford, Deok-Ho Kim, and Paolo P Provenzano. “Anisotropic forces from spatially constrained focal adhesions mediate contact guidance directed cell migration”. In: Nature communications 8.1 (2017), p. 14923.

      [A10] Jing Chen and Beiyan Nan. “Flagellar motor transformed: biophysical perspectives of the Myxococcus xanthus gliding mechanism”. In: Frontiers in Microbiology 13 (2022), p. 891694.

      [A11] Samia Dhahri, Michel Ramonda, and Christian Marliere. “In-situ determination of the mechanical properties of gliding or non-motile bacteria by atomic force microscopy under physiological conditions without immobilization”. In: PLoS One 8.4 (2013), e61663.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I will summarize my comments and suggestions below.

      (1) Abstract:

      "Non-catalytic (pseudo)kinase signaling mechanisms have been described in metazoans, but information is scarce for plants." To the best of my understanding EFR is an active protein kinase in vitro and in vivo and cannot be considered a pseudokinase. Consider rephrasing.

      We rephrased to: “Non-catalytic signaling mechanisms of protein kinase domains have been described in metazoans, but information is scarce for plants.”

      (2) Page 4: It should be noted, that while membrane associated Rap-RiD systems have been used in planta to activate receptor kinase intracellular domains by promoting interaction with a co-receptor kinase domain, this system does not resemble the actual activation mechanism in the plasma membrane. This would be worth discussing when introducing the system. For example, the first substrates of the RK signaling complex may also be membrane associated and not freely diffuse in solution, which may be important for enzyme-substrate interaction.

      We inserted on page 4: “The RiD system was previously applied in planta, maintaining membrane-association by N-terminal myristoylation (Kim et al., 2021). For the in vitro experiments, the myristoylation sites were excluded to facilitate the production of recombinant protein.”

      (3) Page 4 and Fig 1: The catalytic Asp in BRI1 is D1027 and not D1009 (https://pubmed.ncbi.nlm.nih.gov/21289069/). Please check and prepare the correct mutant protein if needed.

      We clarified this in the text by stating that we mutated the HRD-aspartate to asparagine in all our catalytic-dead mutants: “Kinase-dead variants with the catalytic residue (HRD-aspartate) replaced by asparagine (EFRD849N and BRI1D1009N), had distinct effects […]”. D1027 in BRI1 is the DFG-Asp, which was not mutated in our study.

      (4) Page 4 and Fig 1: Is BIK1 a known component of the BR signaling pathway and a direct BRI1 substrate? Or in other words how specific is the trans-phosphorylation assay? In my opinion, a more suitable substrate for BRI1/BAK1 would be BSK1 or BSK3 (for example https://pubmed.ncbi.nlm.nih.gov/30615605/).

      Kinase-dead BIK1 is a reported substrate of BRI1. We clarified this in the results section by inserting: “BIK1 was chosen as it is a reported substrate of both EFR/BAK1 and BRI1/BAK1 complexes (Lin et al., 2013).”

      (5) Fig. 1B Why is BIK1 D202N partially phosphorylated in the absence of Rap? I would suggest to add control lanes showing BRI1, EFR, FLS2, BAK1 and BIK1 in isolation. Given that a nice in vitro activation system with purified components is available, why not compare the different enzyme kinetics rather than band intensities at only 1 enzyme : substrate ratio?

      BIK1 D202N is partially phosphorylated because active BAK1 is present and is capable of trans-phosphorylating BIK1 D202N, as reported in a previous study (DOI: 10.1038/s41586-018-0471-x).

      (6) Page 4 and Fig 1: Is the kinase dead variant of EFR indeed kinase dead? I could still see a decent autorad signal for this mutant when expressed in E. coli (Fig 1 A in Bender et al., 2021; https://pubmed.ncbi.nlm.nih.gov/34531323/)? If this mutant is not completely inactive, could this change the interpretation of the experiments performed with the mutant protein in vitro and in planta in the current manuscript? In my opinion, it could be possible that a partially active EFR mutant can be further activated by BAK1, and in turn can phosphorylate BIK1 D202N. The differences in autorad signal for BRI1D1009?N and EFRD849N is very small, and the entire mechanism hinges on this difference.

      We would like to emphasize that the mechanism hinges on the difference between non-dimerized and dimerized kinase domains in the in vitro kinase assay. BRI1 D1009N fails to enhance BIK1 D202N trans-phosphorylation compared to the non-dimerized sample, while EFR D849N is still capable of enhancing BIK1 transphosphorylation upon dimerization as indicated by quantification of autorads (Figure 1B/C). We have also addressed this point in a section on the limitations of our study.

      (7) Fig 1B. "Our findings therefore support the hypothesis that EFR increases BIK1 phosphorylation by allosterically activating the BAK1 kinase domain." To the best of my understanding presence of wild-type EFR in the EFR-BAK1 signaling complex leads to much better phosphorylation of BIK1D202N when compared to the EFRD849N mutant. How does that support the allosteric mechanism? By assuming that the D849N mutant is in an inactive conformation and fully catalytically inactive (see above)? Again, I think the data could also be interpreted in such a way that the small difference in autorad signal for BIK1 between BRI1 inactive (but see above) and EFR inactive is due to EFR not being completely kinase dead (see above), rather than EFR being an allosteric regulator. To clarify this point I would suggest to (1) perform quantitative auto- and trans-(generic substrate) phosphorylation assays with wt and D849N EFR to derive enzyme kinetic parameters, (2) include the EFRD849 mutant in the HDX analysis and (3) generate transgenic lines for EFRD489N/F761H/Y836F // EFRD489N/F761H/SSAA and compare them to the existing lines in Fig. 3.

      Mutations can have pleiotropic effects, especially in proteins whose function requires conformational plasticity: a mutation that alters this plasticity may affect both the catalytic and the non-catalytic functions that depend on it. In such cases, it is difficult to fully untangle catalytic and non-catalytic functions. In the case of EFR D849N, the mutation may also impair the non-catalytic function by altering conformational plasticity, which would explain the difference observed between EFR and EFR D849N. As you rightly suggested, HDX would be a way to address this, but it would still not clarify whether catalytic activity contributes to activation. We instead attempted to produce analog-sensitive EFR variants for in vivo characterization of EFR-targeted catalytic inhibition. Unfortunately, we were unable to produce an analog-sensitive variant for which we could show ATP-analog binding. To address your concern, we inserted a section on the limitations of the study.

      (8) Fig. 2B,C, supplement 3 C,D. Has it been assessed if the different EFR versions were expressed to similar protein levels and still localized to the PM?

      Localization of the mutant receptors has not been explicitly evaluated by confocal microscopy. However, the selected mutation EFRF761H is shown to accumulate in stable Arabidopsis lines (Figure 3 – Supplement 1C) and BAK1 could be coIPed by all EFR variants upon elf18-treatment (Figure 3 B), indicating plasma membrane localization.

      (9) How the active-like conformation of EFR is in turn activating BAK1 is poorly characterized, but appears to be the main step in the activation of the receptor complex. Extending the HDX analyses to resting and Rap-activated receptor complexes could be a first step to address this question. I tried to come up with an experimental plan to test if indeed the kinase activity of BAK1 and not of EFR is essential for signal propagation, but this is a complex issue. You would need to be able to mimic an activated form of EFR (which you can), to make sure its inactive (possibly, see above) and likewise to engineer a catalytically inactive form of BAK1 in an active-like state (difficult). As such a decisive experiment is difficult to implement, I would suggest to discuss different possible interpretations of the existing data and alternative scenarios in the discussion section of the manuscript.

      We addressed your concern about whether BAK1 kinase activity is essential for signal propagation by pairing EFRF761H with BAK1D416N (Figure 4 – Supplement 2 C), a combination that fails to induce signaling. In this case, EFRF761H is in its activated conformation but cannot activate downstream signaling. We also addressed your concern with an in vitro kinase assay in which we paired EFR with BAK1D416N and used a range of concentrations of the substrate BIK1D202N. We observed that the catalytic activity of BAK1, but not of EFR, was essential for BIK1 phosphorylation. However, this experiment does not address whether activated EFR can efficiently propagate signaling in the absence of BAK1 catalytic activity. In the limitations of the study section, we now discuss the catalytic importance of EFR for signaling activation.

      Author response image 1.

      BIK1 trans-phosphorylation depends on BAK1 catalytic activity. Increasing concentrations of BIK1 D202N were used as substrate for Rap-induced dimers of EFR-BAK1, EFR D849N-BAK1, and EFR-BAK1 D416N respectively. BIK1 trans-phosphorylation depended on the catalytic activity of BAK1. Proteins were purified from E. coli λPP cells. Three experiments yielded similar results of which a representative is shown here.

      Reviewer #2:

      All of my suggestions are minor.

      Figure 1B, I think it would be more useful to readers to explain the amino acid in the D-N change, rather than just call it D-to-N? Also, please label the bands on the stained gel; the shift on FKBP-BRI1 and FKBP-EFR are noticeable on the Coomassie stain.

      We implemented your suggestions.

      Figure 1-Supplement 1. There is still a signal in pS612 BAK1 (it states 'also failed to induce BAK1 S612 phosphorylation' in the text, which is not quite correct). Also, could mention the gel shift seen in BAK1, which appears absent in Y836F.

      We corrected the text which now states: “To test whether the requirement for Y836 phosphorylation is similar, we immunoprecipitated EFR-GFP and EFRY836F-GFP from mock- or elf18-treated seedlings and probed co-immunoprecipitated BAK1 for S612 phosphorylation. EFRY836F also obstructed the induction of BAK1 S612 phosphorylation (Figure 1 – Supplement 1), indicating that EFRY836F and EFRSSAA impair receptor complex activation.” The gel shift of BAK1 you pointed out was not observed in replications and thus we prefer not to comment on it.

      Figure 2 and 3 are full of a, b, c,d's, which I don't understand. Sorry

      We used uppercase letters to indicate subpanels and lowercase letters to indicate the results of the statistical testing. In the figure caption, we have clarified that the lowercase letters refer to statistical comparisons.

      Figure 2 A. If each point on the x-axis is one amino acid, I think it would again be useful to name the amino acids that the gold or purple or blue colored lines extend through.

      Each point represents a peptide; peptides are sorted by the position of their starting amino acid, from N-terminus to C-terminus. We have now added HDX plots for individual peptides that correspond to the highlighted region in subpanel A.

      Figure Supplement 1 is very small for what it is trying to show, even on the printed page. If this residue were to be phosphorylated, what would happen to the H-bond?

      We suppose that VIa-Tyr phosphorylation would break the H-bond and cause displacement of the αC-β4 loop. Recent studies, published after our submission, highlight the importance of this loop for substrate coordination and ATP binding. Thus, phosphorylating VIa-Tyr and displacing this loop may render the kinase rather unproductive. We have expanded the discussion to include this point.

      Figure 2B: Tyr 836 is not present in any of the alignments in Figure 2A. This should be rectified, because the text talks about the similarity to Tyr 156 in PKA.

      We have adjusted the alignments such that they now contain the VIa-Tyr residues of EFR and PKA.

      Figure 4D. Is there any particular reason that these Blots are so hard to compare or FKBP and BAK1?

      We assume this refers to Figure 4 – Supplement 2 D. FKBP-EFR and FRB-BAK1 are both approximately the size of RuBisCO, the most abundant protein in plant protein samples, which overlaps with the FKBP- and FRB-tagged kinases. Thus, it is difficult to detect these proteins.

      Reviewer #3:

      (1) The paper reporting the allosteric activation mechanism of EGFR should be cited.

      Will be included.

      (2) The authors showed that "Rap addition increased BIK1 D202N phosphorylation when the BRI1 or EFR kinase domains were dimerized with BAK1, but no such effect was observed with FLS2". Please explain why FLS2 failed to enhance BIK1 transphosphorylation by Rap treatment?

      Even though BIK1 is a reported downstream signaling component of FLS2/BAK1, it might not be the most relevant one; related RLCKs, such as PBL1, might be better substrates for dimerized FLS2/BAK1. We haven’t tested this, however. Alternatively, the purified FLS2 kinase domain might be labile and unfold quickly even though it was kept on ice until the start of the assay, or the N-terminal FKBP-tag may disrupt function. As the reason for our observation is not clear, we have removed the FLS2 in vitro dimerization experiments from the manuscript.

      (3) Based solely on the data presented in Figure 1, it can be concluded that EFR's kinase activity is not required to facilitate BIK1 transphosphorylation. Therefore, the title of Figure 1, "EFR Allosterically Activates BAK1," may be inappropriate.

      We have changed the figure title to: “EFR facilitates BIK1 trans-phosphorylation by BAK1 non-catalytically.”

      (4) In Figure 1- Supplement 1, I could not find any bands in anti-GFP and anti-BAK1 pS612 of input. Please redo it.

      Indeed, we could not detect protein in the input samples of this experiment. BAK1 S612 phosphorylation is an activation mark and is not necessarily expected to be abundant enough for detection in input samples. EFR-GFP, however, is usually detected in input samples, as reported in Macho et al. 2014, the manuscript from which these lines originate. Why EFR-GFP is not detected in this set of experiments is unclear but, in our opinion, this does not detract from the conclusions drawn, since similar amounts of EFR-GFP are pulled down across all samples.

      (5) For Figure 2A, please mark the structure represented by each color directly in the figure.

      We have made the suggested change.

      (6) Please modify "EFRF761/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation" to "EFRF761H/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation".

      Thank you for spotting this. We changed it.

      (7) The HDX-MS analysis demonstrated that the EFR (Y836F) mutation inhibits the formation of the active-like conformation. Conversely, the EFR (F761H) mutation serves as a potent intragenic suppressor, significantly stabilizing the active-like conformation. Confirming through HDX-MS conformational testing that the EFR (Y836F F761H) double mutation does not hinder the formation of the active-like EFR kinase conformation would greatly strengthen the conclusions of the article.

      Response: We agree that this would be beneficial. We attempted the experiment but failed to produce enough protein for HDX-MS analysis. We now state this in an additional section of the paper (“Limitations of the study”).

    1. Author response:

      eLife assessment

      This study provides valuable evidence indicating that Syngap1 regulates the synaptic drive and membrane excitability of parvalbumin- and somatostatin-positive interneurons in the auditory cortex. Since haplo-insufficiency of Syngap1 has been linked to intellectual disabilities without a well-defined underlying cause, the central question of this study is timely. However, the support for the authors' conclusions is incomplete in general and some parts of the experimental evidence are inadequate. Specifically, the manuscript requires further work to properly evaluate the impact on synaptic currents, intrinsic excitability parameters, and morphological features.

      We are happy that the editors found that our study provides valuable evidence and that the central question is timely. We thank the reviewers for their detailed comments and suggestions. Below, we provide a point-by-point answer (in blue) to the specific comments and indicate the changes to the manuscript and the additional experiments we plan to perform to answer these comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study is designed to assess the role of Syngap1 in regulating the physiology of the MGE-derived PV+ and SST+ interneurons. Syngap1 is associated with some mental health disorders, and PV+ and SST+ cells are the focus of many previous and likely future reports from studies of interneuron biology, highlighting the translational and basic neuroscience relevance of the authors' work.

      Strengths of the study are using well-established electrophysiology methods and the highly controlled conditions of ex vivo brain slice experiments combined with a novel intersectional mouse line, to assess the role of Syngap1 in regulating PV+ and SST+ cell properties. The findings revealed that in the mature auditory cortex, Syngap1 haploinsufficiency decreases both the intrinsic excitability and the excitatory synaptic drive onto PV+ neurons from Layer 4. In contrast, SST+ interneurons were mostly unaffected by Syngap1 haploinsufficiency. Pharmacologically manipulating the activity of voltage-gated potassium channels of the Kv1 family suggested that these channels contributed to the decreased PV+ neuron excitability by Syngap insufficiency. These results therefore suggest that normal Syngap1 expression levels are necessary to produce normal PV+ cell intrinsic properties and excitatory synaptic drive, albeit, perhaps surprisingly, inhibitory synaptic transmission was not affected by Syngap1 haploinsufficiency.

      Since the electrophysiology experiments were performed in the adult auditory cortex, while Syngap1 expression was potentially affected since embryonic stages in the MGE, future studies should address two important points that were not tackled in the present study. First, what is the developmental time window in which Syngap1 insufficiency disrupted PV+ neuron properties? Albeit the embryonic Syngap1 deletion most likely affected PV+ neuron maturation, the properties of Syngap-insufficient PV+ neurons do not resemble those of immature PV+ neurons. Second, whereas the observation that Syngap1 haploinsufficiency affected PV+ neurons in auditory cortex layer 4 suggests auditory processing alterations, MGE-derived PV+ neurons populate every cortical area. Therefore, without information on whether Syngap1 expression levels are cortical area-specific, the data in this study would predict that by regulating PV+ neuron electrophysiology, Syngap1 normally controls circuit function in a wide range of cortical areas, and therefore a range of sensory, motor and cognitive functions. These are relatively minor weaknesses regarding interpretation of the data in the present study that the authors could discuss.

      We agree with the reviewer on the proposed open questions, which we will certainly discuss in the revised manuscript we are preparing. We do have experimental evidence suggesting that Syngap1 mRNA is expressed by PV+ and SST+ neurons in different cortical areas, during early postnatal development and in adulthood; therefore, we agree that it will be important, in future experiments, to tackle the question of when the observed phenotypes arise.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haploinsufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel

      Weaknesses:

      Despite the interesting and novel questions, there are significant concerns regarding the experimental design and data quality, as well as potential misinterpretations of key findings. Consequently, the current manuscript fails to contribute substantially to our understanding of SynGap1 loss mechanisms and may even provoke unnecessary controversies.

      Major issues:

      (1) One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar.

      We understand the reviewer’s perspective; indeed, we asked ourselves the very same question regarding why the sEPSC and mEPSC frequency fall within a similar range when we analysed neuron means (bar graphs). We have already recorded sEPSCs followed by mEPSCs from several PV neurons (control and cHet) and are in the process of analyzing the data. We will add this data to the revised version of the manuscript. We will also rephrase the manuscript to present multiple potential interpretations of the data.

      We hope that we have correctly interpreted the reviewer's concern. If the question is why sEPSC amplitude but not frequency is affected in cHet vs control mice, then the reviewer’s comment is perhaps based on the assumption that the amplitude and frequency of miniature events should be lower for all events compared to those observed for spontaneous events. However, it is essential to note that changes in the mean amplitude of sEPSCs are primarily driven by alterations in large sEPSCs (>9-10 pA, as shown in the cumulative probability plot in Fig. 1b, right), with smaller ones being relatively unaffected. Consequently, a reduction in sEPSC amplitude may not necessarily result in a significant decrease in frequency, since the reduced events likely remain above the detection threshold of 3 pA. This could explain the lack of a significant change in the average inter-event interval of sEPSCs (as depicted in Fig. 1b, left).
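      The detection-threshold argument above can be illustrated with a toy calculation. This is only a sketch: the amplitudes, the 0.7 scaling factor, and the function names are hypothetical and are not taken from the recordings; only the 3 pA threshold and the 9 pA cutoff for "large" events come from the response.

```python
# Toy illustration with made-up amplitudes (pA); not the recorded data.
DETECTION_THRESHOLD_PA = 3.0   # detection threshold quoted in the response
LARGE_EVENT_PA = 9.0           # cutoff for "large" events (>9-10 pA in the response)


def summarize(amplitudes, threshold=DETECTION_THRESHOLD_PA):
    """Return (number of detected events, mean amplitude of detected events)."""
    detected = [a for a in amplitudes if a >= threshold]
    return len(detected), sum(detected) / len(detected)


def shrink_large_events(amplitudes, factor=0.7, cutoff=LARGE_EVENT_PA):
    """Scale only events above `cutoff`; `factor` is a hypothetical reduction."""
    return [a * factor if a > cutoff else a for a in amplitudes]


# Hypothetical events detected during the same recording duration
control = [4.0, 5.0, 6.0, 8.0, 12.0, 15.0, 20.0, 30.0]
chet_like = shrink_large_events(control)

n_ctrl, mean_ctrl = summarize(control)
n_chet, mean_chet = summarize(chet_like)

# Same event count (hence same frequency), but a lower mean amplitude
print("events:", n_ctrl, n_chet)
print("mean amplitude (pA):", mean_ctrl, mean_chet)
```

      In this toy case every reduced event stays above threshold, so the event count is unchanged while the mean amplitude drops. Whether the real events behave this way depends on the measured amplitude distribution.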

      If the question is whether we should see the same parameters affected by the genetic manipulation in both sEPSC and mEPSC, then another critical consideration is the involvement of the releasable pool in mEPSCs versus sEPSCs. Current knowledge suggests that activity-dependent and -independent release may not necessarily engage the same pool of vesicles or target the same postsynaptic sites. This concept has been extensively explored (reviewed in Kavalali, 2015). Consequently, while we may have traditionally interpreted activity-dependent and -independent data assuming they utilize the same pool, this is no longer accurate. The current discussion in the field revolves around understanding the mechanisms underlying such phenomena. Therefore, comparisons between sEPSCs and mEPSCs may not yield conclusive data but rather speculative interpretations. For a rigorous analysis, particularly in this context involving thousands of events, it is essential to assess these data sets (mEPSCs vs sEPSCs) separately and provide cumulative probability curves. This approach allows for a more comprehensive understanding of the underlying distributions and helps to elucidate any potential differences between the two types of events. We will rephrase the text, and as mentioned above, add additional data, to better reflect these considerations.

      (2) Another significant concern is the quality of synapse counting experiments. The authors attempted to colocalize pre- and postsynaptic markers Vglut1 and PSD95 with PV labelling. However, several issues arise. Firstly, the PV labelling seems confined to soma regions, with no visible dendrites. Given that the perisomatic region only receives a minor fraction of excitatory synapses, this labeling might not accurately represent the input coverage of PV cells. Secondly, the resolution of the images is insufficient to support clear colocalization of the synaptic markers. Thirdly, the staining patterns are peculiar, with PSD95 puncta appearing within regions clearly identified as somas by Vglut1, hinting at possible intracellular signals. Furthermore, PSD95 seems to delineate potential apical dendrites of pyramidal cells passing through the region, yet Vglut1+ partners are absent in these segments, which are expected to be the marker of these synapses here. Additionally, the cumulative density of Vglut2 and Vglut1 puncta exceeds expectations, and it's surprising that subcortical fibers labeled by Vglut2 are comparable in number to intracortical Vglut1+ axon terminals. Ideally, N(Vglut1)+N(Vglut2) should be equal or less than N(PSD95), but this is not the case here. Consequently, these results cannot be considered reliable due to these issues.

      We apologize, as it appears that the images we provided have caused confusion. The selected images represent a single focal plane of a confocal stack, which was visually centered on the PV cell somata. We chose just one confocal plane because we thought it showed more clearly the apposition of presynaptic and postsynaptic immunolabeling around the somata. In the revised version of the manuscript, we will provide higher magnification images, which will clearly show how we identified and selected the region of interest for the quantification of colocalized synaptic markers. In our confocal stacks, we can also identify PV immunolabeled dendrites and colocalized vGlut1/PSD95 or vGlut2/PSD95 puncta on them; but these do not appear in the selected images because, as explained, only one focal plane, centered on the PV cell somata, was shown.

      We acknowledge the reviewer's point that in PV+ cells the majority of excitatory inputs are formed onto dendrites; however, we focused on the somatic excitatory inputs to PV cells because, despite their lower number, they produce much stronger depolarization in PV neurons than dendritic excitatory inputs (Hu et al., 2010; Norenberg et al., 2010). Further, quantification of perisomatic putative excitatory synapses is more reliable: using PV immunostaining, we can visualize the soma and larger primary dendrites, but smaller, higher-order dendrites are not always detectable. Of note, PV-positive somata receive more excitatory synapses than SST-positive and pyramidal neuron somata, as found by electron microscopy studies in the visual cortex (Hwang et al., 2021; Elabbady et al., 2024).

      Regarding the comment on the density of vGlut1 and vGlut2 puncta, the numbers appear high and similar between the two markers because we present normalized data (cHet normalized to their control values for each set of immunolabelling) to clearly represent the differences between genotypes. This information is present in the legends, but we apologize for not clearly explaining it in the Methods section. We will provide a more detailed explanation of our methods in the revised manuscript.
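      To make the effect of this normalization concrete, the following sketch uses invented puncta counts; the numbers and the function name `normalize_to_control` are assumptions for illustration only. It shows that two markers with very different raw densities both end up centered on 1.0 in controls once each is divided by its own control mean.

```python
def normalize_to_control(control_values, test_values):
    """Divide every value by the mean of the control group from the same experiment."""
    control_mean = sum(control_values) / len(control_values)
    norm_control = [v / control_mean for v in control_values]
    norm_test = [v / control_mean for v in test_values]
    return norm_control, norm_test


def mean(values):
    return sum(values) / len(values)


# Hypothetical raw puncta counts per soma (not measured values)
vglut1_ctrl, vglut1_chet = [40.0, 50.0, 60.0], [30.0, 35.0, 40.0]
vglut2_ctrl, vglut2_chet = [8.0, 10.0, 12.0], [6.0, 7.0, 8.0]

v1_ctrl_n, v1_chet_n = normalize_to_control(vglut1_ctrl, vglut1_chet)
v2_ctrl_n, v2_chet_n = normalize_to_control(vglut2_ctrl, vglut2_chet)

# Raw means differ fivefold between markers; normalized control means are both 1.0
print("raw control means:", mean(vglut1_ctrl), mean(vglut2_ctrl))
print("normalized control means:", mean(v1_ctrl_n), mean(v2_ctrl_n))
print("normalized cHet means:", mean(v1_chet_n), mean(v2_chet_n))
```

      After normalization the plotted values carry no information about absolute density, so they cannot be used to compare the number of vGlut1 and vGlut2 puncta with each other.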

      Briefly, immunostained sections were imaged using a Leica SP8-STED confocal microscope with a 63x objective (NA 1.4) at 1024 × 1024, z-step = 0.3 μm, stack size of ~15 μm. Images were acquired from the auditory cortex from at least 3 coronal sections per animal. All confocal parameters were kept constant throughout the acquisition of an experiment. All images shown in the figures are from a single confocal plane. To quantify the number of vGlut1/PSD95 or vGlut2/PSD95 putative synapses, images were exported as TIFF files and analyzed using Fiji (ImageJ) software. We first manually outlined the profile of each PV cell soma (identified by PV immunolabeling). At least 4 innervated somata were selected in each confocal stack. We then used a series of custom-made macros in Fiji as previously described (Chehrazi et al, 2023). After background subtraction (rolling value = 10) and Gaussian blur (σ value = 2) filtering, the stacks were binarized and vGlut1/PSD95 or vGlut2/PSD95 puncta were independently identified around the perimeter of a targeted soma in the focal plane with the highest soma circumference. Puncta were quantified after filtering particles for size (between 0 and 2 μm²) and circularity (between 0 and 1). Data quantification was done by investigators blind to the genotype, and data are presented normalized to control values for each experiment.
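      The final filtering step described above can be sketched as follows. This is not the authors' Fiji macro: it is a standalone illustration that assumes each particle has already been segmented and measured, and it uses the standard circularity definition 4π × area / perimeter². The function names and example particles are hypothetical.

```python
import math


def circularity(area, perimeter):
    """4*pi*area / perimeter**2, capped at 1.0 (1.0 corresponds to a perfect circle)."""
    return min(4.0 * math.pi * area / perimeter ** 2, 1.0)


def keep_particle(area_um2, perimeter_um, max_area_um2=2.0,
                  min_circularity=0.0, max_circularity=1.0):
    """Apply the size (0-2 um^2) and circularity (0-1) limits quoted in the methods."""
    if not 0.0 < area_um2 <= max_area_um2:
        return False
    return min_circularity <= circularity(area_um2, perimeter_um) <= max_circularity


# Hypothetical measured particles: (area in um^2, perimeter in um)
particles = [
    (math.pi * 0.5 ** 2, 2.0 * math.pi * 0.5),  # small round punctum, radius 0.5 um
    (0.4, 4.0),                                 # small elongated particle
    (5.0, 9.0),                                 # large blob, exceeds the size limit
]

kept = [p for p in particles if keep_particle(*p)]
print(len(kept), "of", len(particles), "particles kept")
```

      With a circularity range of 0 to 1, the circularity criterion excludes nothing on its own, so in this sketch the size limit does all of the filtering.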

      (3) One observation from the minimal stimulation experiment was concluded by an unsupported statement. Namely, the change in the onset delay cannot be attributed to a deficit in the recruitment of PV+ cells, but it may suggest a change in the excitability of TC axons.

      We agree with the reviewer; please see our answer to the next point.

      (4) The conclusions drawn from the stimulation experiments are also disconnected from the actual data. To make conclusions about TC release, the authors should have tested release probability using established methods, such as paired-pulse changes. Instead, the only observation here is a change in the AMPA components, which remained unexplained.

      We agree with the reviewer and we will perform additional paired-pulse ratio experiments at different intervals. We will rephrase the discussion and our interpretation and potential hypothesis according to the data obtained from this new experiment.

      (5) The sampling rate of CC recordings is insufficient to resolve the temporal properties of the APs. Therefore, the phase-plots cannot be interpreted (e.g. axonal and somatic AP components are not clearly separated), raising questions about how AP threshold and peak were measured. The low sampling rate also masks the real derivative of the AP signals, making them apparently faster.

      We acknowledge that a higher sampling rate could offer a more detailed analysis of the action potential waveform. However, in the context of action potential analysis, it is acceptable to use sampling rates ranging from 10 kHz to 20 kHz (Golomb et al., 2007; Stevens et al., 2021; Zhang et al., 2023), which are considered adequate in the context of the present study. Indeed, our study aims to evaluate "relative" differences in the electrophysiological phenotype when comparing groups following a specific genetic manipulation. A sampling rate of 10 kHz is commonly employed in similar studies, including those conducted by our collaborator and co-author S. Kourrich (e.g., Kourrich and Thomas 2009, Kourrich et al., 2013), as well as others (Russo et al., 2013; Ünal et al., 2020; Chamberland et al., 2023).

      Despite being acquired at a lower sampling rate than potentially preferred by the reviewer, our data clearly demonstrate significant differences between the experimental groups, especially for parameters that are negligibly or not affected by the sampling rate used here (e.g., #spikes/input, RMP, Rin, Cm, Tm, AP amplitude, AP latency, AP rheobase).

      Regarding the phase-plots, we agree that a higher sampling rate would have resulted in smoother curves and more accurate absolute values. However, the differences were sufficiently pronounced to discern the relative variations in action potential waveforms between the experimental groups.
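      As a rough illustration of how the sampling rate influences the absolute values in a phase plot, the sketch below samples a synthetic spike at two rates and compares the peak finite-difference dV/dt with the analytic maximum. The Gaussian waveform, its 0.2 ms width, and the 10 and 50 kHz rates are assumptions chosen for illustration; they are not the recorded action potentials, and the sketch does not test whether relative differences between groups are preserved.

```python
import math

AMPLITUDE_MV = 100.0   # hypothetical spike height
SIGMA_MS = 0.2         # hypothetical spike width parameter
CENTER_MS = 2.0        # time of the spike peak


def synthetic_spike(t_ms):
    """Gaussian-shaped voltage transient (mV), used here as a stand-in for an AP."""
    return AMPLITUDE_MV * math.exp(-((t_ms - CENTER_MS) ** 2) / (2.0 * SIGMA_MS ** 2))


def peak_dvdt(sampling_khz, duration_ms=4.0):
    """Largest forward-difference slope (mV/ms) of the spike sampled at `sampling_khz`."""
    n_intervals = int(round(duration_ms * sampling_khz))
    samples = [synthetic_spike(i / sampling_khz) for i in range(n_intervals + 1)]
    dt_ms = 1.0 / sampling_khz
    return max((b - a) / dt_ms for a, b in zip(samples, samples[1:]))


slow = peak_dvdt(10.0)   # 10 kHz
fast = peak_dvdt(50.0)   # 50 kHz
# Analytic maximum slope of a Gaussian: A / sigma * exp(-1/2)
analytic = AMPLITUDE_MV / SIGMA_MS * math.exp(-0.5)

print("peak dV/dt at 10 kHz (mV/ms):", slow)
print("peak dV/dt at 50 kHz (mV/ms):", fast)
print("analytic maximum (mV/ms):", analytic)
```

      For this smooth synthetic waveform, the coarser sampling yields the lower peak dV/dt estimate and both estimates fall below the analytic maximum. Real action potentials have sharper onsets, so the size of the effect on recorded data may differ.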

      A related issue is that the Methods section lacks essential details about the recording conditions, such as bridge balance and capacitance neutralization.

      We indeed performed bridge balance and neutralized the capacitance before starting every recording. We will add the information in the methods.

      (6) Interpretation issue: One of the most fundamental measures of cellular excitability, the rheobase, was differentially affected by cHet in BCshort and BCbroad. Yet, the authors concluded that the cHet-induced changes in the two subpopulations are common.

      We are uncertain if we have correctly interpreted the reviewer's comment. While we observed distinct impacts on the rheobase (Fig. 7d and 7i), there seems to be a common effect on the AP threshold (Fig. 7c and 7h), as interpreted and indicated in the final sentence of the results section for Figure 7 (page 12). If our response does not address the reviewer's comment adequately, we would greatly appreciate it if the reviewer could rephrase their feedback.

      (7) Design issue:

      The Kv1 blockade experiments are disconnected from the main manuscript. There is no experiment that shows the causal relationship between changes in DTX and cHet cells. It is only an interesting observation on AP halfwidth and threshold. However, how they affect rheobase, EPSCs, and other topics of the manuscript are not addressed in DTX experiments.

      Furthermore, Kv1 currents were never measured in this work, nor was the channel density tested. Thus, the DTX effects are not necessarily related to changes in PV cells, which can potentially generate controversies.

      While we acknowledge the reviewer's point that Kv1 currents and density weren't specifically tested, an important insight provided by Fig. 5 is the prolonged action potential latency. This delay is significantly influenced by slowly inactivating subthreshold potassium currents, namely the D-type K+ current. It's worth noting that D-type current is primarily mediated by members of the Kv1 family. The literature supports a role for Kv1.1-containing channels in modulating responses to near-threshold stimuli in PV cells (Wang et al., 1994; Goldberg et al., 2008; Zurita et al., 2018). However, we recognize that besides the Kv1 family, other families may also contribute to the observed changes.

      To address this concern, we will revise our interpretation. We will opt for the more accurate term "D-type K+ current" and only speculate about the channel family involved in the discussion. Our intention is to present the data we obtained, not to open a controversy. We believe that rephrasing the discussion as proposed will instead foster fruitful discussions.

      (8) Writing issues:

      Abstract:

      The auditory system is not mentioned in the abstract.

      One statement in the abstract is unclear. What is meant by "targeting Kv1 family of voltage-gated potassium channels was sufficient..."? "Targeting" could refer to altered subcellular targeting of the channels, simple overexpression/deletion in the target cell population, or targeted mutation of the channel, etc. Only the final part of the Results revealed that it was none of the above; rather, these channels were selectively blocked.

      We agree with the reviewer and we will rephrase the abstract accordingly.

      Introduction:

      There is a contradiction in the introduction. The second paragraph describes in detail the distinct contribution of PV and SST neurons to auditory processing. But at the end, the authors state that "relatively few reports on PV+ and SST+ cell-intrinsic and synaptic properties in adult auditory cortex". Please be more specific about the unknown properties.

      We agree with the reviewer and we will rephrase more specifically.

      (9) The introduction emphasizes the heterogeneity of PV neurons, which certainly influences the interpretation of the results of the current manuscript. However, the initial experiments did not consider this and handled all PV cell data as a pooled population.

      In the initial experiments, we pooled all PV cell data because we wanted to be rigorous and avoid introducing assumptions or biases about the different PV cell subtypes, which we distinguished in later experiments based on intrinsic properties alone. We will make this point clear in the revised manuscript.

      (10) The interpretation of the results strongly depends on unpublished work, which potentially provides the physiological and behavioral context for the role of GABAergic neurons in SynGap-haploinsufficiency. The authors cite their own unpublished work without explaining the specific findings and their relation to this manuscript.

      We agree with the reviewer and apologize for the lack of clarity. Our unpublished work is in revision right now. We will provide more information and update references in the revised version of this manuscript.

      (11) The introduction of the Sholl analysis experiments mentions SOM staining; however, there are no such data about this cell type in the manuscript.

      We apologize for the error; we will replace SOM with SST (SOM and SST are two commonly used acronyms for somatostatin-expressing interneurons).

      Reviewer #3 (Public Review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences at both levels, although predominantly in PV+ cells. These results suggest that altered PV-interneuron functions in the auditory cortex may contribute to the network dysfunction observed in Syngap1 haploinsufficiency-related intellectual disability. The subject of the work is interesting, and most of the approach is direct and quantitative, which are major strengths. There are also some weaknesses that reduce its impact for a broader field.

      (1) The choice of mice with conditional (rather than global) haploinsufficiency makes the link between the findings and Syngap1 relatively easy to interpret, which is a strength. However, it also remains unclear whether an entire network with the same mutation at a global level (affecting also excitatory neurons) would react similarly.

      The reviewer raises an interesting and pertinent open question which we will address in the discussion of the revised paper.

      (2) There are some (apparent?) inconsistencies between the text and the figures. Although the authors appear to have used a sophisticated statistical analysis, some datasets in the illustrations do not seem to match the statistical results. For example, neither Fig 1g nor Fig 3f (eNMDA) reach significance despite large differences.

      We respectfully disagree; we do not think the text and figures are inconsistent. In the cited examples, the large apparent differences in mean values do not reach significance because of the large variability in the data; further, we did not exclude any data points, because we wanted to be rigorous. In particular, for Fig. 1g, statistical analysis shows a significant increase in the inter-mEPSC interval (*p=0.027, LMM) when all events are considered (cumulative probability plots), while there is no significant difference in the inter-mEPSC interval for the inter-cell mean comparison (inset, p=0.354, LMM). The inter-cell mean comparison does not show a difference with the Mann-Whitney test either (p=0.101; the data are not normally distributed, hence the choice of the Mann-Whitney test). For Fig. 3f (eNMDA), the higher mean value for the cHet versus the control is driven by two data points that are particularly high, while the other data points overlap with the control values. The Mann-Whitney test also shows no statistical difference (p=0.174).

      In the manuscript, discussion of the data is based on the results of the LMM analysis, which takes into account both the number of cells and the number of mice from which these cells were recorded. We chose this statistical approach because it does not rely on the assumption that cells recorded from the same mouse are independent observations. In the supplemental tables, we provide, for each data set, the results of the statistical analysis done with both the LMM and the more commonly used Mann-Whitney test (for data that are not normally distributed) or t-test (for normally distributed data).

      Also, the legend to Fig 9 indicates the presence of "a significant decrease in AP half-width from cHet in absence or presence of a-DTX", but the bar graph does not seem to show that.

      We apologize for our lack of clarity. In the legend to Fig. 9, we reported the statistical comparisons between 1) cHet mice in the absence of a-DTX and control mice and 2) cHet mice in the presence of a-DTX and control mice. We will rephrase the description of the results and the figure legend to avoid confusion.

      (3) The authors mention that the lack of differences in synaptic current kinetics is evidence against a change in subunit composition. However, in some Figures, for example, 3a, the kinetics of the recorded currents appear dramatically different. It would be important to know and compare the values of the series resistance between control and mutant animals.

      We agree with the reviewer that there appears to be a qualitative difference in eNMDA decay between conditions, although quantified eNMDA decay itself is similar between groups. We used a cutoff of 15% for the series resistance (Rs), which is considerably more stringent than the cutoffs typically used in electrophysiology, the vast majority of which lie between 20 and 30%. To address this concern, we re-examined Rs and found no difference between groups for eAMPA (13.2±0.5 in WT n=16 cells, 7 mice vs 13.7±0.3 in cHet n=14 cells, 7 mice, p=0.432, LMM) or eNMDA (12.7±0.7 in WT n=6 cells, 3 mice vs 13.8±0.7 in cHet n=6 cells, 5 mice, p=0.231, LMM). Thus, the apparent qualitative difference in eNMDA decay stems from inter-cell variability rather than inter-group differences. Notably, the discrepancy between the trace (Fig. 3a) and the data (Fig. 3f, right) arises because the higher but non-significant decay rate in eNMDA is driven by a couple of very high values (Fig. 3f, right). In the revised manuscript, we will show traces that better represent our findings.

      (4) A significant unexplained variability is present in several datasets. For example, the AP threshold for PV+ includes points between -50 and -40 mV, but also values at around -20/-15 mV, which seem too depolarized to generate healthy APs (Fig 5c, Fig 7c).

      We acknowledge the variability in AP threshold data, with some APs appearing too depolarized to generate healthy spikes. However, we meticulously examined each AP that spiked at these depolarized thresholds and found that other intrinsic properties (such as Rin, Vrest, AP overshoot, etc.) all indicate that these cells are healthy. Therefore, to maintain objectivity and provide unbiased data to the community, we opted to include them in our analysis. It's worth noting that similar variability has been observed in other studies (Bengtsson Gonzales et al., 2020; Bertero et al., 2020).

      Further, we conducted a significance test on AP threshold excluding these potentially unhealthy cells and found that the significant differences persist. After removing two outliers from the cHet group with values of -16.5 and 20.6 mV, we obtain: -42.6±1.01 mV in control, n=33, 15 mice vs -36.2±1.1 mV in cHet, n=38 cells, 17 mice, ***p<0.001, LMM. Thus, whether these cells are included or excluded, our interpretations and conclusions remain unchanged.

      We would like to clarify that these data have not been corrected for the junction potential. We will add this information in the revised version.

      (5) I am unclear as to how the authors quantified colocalization between VGluts and PSD95 at the low magnification shown in Supplementary Figure 2.

      We apologize for our lack of clarity. Although the analysis was done at high resolution, the figures were focused on showing multiple PV somata receiving excitatory inputs. We will add higher magnification figures and more detailed information in the methods of the revised version. Please also see our response to reviewer #2.

      (6) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties", but this claim would seem to be directly refuted by the data of Fig 8f. In the absence of changes in either active or passive membrane properties, shouldn't the current/#AP plot remain unchanged?

      While we acknowledge the theoretical expectation that changes in intrinsic parameters should correlate with alterations in neuronal firing, the absence of differences in the parameters analyzed in this study should not overshadow the clear and significant decrease in firing rate observed in cHet SST+ cells. This decrease serves as a compelling indication of reduced intrinsic neuronal excitability. It's certainly possible that other intrinsic factors, not assessed in this study, may have contributed to this effect. However, exploring these mechanisms is beyond the scope of our current investigation. We will rephrase the discussion and add this limitation of our study in the revised version.

      (7) The plots used for the determination of AP threshold (Figs 5c, 7c, and 7h) suggest that the frequency of acquisition of current-clamp signals may not have been sufficient; this value is not included in the Methods section.

      This study utilized a sampling rate of 10 kHz, which is a standard rate for action potential analysis in the present context. We will describe more extensively the technical details in the method section of the revised manuscript we are preparing. While we acknowledge that a higher sampling rate could have enhanced the clarity of the phase plot, our recording conditions, as detailed in our response to Rev#2/comment#5, were suitable for the objectives of this study.
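
      To make the sampling-rate point concrete, the sketch below shows one common way to read an AP threshold from a digitized trace: the membrane potential at which dV/dt first reaches a fixed criterion. At 10 kHz, consecutive samples are 0.1 ms apart, which sets the resolution of the phase plot. The function name, the 20 mV/ms criterion, and the synthetic trace are assumptions chosen for illustration; they are not taken from the manuscript and may differ from the procedure actually used.

```python
def ap_threshold(voltage_mv, sampling_hz=10_000, dvdt_criterion=20.0):
    """Return (sample index, Vm in mV) where dV/dt first reaches the criterion.

    dvdt_criterion is in mV/ms (numerically equal to V/s); 20 is an assumed,
    illustrative value, not one taken from the manuscript.
    """
    dt_ms = 1000.0 / sampling_hz  # 0.1 ms between samples at 10 kHz
    for i in range(1, len(voltage_mv)):
        dvdt = (voltage_mv[i] - voltage_mv[i - 1]) / dt_ms
        if dvdt >= dvdt_criterion:
            # Report the voltage at the start of the first fast interval.
            return i - 1, voltage_mv[i - 1]
    return None  # no pair of samples reached the criterion


# Synthetic trace: slow ramp (0.5 mV/ms) followed by a fast upstroke (50 mV/ms).
ramp = [-60.0 + 0.05 * i for i in range(200)]
upstroke = [ramp[-1] + 5.0 * (k + 1) for k in range(10)]
print(ap_threshold(ramp + upstroke))
```

      With a finite-difference derivative like this one, the reported threshold can only fall on a sampled point, so a higher acquisition rate gives a finer-grained estimate on fast upstrokes.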

      Reference list

      Bengtsson Gonzales C, Hunt S, Munoz-Manchado AB, McBain CJ, Hjerling-Leffler J (2020) Intrinsic electrophysiological properties predict variability in morphology and connectivity among striatal Parvalbumin-expressing Pthlh-cells. Scientific Reports, 10, 15680. https://doi.org/10.1038/s41598-020-72588-1

      Bertero A, Zurita H, Normandin M, Apicella AJ (2020) Auditory long-range parvalbumin cortico-striatal neurons. Frontiers in Neural Circuits, 14, 45. http://doi.org/10.3389/fncir.2020.00045

      Chamberland S, Nebet ER, Valero M, Hanani M, Egger R, Larsen SB, Eyring KW, Buzsáki G, Tsien RW (2023) Brief synaptic inhibition persistently interrupts firing of fast-spiking interneurons. Neuron, 111, 1264–1281. http://doi.org/10.1016/j.neuron.2023.01.017

      Chehrazi P, Lee KKY, Lavertu-Jolin M, Abbasnejad Z, Carreño-Muñoz MI, Chattopadhyaya B, Di Cristo G (2023) The p75 Neurotrophin Receptor in Preadolescent Prefrontal Parvalbumin Interneurons Promotes Cognitive Flexibility in Adult Mice. Biol Psychiatry, 94, 310–321. https://doi.org/10.1016/j.biopsych.2023.04.019

      Elabbady L, Seshamani S, Mu S, Mahalingam G, Schneider-Mizell C, Bodor AL, Bae JA, Brittain D, Buchanan J, Bumbarger DJ, Castro MA, Dorkenwald S, Halageri A, Jia Z, Jordan C, Kapner D, Kemnitz N, Kinn S, Lee K, Li K…Collman F (2024) Perisomatic features enable efficient and dataset wide cell-type classifications across large-scale electron microscopy volumes. bioRxiv, https://doi.org/10.1101/2022.07.20.499976

      Goldberg EM, Clark BD, Zagha E, Nahmani M, Erisir A, Rudy B (2008) K+ Channels at the axon initial segment dampen near-threshold excitability of neocortical fast-spiking GABAergic interneurons. Neuron, 58, 387–400. https://doi.org/10.1016/j.neuron.2008.03.003

      Golomb D, Donner K, Shacham L, Shlosberg D, Amitai Y, Hansel D (2007) Mechanisms of firing patterns in fast-spiking cortical interneurons. PLoS Computational Biology, 3(8), e156. http://doi.org/10.1371/journal.pcbi.0030156

      Hu H, Martina M, Jonas P (2010). Dendritic mechanisms underlying rapid synaptic activation of fast-spiking hippocampal interneurons. Science, 327, 52–58. http://doi.org/10.1126/science.1177876

      Hwang YS, Maclachlan C, Blanc J, Dubois A, Petersen CH, Knott G, Lee SH (2021). 3D ultrastructure of synaptic inputs to distinct gabaergic neurons in the mouse primary visual cortex. Cerebral Cortex, 31, 2610–2624. http://doi.org/10.1093/cercor/bhaa378

      Kavalali E (2015) The mechanisms and functions of spontaneous neurotransmitter release. Nature Reviews Neuroscience, 16, 5–16. https://doi.org/10.1038/nrn3875

      Kourrich S, Thomas MJ (2009) Similar neurons, opposite adaptations: psychostimulant experience differentially alters firing properties in accumbens core versus shell. Journal of Neuroscience, 29, 12275–12283. http://doi.org/10.1523/JNEUROSCI.3028-09.2009

      Kourrich S, Hayashi T, Chuang JY, Tsai SY, Su TP, Bonci A (2013) Dynamic interaction between sigma-1 receptor and Kv1.2 shapes neuronal and behavioral responses to cocaine. Cell, 152, 236–247. http://doi.org/10.1016/j.cell.2012.12.004

      Norenberg A, Hu H, Vida I, Bartos M, Jonas P (2010) Distinct nonuniform cable properties optimize rapid and efficient activation of fast-spiking GABAergic interneurons. Proceedings of the National Academy of Sciences, 107, 894–9. http://doi.org/10.1073/pnas.0910716107

      Russo G, Nieus TR, Maggi S, Taverna S (2013) Dynamics of action potential firing in electrically connected striatal fast-spiking interneurons. Frontiers in Cellular Neuroscience, 7, 209. https://doi.org/10.3389/fncel.2013.00209

      Stevens SR, Longley CM, Ogawa Y, Teliska LH, Arumanayagam AS, Nair S, Oses-Prieto JA, Burlingame AL, Cykowski MD, Xue M, Rasband MN (2021) Ankyrin-R regulates fast-spiking interneuron excitability through perineuronal nets and Kv3.1b K+ channels. eLife, 10, e66491. http://doi.org/10.7554/eLife.66491

      Ünal CT, Ünal B, Bolton MM (2020) Low-threshold spiking interneurons perform feedback inhibition in the lateral amygdala. Brain Structure and Function, 225, 909–923. http://doi.org/10.1007/s00429-020-02051-4

      Wang H, Kunkel DD, Schwartzkroin PA, Tempel BL (1994) Localization of Kv1.1 and Kv1.2, two K channel proteins, to synaptic terminals, somata, and dendrites in the mouse brain. The Journal of Neuroscience, 14, 4588-4599. https://doi.org/10.1523/JNEUROSCI.14-08-04588.1994

      Zhang YZ, Sapantzi S, Lin A, Doelfel SR, Connors BW, Theyel BB (2023) Activity-dependent ectopic action potentials in regular-spiking neurons of the neocortex. Frontiers in Cellular Neuroscience, 17. https://doi.org/10.3389/fncel.2023.1267687

      Zurita H, Feyen PLC, Apicella AJ (2018) Layer 5 callosal parvalbumin-expressing neurons: a distinct functional group of GABAergic neurons. Frontiers in Cellular Neuroscience, 12, 53. https://doi.org/10.3389/fncel.2018.00053

    1. For example, the person in charge of the donations seemed to be overwhelmed and could not always answer questions we may have regarding delivery. She always sounded frustrated whenever she was asked questions because she indicated no one informed her of what and where things were going. I think if they provided her with more information, she would feel more comfortable answering questions as well as feeling more motivated to seek out the answers.

      insightful observation

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you very much for the careful and positive reviews of our manuscript. We have addressed each comment in the attached revised manuscript. We describe the modifications below. To avoid confusion, we've changed supplementary figure and table captions to start with "Supplement Figure" and "Supplementary Table," instead of "Figure" and "Table."

      We have modified/added:

      ● Supplementary Table S1: AUC scores for the top 10 frequent epitope types (pathogens) in the testing set of epitope split.

      ● Supplementary Table S5: AUCs of TCR-epitope binding affinity prediction models with BLOSUM62 to embed epitope sequences.

      ● Supplementary Table S6: AUCs of TCR-epitope binding affinity prediction models trained on catELMo TCR embeddings and random-initialized epitope embeddings.

      ● Supplementary Table S7: AUCs of TCR-epitope binding affinity prediction models trained on catELMo and BLOSUM62 embeddings.

      ● Supplementary Figure 4: TCR clustering performance for the top 34 abundant epitopes representing 70.55% of TCRs in our collected databases.

      ● Section Discussion.

      ● Section 4.1 Data: TCR-epitope pairs for binding affinity prediction.

      ● Section 4.4.2 Epitope-specific TCR clustering.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors described a computational method catELMo for embedding TCR CDR3 sequences into numeric vectors using a deep-learning-based approach, ELMo. The authors applied catELMo to two applications: supervised TCR-epitope binding affinity prediction and unsupervised epitope-specific TCR clustering. In both applications, the authors showed that catELMo generated significantly better binding prediction and clustering performance than other established TCR embedding methods. However, there are a few major concerns that need to be addressed.

      (1) There are other TCR CDR3 embedding methods in addition to TCRBert. The authors may consider incorporating a few more methods in the evaluation, such as TESSA (PMCID: PMC7799492), DeepTCR (PMCID: PMC7952906) and the embedding method in ATM-TCR (reference 10 in the manuscript). TESSA is also the embedding method in pMTnet, which is another TCR-epitope binding prediction method and is the reference 12 mentioned in this manuscript.

      TESSA is designed for characterizing TCR repertoires, so we initially excluded it from the comparison. Our focus was on models developed specifically for amino acid embedding rather than TCR repertoire characterization. However, to address the reviewer's inquiry, we conducted further evaluations. Since both TESSA and DeepTCR use autoencoder-based models to embed TCR sequences, we selected the one used in TESSA for evaluation in our downstream prediction task, conducting ten trials in total. It achieved an average AUC of 75.69 in TCR split and 73.3 in epitope split. Notably, catELMo performed significantly better, with an AUC of 96.04 in TCR split and 94.10 in epitope split.

      Regarding the embedding method in ATM-TCR, it simply uses BLOSUM as an embedding matrix, which we have already compared in Section 2.1. Furthermore, Table 6 of the Discussion section compares our prediction model trained on catELMo embeddings with state-of-the-art prediction models such as netTCR and ATM-TCR.

      (2) The TCR training data for catELMo is obtained from the ImmunoSEQ platform, including SARS-CoV2, EBV, CMV, and other disease samples. Meanwhile, antigens related to these diseases and their associated TCRs are extensively annotated in the databases VDJdb, IEDB and McPAS-TCR. The authors then utilized the curated TCR-epitope pairs from these databases to conduct the evaluations for epitope binding prediction and TCR clustering. Therefore, the training data for TCR embedding may already be implicitly tuned for better representations of the TCRs used in the evaluations. This seems to be true based on Table 4, as BERT-Base-TCR outperformed TCRBert. Could catELMo be trained on PIRD, as TCRBert was, to demonstrate catELMo's embedding for TCRs targeting unseen diseases/epitopes?

      We would like to note that catELMo was trained exclusively on TCR sequences in an unsupervised manner, which means it has never been exposed to antigen information. We also ensured that the TCRs used in catELMo's training did not overlap with our downstream prediction data. Please refer to Section 4.1 (Data), where we explicitly stated, “We note that it includes no identical TCR sequences with the TCRs used for training the embedding models.” Moreover, the performance gap (~1%) between BERT-Base-TCR and TCRBert, as observed in Table 4, is relatively small, especially when compared to the performance difference (>16%) between catELMo and TCRBert.

      To further address this concern, we conducted experiments using the same number of TCRs, 4,173,895 in total, sourced exclusively from healthy ImmunoSEQ repertoires. This alternative catELMo model demonstrated a similar prediction performance (based on 10 trials) to the one reported in our paper, with an average AUC of 96.35% in TCR split and an average AUC of 94.03% in epitope split.

      We opted not to train catELMo on the PIRD dataset for several reasons. First, approximately 7.8% of the sequences in PIRD also appear in our downstream prediction data, which could be a potential source of bias. Furthermore, PIRD encompasses sequences related to diseases such as Tuberculosis, HIV, CMV, among others, which the reviewer is concerned about.
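
      The overlap argument above comes down to comparing two sets of sequences. The sketch below is a minimal illustration of that check, assuming exact string matching on CDR3 sequences; the function names and the toy sequences are hypothetical and are not the authors' pipeline or data.

```python
def overlap_fraction(candidate_tcrs, downstream_tcrs):
    """Fraction of unique candidate sequences that also occur downstream."""
    candidate = set(candidate_tcrs)
    if not candidate:
        return 0.0
    return len(candidate & set(downstream_tcrs)) / len(candidate)


def remove_overlap(candidate_tcrs, downstream_tcrs):
    """Drop every candidate sequence that appears in the downstream data."""
    held_out = set(downstream_tcrs)
    return [seq for seq in candidate_tcrs if seq not in held_out]


# Toy sequences, invented for illustration only.
pretraining = ["CASSLA", "CASSFG", "CASSQE", "CASSYT"]
downstream = ["CASSFG", "CAWSVG"]
print(overlap_fraction(pretraining, downstream))
print(remove_overlap(pretraining, downstream))
```

      Exact matching is the strictest reading of "identical TCR sequences"; near-identical sequences would pass this filter, so a similarity-based filter would be a different and more conservative design choice.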

      (3) In the application of TCR-epitope binding prediction, the authors mentioned that the model for embedding epitope sequences was catELMo, but what about the other methods, such as TCRBert? Do the other methods also use catELMo-embedded epitope sequences as part of the binding prediction model, or use their own model to embed the epitope sequences? Since the manuscript focuses on TCR embedding, it would be nice for other methods to be evaluated on the same epitope embedding (maybe adjusted to the same embedded vector length).

      Furthermore, the authors found that catELMo requires less training data to achieve better performance. So one would think the other methods could not learn a reasonable epitope embedding with limited epitope data, and catELMo's better performance in binding prediction is mainly due to better epitope representation.

      Reviewers 1 and 3 have raised similar concerns regarding the epitope embedding approach employed in our binding affinity prediction models. We address both comments together on page 6, where we discuss the epitope embedding strategies in detail.

      (4) In the epitope binding prediction evaluation, the authors generated the test data using TCR-epitope pairs from VDJdb, IEDB, McPAS, which may be dominated by epitopes from CMV. Could the authors show accuracy categorized by epitope types, i.e. the accuracy for TCR-CMV pairs and the accuracy for TCR-SARS-CoV2 pairs separately?

      The categorized AUC scores have been added in Supplementary Table 7. We observed significant performance boosts from catELMo compared with other embedding models.

      (5) In the unsupervised TCR clustering evaluation, GIANA and TCRdist directly output the clustering result, so they should not be affected by the hierarchical clustering threshold. Why did the curves of GIANA and TCRdist change in Figure 4 when relaxing the hierarchical clustering threshold?

      For fair comparisons, we performed GIANA and TCRdist with hierarchical clustering instead of the nearest neighbor search. We have clarified it in the revised manuscript as follows.

      “Both methods are developed on the BLOSUM62 matrix and apply nearest neighbor search to cluster TCR sequences. GIANA used the CDR3 of TCRβ chain and V gene, while TCRdist predominantly experimented with CDR1, CDR2, and CDR3 from both TCRα and TCRβ chains. For fair comparisons, we perform GIANA and TCRdist only on CDR3 β chains and with hierarchical clustering instead of the nearest neighbor search.”

      (6 & 7) In the unsupervised TCR clustering evaluation, the authors examined the TCRs related to the top eight epitopes. However, there are many more epitopes curated in VDJdb, IEDB and McPAS-TCR. In real applications, the set of potential epitopes is also more complex than just eight epitopes. Could the authors evaluate the clustering result using all the TCR data from the databases? In addition to NMI, it is important to know how specific each TCR cluster is. Could the authors add the fraction of pure clusters in the results? A pure cluster means that all the TCRs in the cluster bind to the same epitope; this metric is used in the GIANA method.

      We would like to note that there is a significant disparity in TCR binding frequencies across different epitopes in current databases. For instance, the most abundant epitope (KLGGALQAK) has approximately 13k TCRs binding to it, while 836 out of 982 epitopes are associated with fewer than 100 TCRs in our dataset. Furthermore, there are 9347 TCRs having the ability to bind multiple epitopes. In order to robustly evaluate the clustering performance, we originally selected the top eight frequent epitopes from McPAS and removed TCRs binding multiple epitopes to create a more balanced dataset.

      We acknowledge that the real-world scenario is more complex than just eight epitopes. Therefore, we conducted clustering experiments using the most abundant epitopes whose combined cognate TCRs make up at least 70% of TCRs across the three databases (34 epitopes). This is illustrated in Supplementary Figure 5. Furthermore, we extended our analysis by clustering all TCRs after filtering out those that bind to multiple epitopes, resulting in 782 unique epitopes. We found that catELMo achieved the 3rd and 2nd best performance in NMI and Purity, respectively (see Table below). These results are consistent with our previous observations on the eight epitopes.

      Author response table 1.
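
      For readers less familiar with the clustering metrics discussed here, the following is a minimal sketch of purity, the fraction of pure clusters (in the reviewer's sense: every TCR in the cluster binds the same epitope), and NMI. It assumes one epitope label per TCR, as after the multi-epitope filtering described above, and normalizes mutual information by the arithmetic mean of the two entropies, which is one common convention and not necessarily the one used in the manuscript. All names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log


def purity(cluster_ids, epitopes):
    """Share of TCRs carrying the majority epitope of their own cluster."""
    by_cluster = {}
    for c, e in zip(cluster_ids, epitopes):
        by_cluster.setdefault(c, Counter())[e] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(epitopes)


def fraction_pure_clusters(cluster_ids, epitopes):
    """Share of clusters whose members all bind one and the same epitope."""
    by_cluster = {}
    for c, e in zip(cluster_ids, epitopes):
        by_cluster.setdefault(c, set()).add(e)
    return sum(len(s) == 1 for s in by_cluster.values()) / len(by_cluster)


def _entropy(labels):
    n = len(labels)
    return -sum((k / n) * log(k / n) for k in Counter(labels).values())


def nmi(cluster_ids, epitopes):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(epitopes)
    joint = Counter(zip(cluster_ids, epitopes))
    n_c, n_e = Counter(cluster_ids), Counter(epitopes)
    mi = sum((k / n) * log(k * n / (n_c[c] * n_e[e])) for (c, e), k in joint.items())
    denom = (_entropy(cluster_ids) + _entropy(epitopes)) / 2
    return 1.0 if denom == 0 else mi / denom


# Toy labels, invented for illustration only.
clusters = [0, 0, 0, 1, 1, 2]
labels = ["A", "A", "B", "B", "B", "C"]
print(purity(clusters, labels), fraction_pure_clusters(clusters, labels), nmi(clusters, labels))
```

      Purity rewards many small clusters, whereas NMI penalizes over-fragmentation, which is one reason a method can rank differently on the two metrics.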

      Reviewer #2 (Public Review):

      In the manuscript, the authors highlighted the importance of T-cell receptor (TCR) analysis and the lack of amino acid embedding methods specific to this domain. The authors proposed a novel bi-directional context-aware amino acid embedding method, catELMo, adapted from ELMo (Embeddings from Language Models), specifically designed for TCR analysis. The model is trained on TCR sequences from seven projects in the ImmunoSEQ database, instead of the generic protein sequences. They assessed the effectiveness of the proposed method in both TCR-epitope binding affinity prediction, a supervised task, and the unsupervised TCR clustering task. The results demonstrate significant performance improvements compared to existing embedding models. The authors also aimed to provide and discuss their observations on embedding model design for TCR analysis: 1) Models specifically trained on TCR sequences have better performance than models trained on general protein sequences for the TCR-related tasks; and 2) The proposed ELMo-based method outperforms TCR embedding models with BERT-based architecture. The authors also provided a comprehensive introduction and investigation of existing amino acid embedding methods. Overall, the paper is well-written and well-organized.

      The work has originality and has potential prospects for immune response analysis and immunotherapy exploration. TCR-epitope pair binding plays a significant role in T cell regulation. Accurate prediction and analysis of TCR sequences are crucial for comprehending the biological foundations of binding mechanisms and advancing immunotherapy approaches. The proposed embedding method presents an efficient context-aware mathematical representation for TCR sequences, enabling the capture and analysis of their structural and functional characteristics. This method serves as a valuable tool for various downstream analyses and is essential for a wide range of applications.

      Thank you.

      Reviewer #3 (Public Review):

      Here, the authors trained catELMo, a new context-aware embedding model for TCRβ CDR3 amino acid sequences for TCR-epitope specificity and clustering tasks. This method benchmarked existing work in protein and TCR language models and investigated the role that model architecture plays in the prediction performance. The major strength of this paper is comprehensively evaluating common model architectures used, which is useful for practitioners in the field. However, some key details were missing to assess whether the benchmarking study is a fair comparison between different architectures. Major comments are as follows:

      • It is not clear why epitope sequences were also embedded using catELMo for the binding prediction task. Because catELMo is trained on TCRβ CDR3 sequences, it's not clear what benefit would come from this embedding. Were the other embedding models under comparison also applied to both the TCR and epitope sequences? It may be a fairer comparison if a single method is used to encode epitope sequence for all models under comparison, so that the performance reflects the quality of the TCR embedding only.

      In our study, we indeed used the same embedding model for both TCRs and epitopes in each prediction model, ensuring a consistent approach throughout.

      Recognizing the importance of evaluating the impact of epitope embeddings, we conducted experiments in which we used BLOSUM62 matrix to embed epitope sequences for all models. The results (Supplementary Table 5) are well aligned with the performance reported in our paper. This suggests that epitope embedding may not play as critical a role as TCR embedding in the prediction tasks. To further validate this point, we conducted two additional experiments.

      Firstly, we used catELMo to embed TCRs while employing randomly initialized embedding matrices with trainable parameters for epitope sequences. It yielded similar prediction performance as when catELMo was used for both TCR and epitope embedding (Supplementary Table 6). Secondly, we utilized BLOSUM62 to embed TCRs but employed catELMo for epitope sequence embedding, resulting in performance comparable to using BLOSUM62 for both TCRs and epitopes (Supplementary Table 4). These experiment results confirmed the limited impact of epitope embedding on downstream performance.

      We conjecture that these results may be attributed to the significant disparity in data scale between TCRs (~290k) and epitopes (less than 1k). Moreover, TCRs tend to exhibit high similarity, whereas epitopes display greater distinctiveness from one another. These features of TCRs require robust embeddings to facilitate effective separation and improve downstream performance, while epitope embedding primarily serves as a categorical encoding.

      We have included a detailed discussion of these findings in the revised manuscript to provide a comprehensive understanding of the role of epitope embeddings in TCR binding prediction.

      • The tSNE visualization in Figure 3 is helpful. It makes sense that the last hidden layer features separate well by binding labels for the better performing models. However, it would be useful to know if positive and negative TCRs for each epitope group also separate well in the original TCR embedding space. In other words, how much separation between these groups is due to the neural network vs just the embedding?

      It is important to note that we used the same downstream prediction model, a simple three-linear-layer network, for all the discussed embedding methods. We believe that the separation observed in the t-SNE visualization effectively reflects the ability of our embedding model. Also, we would like to mention that it can be hard to see a clear distinction between positive and negative TCRs in the original embedding space because embedding models were not trained on positive/negative labels. Please refer to the t-SNE of the original TCR embeddings below.

      Author response image 1.

      • To generate negative samples, the author randomly paired TCRs from healthy subjects to different epitopes. This could produce issues with false negatives if the epitopes used are common. Is there an estimate for how frequently there might be false negatives for those commonly occurring epitopes that most populations might also have been exposed to? Could there be a potential batch effect for the negative sampled TCR that confounds with the performance evaluation?

      Thank you for bringing this valid and interesting point up. Generating negative samples is non-trivial since only a limited number of non-binding TCR-epitope pairs are publicly available and experimentally validating non-binding pairs is costly [1]. Standard practices for generating negative pairs are (1) pairing epitopes with healthy TCRs [2, 3], and (2) randomly shuffling existing TCR-epitope pairs [4, 5]. We used both approaches (the former included in the main results, and the latter in the discussion). In both scenarios, catELMo embeddings consistently demonstrated superior performance.
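      The two negative-sampling strategies named above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the seed handling, and the placeholder identifiers (`TCR_A`, `EP_1`, `HEALTHY_1`) are all assumptions introduced here.

```python
import random


def shuffle_negatives(positive_pairs, seed=0):
    """Strategy (2): re-pair each TCR with an epitope it is not known to bind."""
    rng = random.Random(seed)
    known = set(positive_pairs)
    epitopes = sorted({epitope for _, epitope in positive_pairs})
    negatives = []
    for tcr, _ in positive_pairs:
        # Only epitopes that never appear with this TCR in the positive set qualify.
        candidates = [ep for ep in epitopes if (tcr, ep) not in known]
        if candidates:  # skip a TCR that is paired with every epitope
            negatives.append((tcr, rng.choice(candidates)))
    return negatives


def pair_with_background(epitopes, background_tcrs, n_per_epitope, seed=0):
    """Strategy (1): pair each epitope with TCRs drawn from a background repertoire."""
    rng = random.Random(seed)
    return [
        (tcr, epitope)
        for epitope in epitopes
        for tcr in rng.sample(background_tcrs, n_per_epitope)
    ]


# Hypothetical placeholder identifiers, not real sequences.
positives = [("TCR_A", "EP_1"), ("TCR_B", "EP_1"), ("TCR_C", "EP_2"), ("TCR_D", "EP_3")]
shuffled = shuffle_negatives(positives, seed=1)
background = pair_with_background(["EP_1", "EP_2"], ["HEALTHY_1", "HEALTHY_2", "HEALTHY_3"], 2)
```

      Neither strategy rules out false negatives: a re-paired or background TCR could still bind its assigned epitope, which is the concern the reviewer raises.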

      We acknowledge the possibility of false negatives due to the finite-sized TCR database from which we randomly selected TCRs; however, we believe that the likelihood of such occurrences is low. Given the vast diversity of human TCR clonotypes, which can exceed 10^15 [6], the chance of randomly selecting a TCR that specifically recognizes a target epitope is relatively small.

      In order to investigate the batch effect, we generated new negative pairs using different seeds and observed consistent prediction performance across these variations. However, we agree that there could still be a potential batch effect for the negative samples due to potential data bias.

      We have discussed the limitations of generating negative samples in the revised manuscript.

      • Most of the models being compared were trained on general proteins rather than TCR sequences. This makes their comparison to catELMo questionable since it's not clear if the improvement is due to the training data or architecture. The authors partially addressed this with BERT-based models in section 2.4. This concern would be more fully addressed if the authors also trained the Doc2vec model (Yang et al, Figure 2) on TCR sequences as baseline models instead of using the original models trained on general protein sequences. This would make clear the strength of context-aware embeddings if the performance is worse than catELMo and BERT.

      We agree it is important to distinguish between the effects of training data and architecture on model performance.

      In Section 2.4, as the reviewer mentioned, we compared catELMo with BERT-based models trained on the same TCR repertoire data, demonstrating that architecture plays a significant role in improving performance. Furthermore, in Section 2.5, we compared catELMo-shallow with SeqVec, which share the same architecture but were trained on different data, highlighting the importance of data on the model performance.

      To further address the reviewer's concern, we trained a Doc2Vec model on the TCR sequences that have been used for catELMo training. We observed significantly lower prediction performance compared to catELMo, with an average AUC of 50.24% in TCR split and an average AUC of 51.02% in epitope split, making the strength of context-aware embeddings clear.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It is known that TRB CDR3, the CDR1, CDR2 on TRBV gene and the TCR alpha chain also contribute to epitope recognition, but were not modeled in catELMo. It would be nice for the authors to add this as a current limitation for catELMo in the Discussion section.

      We have discussed the limitation in the revised manuscript.

      “Our study focuses on modeling the TCRβ chain CDR3 region, which is known as the primary determinant of epitope binding. Other regions, such as CDR1 and CDR2 on the TRB V gene, along with the TCRα chain, may also contribute to specificity in antigen recognition. However, a limited number of available samples for those additional features can be a challenge for training embedding models. Future work may explore strategies to incorporate these regions while mitigating the challenges of working with limited samples.”

      (2) I tried to follow the instructions to train a binding affinity prediction model for TCR-epitope pairs; however, cachetools=5.3.0 could not be found when running "pip install -r requirements.txt" in the conda environment bap. Is this cachetools version only supported from Python 3.7 onwards, so that the Python 3.6.13 suggested on the GitHub repo might not work?

      This has been fixed. We have updated the README.md on our GitHub page.

      Reviewer #2 (Recommendations For The Authors):

      The article is well-constructed and well-written, and the analysis is comprehensive.

      The comments for minor issues that I have are as follows:

      (1) In the Methods section, it will be clearer if the authors interpret more on how the standard deviation is calculated in all tables. How to define the '10 trials'? Are they based on different random training and test set splits?

      '10 trials' refers to the process of splitting the dataset into training, validation, and testing sets using a different seed for each trial. Different trials therefore have different training, validation, and testing sets. For each trial, we trained a prediction model on its training set and measured performance on its testing set. The standard deviation was calculated from the 10 measurements, estimating model performance variation across different random splits of the data.
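      The trial procedure described above can be sketched as below. The 80/10/10 split fractions, the use of the sample standard deviation, and the `train_and_score` callable are assumptions made for illustration; the response does not state these details.

```python
import random
import statistics


def split_indices(n_samples, seed, frac_train=0.8, frac_val=0.1):
    """Shuffle sample indices with a trial-specific seed and cut them into three sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(frac_train * n_samples)
    n_val = int(frac_val * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def run_trials(n_samples, train_and_score, n_trials=10):
    """Run one train/evaluate cycle per seed and summarise the test scores.

    `train_and_score` is a hypothetical callable that trains on the training
    and validation indices and returns a single test-set metric (e.g. AUC).
    """
    scores = []
    for seed in range(n_trials):
        train, val, test = split_indices(n_samples, seed)
        scores.append(train_and_score(train, val, test))
    # Sample standard deviation (n - 1 denominator); an assumption here.
    return statistics.mean(scores), statistics.stdev(scores)
```

      Because every trial redraws all three sets, the reported standard deviation reflects variation across data splits rather than across repeated training runs on one fixed split.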

      (2) The format of AUCs and the improvement of AUCs need to be consistent, i.e., with the percent sign.

      We have updated the format of AUCs.

      Reviewer #3 (Recommendations For The Authors):

      In addition to the recommendations in the public review, we had the following more minor questions and recommendations:

      • Could you provide some more background on the data, such as overlaps between the databases, and how the training and validation split was performed between the three databases? Also summary statistics on the length of TCR and epitope sequence data would be helpful.

      We have provided more details about data in our revision.

      • Could you comment on the runtime to train and embed using the catELMo and BERT models?

      Our training data consist of TCR sequences with relatively short lengths (averaging less than 20 amino acid residues). This characteristic significantly reduces the computational resources required compared to training large-scale language models on extensive text corpora. Using standard machines equipped with two GeForce RTX 2080 GPUs, we were able to complete the training tasks within a matter of days. After training, embedding one sequence can be accomplished in a matter of seconds.

      • Typos and wording:

      • Table 1 first row of "source": "immunoSEQ" instead of "immuneSEQ"

      This has been corrected.

      • L23 of abstract "negates the need of complex deep neural network architecture" is a little confusing because ELMo itself is a deep neural network architecture. Perhaps be more specific and add that the need is for downstream tasks.

      We have made it more specific in our abstract.

      “...negates the need for complex deep neural network architecture in downstream tasks.”

      References

      (1) Montemurro, Alessandro, et al. "NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRα and β sequence data." Communications biology 4.1 (2021): 1060.

      (2) Jurtz, Vanessa Isabell, et al. "NetTCR: sequence-based prediction of TCR binding to peptide-MHC complexes using convolutional neural networks." BioRxiv (2018): 433706.

      (3) Gielis, Sofie, et al. "Detection of enriched T cell epitope specificity in full T cell receptor sequence repertoires." Frontiers in immunology 10 (2019): 2820.

      (4) Cai, Michael, et al. "ATM-TCR: TCR-epitope binding affinity prediction using a multi-head self-attention model." Frontiers in Immunology 13 (2022): 893247.

      (5) Weber, Anna, et al. "TITAN: T-cell receptor specificity prediction with bimodal attention networks." Bioinformatics 37 (2021): i237-i244.

      (6) Lythe, Grant, et al. "How many TCR clonotypes does a body maintain?." Journal of theoretical biology 389 (2016): 214-224.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      General response of the authors to the editor and the reviewers:

      We thank the reviewers for their feedback, input and questions as these have helped us to (hopefully) improve the manuscript. We have rewritten several sections of the manuscript, moved methodological descriptions from the Results to the Methods section, and added imaging data for two cytoskeletal proteins, Shot and Cofilin/Twinstar, which confirm the predicted differential DV expression. Because the changes to the text were extensive, we did not mark them by track changes (the manuscript would have been illegible), but would be happy to provide an additional version that includes the tracked changes.

      We provide below the point-by-point response to each question and comment made by the reviewers. Our text is in blue.



      Reviewer #1

      Evidence, reproducibility and clarity

      Summary

      This manuscript investigated changes in the proteome and phosphoproteome during dorsoventral axis specification in the Drosophila embryo. To model the three regions in the embryo that are relevant for DV axis development, the authors used specific mutations to enrich for a single type of cells (ventral, lateral, or dorsal). The detected proteins and phosphopeptides were clustered according to the region of expression. There were differences between the protein and corresponding phosphopeptide abundance, suggesting that phosphorylation is a regulatory modification in DV axis establishment. Two different mutations that both result in a ventralized phenotype were found to change marker protein expression in different ways. Using inhibition of microtubule polymerization, this study also investigated the role of microtubules in epithelial folding.

      __Major comments __

      1. Generally, there is a lack of significance testing throughout the manuscript. Simply reporting fold changes can be misleading, if these changes are not significant. Examples:

      2. Rigor of the proteomics evidence showing changes for the expected markers is insufficient because no statistical evaluation is provided. Specifically, in Fig. 1D and Suppl Fig 2: are the fold changes statistically significant?

      3. Data in Fig. 4F, 5F need to be assessed for significance. There are other instances in the manuscript where significance should be tested.

      We did ANOVA testing for all proteome and phosphoproteome data, and the outcome of these analyses is reported in Supplementary Tables 2 and 3. We have added references to significance throughout, wherever possible and relevant and have included a table that summarizes all p values for all comparisons in all of the figures (Supplementary Table 2). However, note that we do our clustering independent of statistical significance, i.e., we include all values, as we explain in the manuscript.

      It is difficult to see the value of the obtained dataset for the community, in part because the data are analyzed by a linear model and cluster assignment developed by the authors, which is a somewhat arbitrary representation. Perhaps the authors could explain how their data could be used by other researchers, and maybe even develop an accessible portal for interacting with the data.

      We do provide the entire set of data in a formatted Excel Table as Supplementary Tables 3 and 4, which contain common pairwise comparisons and ANOVA tests that allow a researcher without a strong proteomics background to explore the data, and we also provide the raw proteomics datasets deposited in PRIDE, so any interested colleague can re-analyse them in the manner that suits their purposes best.

      We analysed the data in the way we did because it takes account of the knowledge from genetics that we have of all these cell populations. This also allowed us to include the important set of proteins and phosphosites that are completely absent from all but one mutant genotype, and would therefore have dropped out of the statistical analyses.

      For example, what does it mean biologically that a protein is a member of a specific cluster shown in Fig. 3C? Is there a predictive value in such an assignment, and how does it relate to the main question of DV axis regulation? An example of a novel insight obtained for specific protein(s) would be useful to illustrate the utility of this analysis.

      The clusters represent groups of proteins that are present at higher or lower abundance in subsets of cell populations. So, for example, being present in cluster 5 means (Fig. 3C) that this protein is predicted to be more abundant in the mesoderm than elsewhere (which includes being detected ONLY in the mesoderm, like Snail). This clustering therefore is the way for us to find new proteins that conform to these groups.

      We provide here the immunostainings of two cytoskeleton-associated proteins that our proteomic analyses predicted to be more abundant in the ectoderm (Cluster 6: dorsal+lateral):

      • The actin-microtubule crosslinker Short-stop (Shot), which is seen to be reduced in the mesoderm.
      • The actin-severing protein Cofilin/Twinstar, which was also found downregulated in the mesoderm in the work cited in Ref.:10 Gong L. et al., Development (2004). The staining shows that cofilin-GFP is abundant in the entire subapical region of ectodermal cells, but strongly reduced in ventral furrow cells, where it is only retained in a few apical membrane blebs. These proteins are targets for functional analyses in follow-up work.

      [Imaging Data for Reviewers]

      Figure: Physical cross-sections of fixed embryos showing the enrichment of proteins in the ectoderm (cluster 6: DL). Dorsal is top, ventral is bottom. Scale bar: 50 µm. Top panel: staining for short-stop (shot; cyan / grayscale) and snail (yellow) in embryos expressing gap43-mCherry. Bottom panel: staining for discs large (dlg, magenta) and GFP (green / grayscale) in embryos expressing cofilin-GFP (Kyoto protein trap for Cofilin/Twinstar).

      Overall, at present the study appears to have limited novelty and mechanistic insight. The data generally align with prior expectations, but it is unclear how this work advances the field.

      We were reassured that the data align with previous studies, but as we state in the text, they go well beyond these valuable and important studies in several dimensions. We had made the following assumptions:

      1. DV patterning mutants recapitulate biological qualities of DV cell populations and the differential expression of DV fate determinants, as confirmed in Fig. 1 and Fig. 3D.
      2. The differential regulation of the proteomes and phosphoproteomes across DV patterning mutants recapitulates the abundances of proteins and phosphosites within DV cell populations of a wildtype embryo. We confirmed this in Fig. 3A and Fig. 5C with the implementation of a linear model for the abundances of detected proteins and phosphosites. The resulting analysis revealed new avenues for future functional studies, as intended. Most of the work on cell shape regulation at the gastrulation stage has focused on actomyosin and a subset of cell adhesion molecules. We have identified networks of proteins and phosphoproteins that may also control gastrulation (Fig. 6 and Supplementary Fig. 5), including microtubules, which were significantly enriched in networks of phosphoproteins (Fig. 7 and Supplementary Fig. 6).

      For example, the observed differences between marker proteins in Toll10B vs. spn27A data seem to confirm previous suggestions that spn27A has a stronger ventralizing effect.

      This suggestion was made by colleagues who had unpublished observations on a limited number of gene expression patterns that supported their contention. A correlation analysis (see figure below) of our results now shows that proteins with a restricted dorso-ventral pattern change more in spn27Aex mutants than in Toll10B. If we look at the known mesodermal genes such as Snail, Twist, Mdr49 and CG4500 we find them at higher abundance in spn27Aex than Toll10B , while the ectodermal genes Egr, Zen, Dtg, Tsg, Bsk, and Ptr are reduced more strongly in spn27Aex than in Toll10B. This takes the prior observation of a stronger ventralization of spn27Aex from an anecdotal to a systematic analysis.

      [Correlation analyses available for reviewers]

      Cross-correlation between the fold changes (FCs) in Toll10B/WT vs. spn27Aex/WT for all proteins detected in wildtype, Toll10B and spn27Aex. Each dot is a protein. The green line is the 'identity' function (slope = 1) that would be expected if the FCs for each protein in both ventralized mutants were exactly the same. A set of proteins with restricted dorso-ventral distribution are highlighted in yellow: mesodermal (ventral) and blue: ectodermal (dorsal).

      The role of microtubules in epithelial folding in the embryo has also been demonstrated before.

      The role of microtubules in epithelial folding in the Drosophila embryo has indeed been examined in three previous studies that studied dorsal fold formation (Ref.: 35, Takeda et al. NCB 2018), ventral furrow formation (VFF, Ref.: 36, Ko et al. JCB 2019), and salivary gland invagination (Booth et al. Dev Cell 2014). These data reveal diverse and non-conservative functional requirements, ranging from acto-myosin contractility during apical constriction (Booth et al. 2014), force transmission and repair of the supracellular contractile network (but not apical constriction per se, Ko et al 2019), to the generation of expansile forces during cell shape homeostasis (Takeda et al 2018). In light of this potentially broad functional spectrum, we sought to compare three epithelial folds that form within the context of gastrulation: ventral furrow, cephalic furrow and dorsal folds. We confirmed that the initiation of VFF was normal, but the final invagination failed, as per Ko et al. 2019, while dorsal fold initiation did not occur (extending conclusions from Takeda et al 2018). In contrast, cephalic furrow formation, though delayed, did not require microtubules. We also revealed a novel commonality of MT function. Specifically, prior to the initiation of all three epithelial folds, proper nuclear positioning requires MTs. We additionally discovered novel membrane abnormalities in two distinct types of blebs during ventral furrow and dorsal fold formation, respectively. Thus, our data provide insights into the roles of microtubules during epithelial folding that go beyond prior work.
      

      The shown phosphorylation changes (if they are significant) for Toll and Cactus are difficult to explain. In Suppl Fig 2B, E: why is Toll more phosphorylated in the lateralized than in ventralized embryos? (the provided reference 20 does not seem to clarify this).

      These changes are indeed significant (Toll-S871: Vtl vs. WT p = 0.01, Vsp vs. WT p = 0.002; Cactus-S463: Vsp vs. WT p = 0.03; see Supplementary Figure 2B and Supplementary Table 2).
      
      We have corrected Ref. 20 (Shen B. and Manley J.L., Development 1998). Ref. 20 only shows that Tl is phosphorylated by Pelle (Ref 20: Fig. 6A), although neither the exact position of Tl phosphosite(s) nor the function of Tl phosphorylation were explored in this article. A hallmark of Toll-like receptor (TLR) regulation is that these receptors are subject to tyrosine phosphorylation, which has been widely connected to the regulation of the binding of adaptor proteins to the cytoplasmic tail of TLRs. Both our finding of serine phosphorylation in Tl and the differential phosphorylation across cell populations are new, but since we do not know what this particular serine phosphorylation site does in TLRs in general, we cannot speculate on the meaning of it occurring more in lateral than in ventral cells. In Ref. 20, the authors speculate that Tl phosphorylation by Pelle regulates the association between Tl and Pelle, which then enables Dorsal translocation to the nucleus. It might also be part of a feedback regulation loop, but this is entirely speculative.
      

      Also, certain Cactus phosphorylations appear higher in dorsalized and ventralized embryos, but not in lateralized embryos. Are such changes expected and do they make sense biologically? It is unclear why these phosphorylation data are used to validate the success of the approach.

         The three Cactus phosphosites S463, S467 and S468 were identified and characterised in the work cited in Ref. 19 (Liu Z.P. et al., Genes and Development, 1997), and we used these sites to validate that our approach was sensitive enough to detect known phosphosites in proteins that act on the dorso-ventral patterning pathway specifically at the point of gastrulation (Stage 6 of embryonic development). We also reported in this manuscript the detection of known phosphosites within the Rho-pathway (Fig. 5E,F, Myosin Light Chain: T21, S22; Cofilin: S3).
      
         Liu Z.P. et al. reported that these three sites map to the Cactus PEST domain, which is required for Cactus degradation in the mesoderm (Belvin M. et al, Genes and Development 1995).  Liu Z.P. et al. also showed that mutating these phosphosites impairs Cactus turnover without affecting the ability of Cactus to bind Dorsal. We can only speculate that the differential phosphorylation across dorso-ventral embryonic cell populations is associated with the regulation of Cactus turnover. Consistent with this, we find Cactus downregulated 1.5 log2 fold in ventralized embryos derived from *spn27Aex/def* mothers. Furthermore, there are a number of signalling pathways that act both in the dorsal and the ventral-lateral domain (e.g., rhomboid/EGF), so it is not surprising to find modifications that are shared by these regions.
      

      The rationale to use a diffusion algorithm for data analysis is not clear. How would the analysis differ if diffusion was not used?

      Phosphoproteomics data are often sparse and noisy for a number of reasons (technical; low abundance of phosphorylated peptides compared to other peptides in the cell; biological: not all phosphosites are functional). Network diffusion is a common way used for various data types to boost the signal-to-noise ratio. For example, if from a list of 10 phosphosites, 5 all fall in the same network region or process, and the rest are randomly distributed in the network, chances are that the first region is more representative of the regulated process in that dataset. Using network propagation, the signal coming from the first 5 phosphosites would give a higher score to that network region, marking it as the predominant signal. Our specific implementation, which uses the semantic similarity between nodes to model the edges in the network, further boosts the functional signal by preferentially including nodes that have a higher functional similarity to the initial phosphosites. Our approach therefore allows us to identify the processes that are predominantly ‘active’ in our dataset. We refer the reviewer to our recent preprint for more evidence that this strategy boosts the signal-to-noise ratio in phosphoproteomic datasets and further prioritises more functional phosphosites (https://www.biorxiv.org/content/10.1101/2023.08.07.552249v1). If this approach was not used and we based the identification of relevant processes only on the list of phosphosites, we would have acquired more spurious terms in our functional enrichment analysis. The above preprint also shows that different methods such as the Prize Collecting Steiner Forest algorithm perform worse for phosphoproteomics data.
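      The intuition above (clustered seed phosphosites reinforce each other, isolated ones do not) can be illustrated with a generic random walk with restart. This is a sketch of network propagation in general, not the authors' implementation: their edge weights come from semantic similarity, whereas the toy graph, unit weights, restart probability and node names below are assumptions.

```python
def propagate(adjacency, seeds, restart=0.5, n_iter=100):
    """Random walk with restart on a weighted, undirected graph.

    adjacency: dict mapping node -> {neighbour: weight}, listed symmetrically.
    seeds: nodes carrying the initial signal (e.g. proteins with regulated sites).
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    strength = {n: sum(adjacency[n].values()) for n in nodes}
    scores = dict(start)
    for _ in range(n_iter):
        updated = {}
        for n in nodes:
            # Each neighbour m passes on its score in proportion to edge weight.
            incoming = sum(
                scores[m] * w / strength[m]
                for m, w in adjacency[n].items()
                if strength[m] > 0
            )
            updated[n] = (1 - restart) * incoming + restart * start[n]
        scores = updated
    return scores


# Toy graph: a dense module (a, b, c, d) linked to a chain (e, f, g, h).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("g", "h")]
graph = {}
for u, v in edges:
    graph.setdefault(u, {})[v] = 1.0
    graph.setdefault(v, {})[u] = 1.0

# Three seeds fall in the dense module; one seed (h) is isolated on the chain.
scores = propagate(graph, seeds={"a", "b", "c", "h"})
```

      In this toy graph the non-seed node `d`, which sits next to three seeds, ends up with a higher score than `g`, which sits next to the single isolated seed.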

      Generally, the discussion of enriched GO categories presented in Fig. 6 is not rigorous, and it is unclear what biological insight is provided by this figure, probably because the categories are extremely diverse and not clustered in a meaningful way. Despite stating that the work on microtubules came out as a result of proteomic analysis, there is no connection between proteomic data (e.g., data shown in Fig. 6) and microtubule analysis in Fig. 7.

      The connection is between the phosphoproteomic data (not the proteomic data) and the microtubules. The reviewer is correct that there is little connection with microtubules at the proteomic level. Only the diffused network analyses performed on the phosphoproteomic data pointed in this direction. We have improved the writing about this point.
      

      The Discussion section touches on areas of differential protein degradation and mRNA regulation; however, these data are not presented in Results or Figures and so it is difficult to assess the relevance of this analysis.

           We present these data in Figure 6A,B. The network analyses of the clusters showed significant enrichment of cellular component terms that are connected with protein turnover and mRNA regulation. We have added a reference to figure 6 in the Discussion for clarity.
      

      There is insufficient citation of prior literature throughout the manuscript: many statements are lacking proper references.

      We have corrected the mistakes and added missing references.

      Proteomics data should be deposited into a standard repository that is a member of ProteomeXchange Consortium, such as PRIDE, etc.

      All proteomics and phosphoproteomics data have been uploaded to PRIDE:

      The raw files for the proteomics and phosphoproteomics experiments were deposited in PRIDE under separate identifiers:

      Proteome: Identifier PXD046050 (Reviewer account details: reviewer_pxd046050@ebi.ac.uk, pw: coJ9otiX).

      Phosphoproteome: Identifier PXD046192 (Reviewer account details: reviewer_pxd046192@ebi.ac.uk, pw: nvkbwClp).

      We have included a statement of raw data availability in the revised version of the manuscript with the PRIDE access information.

      __Minor comments __

      The text has several typos and should be proof-read, and references to figures and tables should be checked, as some of these are not correct.

      We have corrected typos, references to figures and tables in the revised version of the manuscript.

      The genotypes for the mutations used in this study should be accompanied by citation describing identification of these mutations and the resulting phenotypes. It would also be helpful to describe the nature of these alleles (molecular lesion, gain vs loss of function, etc.). Some of this information is included in the Discussion, but it would be useful for the reader to learn this early on, when the chosen genotypes are presented.

      All this information is and was provided in the methods section and in Table 1, including stock numbers and sources of the stocks. Please see 'Methods, Drosophila genetics and embryo collections'.

      Fig. 2G,H - the X axis should be clearly labeled as logarithmic.

      We introduced the log2 label in the X-axis of Fig. 2G,H and any other panel in which this was not expressly made clear.

      In Fig. 2G the locations of lines showing fold changes for Twist and Snail seem incorrect. In Fig. 2H the dotted line does not appear to correspond to 50% of the number of phosphosites.

      We apologise for these errors, both have been corrected in the revised version of the manuscript.

      Fig. 5D can be improved by adding letters for the coloured clusters.

      We have labelled the clusters in Fig. 3B and Fig. 5D to ease the identification of biologically relevant clusters.

      It is unclear if any specific additional insight was obtained using SILAC, the authors may want to discuss this approach and outcomes more.

      SILAC has been widely used to deal with the inherent variability of proteomic analyses by introducing a metabolically labelled standard; in our case, w1118 flies fed with SILAC yeast were used as the standard. Because the inherent variability is larger in phosphoproteomic experiments (because protein identification is based on phosphorylated peptides only, see Methods), we used SILAC labelling only in the phosphoproteomic experiment.



      __Reviewer #2 __

      Evidence, reproducibility and clarity


      The present article by Gomez et al describes a deep proteomics analysis of the proteome and phosphoproteome of embryos mutated for key genes involved in the dorso-ventral axis in Drosophila melanogaster. Overall, this is a nice article showing new insight into this developmental process. The results are mainly descriptive, yet they identify potential new players in the definition of the dorso-ventral axis.

      The generation of mutants for genes found up- or down-regulated in each mutant strain would be a significant addition to this manuscript. But I think in its current form the data brings enough new information on this particular developmental step and would be of interest for the fly community.

      My main concern is that the manuscript can be difficult to read and overly convoluted at times even for experts in the field. I would suggest the author move some methodological explanations from the results to the methods section to further detail the goals of some results sections.

      We have followed these suggestions and hope we have made the manuscript more easily readable.

      As an example, the goal of part 3) « A linear model for quantitative interpretation of the proteomes » is not clear to me. Are the authors comparing the abundance of a protein in the WT versus a theoretical WT in order to determine which fractions of mesoderm, lateral ectoderm and dorsal region are actually present in the WT? (...)

      Yes, in part, but the main purpose was to compare how well the theoretical WT, as ‘reconstituted’ from the mutants, corresponds to the observed actual WT (for which we have at least approximate values).

      The question that we faced when we started these calculations was: what is the ‘correct’ fraction (or proportion) we should use to weight each protein (or phosphosite) measurement in the mutants. Theoretically, these values should be those that result in the best match between the theoretical WT and the measured WT abundance of each protein (or phosphosite). We knew from actual measurements only the mesodermal fraction, which was determined to be ~20% of the cross-sectional area (Ref. 21: Rahimi, N., et al Dev. Cell. 2016). The neuroectoderm and ectoderm fractions were estimated to be approx. 40% each (Ref.: 22, Jazwinska, A et al. Development 1999), but we lacked an exact number. The systematic exploration of these proportions led us to conclude that indeed both the neuroectoderm and ectoderm fractions should be around 40% each, provided the mesoderm is fixed at 20%. Thus, we used these fractions: D: 0.4 L: 0.4 V: 0.2 for our follow-up analyses.
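      The weighting and the search over proportions described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: abundances are treated as values on a linear scale, the fit criterion is a sum of squared differences, the grid step is 0.05, and all function names and toy numbers are hypothetical.

```python
def reconstitute_wt(dorsal, lateral, ventral, fractions=(0.4, 0.4, 0.2)):
    """Theoretical wildtype abundance as a weighted sum of the mutant abundances."""
    f_dorsal, f_lateral, f_ventral = fractions
    return f_dorsal * dorsal + f_lateral * lateral + f_ventral * ventral


def best_fractions(measurements, ventral_fraction=0.2, step=0.05):
    """Scan dorsal/lateral proportions with the ventral (mesoderm) share fixed.

    measurements: list of (dorsal, lateral, ventral, wildtype) abundances,
    one tuple per protein or phosphosite.
    """
    best = None
    n_steps = round((1 - ventral_fraction) / step)
    for i in range(n_steps + 1):
        f_dorsal = i * step
        f_lateral = 1 - ventral_fraction - f_dorsal
        fractions = (f_dorsal, f_lateral, ventral_fraction)
        # Squared mismatch between reconstituted and measured wildtype.
        error = sum(
            (reconstitute_wt(d, l, v, fractions) - wt) ** 2
            for d, l, v, wt in measurements
        )
        if best is None or error < best[0]:
            best = (error, fractions)
    return best[1]


# Hypothetical abundances for two proteins; the wildtype values are generated
# from the 0.4 / 0.4 / 0.2 weighting so the scan has a known answer.
toy = [(10.0, 6.0, 2.0), (3.0, 9.0, 5.0)]
measurements = [(d, l, v, reconstitute_wt(d, l, v)) for d, l, v in toy]
fitted = best_fractions(measurements)
```

      On real data the minimum would be shallower than in this toy case, since measured wildtype abundances carry noise and not every protein follows the additive model.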

      (...) Or are they using it as a reference to obtain a fold change for the different proteins quantified (in this case why not use the WT?)?

      Yes, again, in part: as a reference for the EXPECTED fold changes, as would be predicted from the WT.

      Since we have moved some of the details of this approach from the main text to the methods section, we have also revised the remaining text and hope it is now clearer.

      The proteomics data must be deposited in a public repository. I did not see it stated in the methods section.

      All proteomics and phosphoproteomics data have been uploaded to PRIDE; see further comments above in response 13.

      The version of the uniprot database is quite old (2016) so is the version of MaxQuant used in this study. Any reasons for that (other than that the analysis was performed in 2016)?

      That is indeed the reason.

      The data were run on different MS platforms, how did the authors account for the variability in MS signals? What samples were run on which MS platform? Were the WT embryos run on both?

      We measured three replicates, and all five genotypes (four mutant genotypes plus wildtype) for each of the replicates were measured on the same instrument. Specifically, for the whole proteome analyses, replicates one and three of all genotypes were measured on the QExactive Plus instrument and replicate two of all genotypes was measured on a QExactive HF-x instrument, as were the phosphoproteomes. So, indeed, the wildtype was measured on both instruments. We did not observe instrument-specific bias in the PCA of the proteome data.

      We have added this in more detail to the method section:

      “Samples of replicate one and three were measured on the QE-Plus system and replicate two was measured on the QE-HF-x system.

      For phosphoproteome analysis, (…) Samples of all three replicates were measured on the QEx-HFx system. We added trial samples measured on the QEx-Plus system to increase the phosphosite coverage using the match between runs algorithm.”

      In the methods section the authors mention that a high-pH reverse phase fractionation was performed? How many fractions of High-pH reverse phase separation were injected per sample? Was this separation performed for all the samples?

      We have adjusted the Methods section regarding the high-pH fractionation by adding the following sentence: “Fractions were collected every 60s in a 96 well plate over 60 min gradient time collecting a total number of 8 fractions per sample.”

      Why did the authors use label-free (proteome) and SILAC (phosphoproteome) quantification methods?

      See our response to reviewer #1, point 19.

      Why is the threshold based on the Q3 of the standard deviation (if I got it right) ? Couldn't they be calculated directly on the distribution of the ratio?

      We could also have done it that way.

      However, we also wanted to take into account the variation between the replicates, i.e., the quality of the individual measurements. We therefore devised the procedure we used, in which the standard deviation of the individual technical replicates enters the calculation. With the ratio of the averages, the variability between replicates would have been ignored, and we considered it more appropriate to take the more conservative approach. But as it turns out, the cut-off would have ended up being very similar had we calculated it the way the referee suggests.

      Page 6: The supplementary figure 2E refers to the protein Cactus and the text to CKII, please modify one or the other to avoid any confusion. Page 7: A dot is missing at the end of the following sentence « if used with the assumed weightings for the populations »

      We have corrected these sentences.

      Page 19: Replace SppedVac by SpeedVac

      We have corrected the error in the manuscript and thank the reviewer for the detailed inspection.

      Page 8: why not using a z-score with thresholds directly instead of a -1/+1/0 system and then using the z-score?

      Because we wanted to compare the relative changes over wt between mutants (i.e. the similarity between 1 0 0 and 0 -1 -1) rather than the relationship of their absolute values to the wt, and to assign proteins with similar relationships into the same dorso-ventral regulation categories.

      The text states this (previously in main text, now in methods):

      “The reason for this is that this method takes into account that value sets that represent similar relative differences between the mutants (for example, 0 -1 -1 vs. 1 -1 -1 or 1, 0, 0) are biologically more similar to each other than the raw values indicate. The z-scores for all of these cases would be 1.1547 -0.5774 -0.5774.”
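
      The equivalence stated in the quoted passage can be checked with a few lines of code. This is a minimal sketch: the function name is ours, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, chosen because it reproduces the quoted values.

```python
from statistics import mean, stdev


def z_scores(values):
    """Z-score each value using the sample standard deviation (n - 1)."""
    centre = mean(values)
    spread = stdev(values)
    return [(value - centre) / spread for value in values]


# The three example patterns from the text give the same z-scores.
for pattern in [(0, -1, -1), (1, -1, -1), (1, 0, 0)]:
    print(pattern, [round(z, 4) for z in z_scores(pattern)])
# each line prints [1.1547, -0.5774, -0.5774]
```

      The three patterns differ in their raw values but share the same relative shape across the mutants, which is why they fall into the same regulation category after z-scoring.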

      In the abstract it is mentioned that 3,399 proteins are differentially regulated at the proteome level versus 1,699 significantly deregulated at a 10 % FDR in the main text (page 5). Is there a reason for this discrepancy? Same comment for the phosphopeptides.

      We now see the need to clarify this point better, and we have edited the text accordingly.

      The second number refers to those proteins that show statistically significant changes based on ANOVA (1699 proteins).

      The first number (3398; note that the number 3399 in the abstract was a typo, now corrected) includes all proteins that were detected in at least 1 replicate in the wildtype (5883/6111) minus those that do not change between the genotypes (2156/6111) and minus all those that change in the same direction in all mutants (329).

      This includes proteins that are automatically excluded from ANOVA, i.e., those that are detected only in the wildtype (35/6111 proteins) or in two or more genotypes but only in one technical replicate (the ANOVA-negative ones).

      As we stated, we did this because it “allows us to include the important group of proteins that show a ‘perfect’ behaviour, like dMyc and WntD, in that they are undetectable in the mutants that correspond to the regions in the normal embryo where these genes are not expressed.” This 'regulated' set consists of those proteins that exceed the |0.5| fold threshold.
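
      For readers who wish to retrace the count, the arithmetic behind the first number can be written out as follows. The variable names are ours; the figures are those given above.

```python
# Counts taken from the response above (out of 6111 proteins in total).
detected_in_wildtype = 5883          # detected in at least 1 wildtype replicate
unchanged_between_genotypes = 2156   # do not change between the genotypes
same_direction_in_all_mutants = 329  # change in the same direction in all mutants

regulated = (
    detected_in_wildtype
    - unchanged_between_genotypes
    - same_direction_in_all_mutants
)
print(regulated)  # 3398
```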

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This review is a list of many individual critiques. It is unclear what the expertise of the reviewer is (they do not provide the answer to that question in the review form, unlike the other referees), but several of the criticisms are unfounded. Three of the PIs of this work are researchers with extensive experience in Drosophila genetics and early development but are nevertheless confounded by some of the comments made by this referee.

      The mutants do not completely "flatten" the embryos.

      We do not claim that they do. Nor are the ventral, lateral and dorsal regions in the normal embryo completely ‘flat’ or homogeneous. But the mutants are good representations of the major fates in these regions, as a wealth of published literature from the last 30 years indicates.

      For instance, Tl10B broadly expresses snail but also expresses sog in the head. (i.e. Fig 1B - sog and sna expression in Figure 1B mutant backgrounds looks odd.) The sog expression likely relates to a deficiency specific effect.

      This ‘sensitive’ area is well known also from other genetic conditions – e.g. partial loss of dorsal and indeed in Spn27A mutants. It is therefore not specific to the Tl10B deficiency but says something about gene interactions in this region. Thus, this cannot be a deficiency-specific effect.

      Is sog seen in a Toll10B/+ mutant background?

      Yes, it is, and more frequently than in Toll10B/Def.

      The deficiency used for the Toll10B experiment is Df(3R)ro80b which is quite large and deletes 14+ genes.

      True. However, this does not matter: the mothers are heterozygous, so the genes are not missing, they are present in one wildtype copy! And these mothers are then mated with wildtype fathers, so if expression of these genes were needed in the embryo, then there would be another full wt copy of each. We appreciate that maternal effect genetics can be difficult to follow, but this is all work that has been done a long time ago, and is not the point of this paper at all.

      The deficiency used for the spn27A experiment is Df(2L)BSC7 and removes 4+ genes.

      Again, this would only matter if these were maternal effect genes that were needed for the establishment of the dorso-ventral axis, and they are not.

      Furthermore, the gd9 allele may not be a complete loss of function.

      It may not be – but what matters is the well characterized phenotype which has been shown to represent dorsal cell types.

      It is possible that the Toll10B allele picked up an accessory dominant mutation.

      This again would only matter if it was a dominant AND maternal effect mutation that affects the DV axis in the embryo – and there are very few of these known. And nothing in our analysis of these embryos, with which we have been working on and off over 3 decades and therefore know very well, indicates that our current stock is any different from those we have seen in the past.

      Unfortunately, these mutant phenotypes that affect DV and AP patterning mean that conclusions cannot be made that changes in protein relate to DV patterning.

      We simply do not understand this statement.

      Why do the mutant phenotypes (gene expression patterns and cell morphologies representative of the ventral, lateral and dorsal cell populations) not mean that the proteins downstream of the fate changes correspond to the cell fates?

      To get a better view of the ventralized phenotype, the authors should repeat the analysis by ectopically expressing Toll10B using the Gal4-UAS system; UAS-activate Toll transgenes are available.

      All Gal4-UAS maternal drivers, even the best and the strongest, result in mosaic expression. Our lab has extensive experience with this system and we know that, for example, the homogeneous, high levels of twist or snail expression that we see in spn or Tl10B embryos cannot be achieved with GAL4.

      Fig 1C-F - due to combined AP and DV effects seen with ventralizing mutants, it is important that the authors confirm that cross-section views relate to the middle to posterior of the embryo.

      We confirm this.

      Costaining with anti-Kr or -Caudal would help to ensure they are assaying the correct AP domain for pure DV effects.

      In our view, this is an unnecessary experiment. We know where the middle of the embryo is. If the reviewer does not believe us when we say we are showing a section from the middle, they can see that the sections are not from the end region by, for example, the cell number and the section angles.

      The authors refer to reference [60] for stages but there is no information regarding morphological criteria used under the microscope to stage the embryos.

      We have now added more detail in the methods section:

      Briefly, using a Zeiss binocular, the embryos were individually hand-selected on wet agar, which made the embryos semi-transparent, allowing the assessment of a range of morphological features, of which at least some are visible in each of the mutants:

      • Yolk distance to embryonic surface: distinguishes between early (stage 5a) and late cellularisation (stage 5b).
      • Yolk distribution within the embryo: identification of large embryonic movements of the germ band (e.g., initiation of germ band extension, marking the initiation of stage 7). In DV patterning mutants this is seen as twisting of the embryo.
      • Change in the outline of the dorsal-posterior region: polar cell movement from the posterior-most region of the embryo (stage 5a/b) to stage 6a/b.
      • Formation of the cephalic and dorsal folds: identification of stage 6 (initiation of cephalic fold) and stage 7 (dorsal folds).

      The combined use of these morphological criteria, together with the synchronised egg collections, allows accurate staging of wild type and mutant embryos.

      Furthermore, what is stage 6a,b? Stage 6 is not typically divided in two stages nor is it clear what a,b relate to.

      We used a generally accepted standard for staging embryos: the book by Campos-Ortega J.A. and Hartenstein V., ‘The embryonic development of Drosophila melanogaster’ (ref. 60). In this book, they describe the morphological criteria that can be followed in living embryos for proper staging. These stages, with these exact names, are shown on pages 11 and 12 of the 1997 edition (2nd edition).

      According to the published timetable of Drosophila development by Foe et al. 1993 (not cited), gastrulating embryos are 200 min or 3 hr 20'. It's unclear if this is the stage that was assayed.

      Foe is a beautiful paper, but we did not cite it because the commonly used nomenclature predates it (Campos-Ortega and Hartenstein 1985).

      In addition, timing depends on temperature whereas morphological criteria do not.

      The mutant embryos likely develop at different rates relative to wildtype. It seems important to provide details about the staging of embryos. If the mutant embryos take longer to gastrulate, for instance, might that also be a factor that impacts the proteome.

      As described above, we used a combination of criteria to accurately judge staging. DV patterning embryos could in principle develop faster or slower than wildtype. We performed synchronised egg collections (Methods: Embryo collections) for 15’. Therefore, any developmental timing defect would have become evident based on a difference in the number of embryos entering stage 6 and 7 at the point of visual inspection of the collections. This was not the case.

      How many replicates for each genotype? In the text it states, "replicates from the same genotype clustered together (Fig. 2E)....." Similar vague reference for phosphoproteome follows (Fig 2F). It is then stated that it was impossible to determine the experimental source for this variation. Could it relate to differences in timing of samples?

      We had given the numbers of replicates in the figure legend but have now also included them in the methods section for more clarity. We did 3 replicates for each genotype in each experiment, with the exception of gd9 and spn27aex mutants, for which we did 2 biological replicates each with 3 replicates, making a total of 6 replicates for these genotypes in the proteomic experiment. We have included an additional clarification in figure legend 2. The number of replicates per genotype per experiment can also be seen from the correlation matrices shown in Fig. 2E and 2F, in which the replicates are shown individually. The measurements for each replicate for each genotype within each experiment are reported in Supplementary Tables 2 and 3, 'description' tabs of the worksheets.

      The lengthy discussion of ratio estimation on page 7 should be streamlined and made more clear. Are the authors throwing out data and only keeping samples that support their model? This seems like overfitting - if I am understanding correctly, you are selecting the samples that support the "majority of proteins fit the linear model" but this isn't necessarily the case.

      No, this is a misunderstanding. We do not select data.

      We have rephrased this section, but to explain here briefly: We do not select any samples, we state that the majority of proteins fit the theoretical model (and that is not even surprising, because any protein that does not change across the populations will automatically fit the model). We then discuss why some might NOT fit the model. The model doesn’t need to be supported, it simply is a calculation that allows us to stratify the data.

      They call this the 'correct' manner (see section 4 page 7) but it seems like a working model and presumptuous to imply that it is the correct way.

      We explained in the text why we refer to this as ‘correct’. It is a matter of definition, not presumption, and we even used quotes to be clear about this. ‘Correct’ indicates a combination of values that is consistent with the biological model that the DV mutants are good representations of the corresponding embryonic cell populations in a wild type embryo. We do not in any way ‘throw out’ other data, we just note they don’t fit that model. Clarifications on the concept for the model have been added in various places in the text.

      Figure 3C - it is confusing to use a circular diagram to show DV inferred position of the 14 clusters as their position on the circle does not correspond to where they are expressed on the embryos. Perhaps a stacked bar graph for 6 different domains would be better.

      This figure does not show positions of clusters. It is simply a pie chart, as is stated in the figure legend and as can be seen by the numbers and the corresponding sizes of the sectors. We have tried a stacked representation (shown below), but find it no clearer and have therefore stuck with this very common way of representing quantities, and in particular, proportions. We use the same representation with the same colour schemes in all subsequent figures, so proportions can be compared across figures.

      It is very hard to follow the text on page 9.

      We have rephrased this section.

      It is very hard to see the gene expression patterns shown in Fig 4A with the color scheme/scale used.

      We appreciate this colour scheme does not correspond to the commonly used dark colour on a light background which would mimic histochemistry to show gene expression. The ‘inferno’ colour scheme was used because it allows better quantitative comparisons between subtly different patterns. However, to make these figures more similar to the types of in situ hybridisations that embryologists are used to seeing, we now use a different representation.

      In general, Figure 4 is uninterpretable - in particular, what do the numbers mean on the greyscale circle plots in panel D?

      We apologize for having failed to explicitly include the explanation for this in the figure legend. The reader will notice that these numbers add up to the number in the circle to the left, and the numbers indicate the number of proteins showing perfect matches (white), partial overlaps (grey) and mismatches (black). We have improved the graphic representation and added an explanation in the figure legend.

      Figure 5A. Why wasn't protein abundance and phosphosites identified from an individual, identical sample?

      This was because of the way the project developed over the course of the research, and the protein part was originally intended only as a proof of concept, with the intended focus being the phosphoproteome. We later decided to include a full analysis of the proteome, but did not consider it worthwhile and necessary to repeat the entire laborious and expensive experiment with both analyses being done from the same samples.

      How can one be sure that the phosphosites were correctly assigned if the proteins were not detected in the proteome but they were only identified in the phosphosite analysis?

      We are not sure we understand this question. The phosphoproteomic analysis identifies phosphopeptides of proteins that in turn allow one to identify the protein itself and the amino acid in that peptide that is phosphorylated. So the identification is done only WITHIN the phosphoproteomic analysis and does not relate directly to the proteomic analysis. This explains why we found some phosphopeptides for which we did not detect the full host protein in the proteomic analysis.

      Thus, if a protein was detected only in either of the experiments, this fact doesn’t modify the validity of the result, because the identification was done individually for each experiment.

      Page 16 - much discussion about the difference between Spn27A and Toll10b/def mutant background. One has half as much Toll receptor. The phenotype of Toll10b/+ should be examined.

      Both genotypes have been extensively examined in the past. Tl10B/def has only one copy of the gene from the mother, and the mutant protein is constitutively active. By putting it over a deficiency, we (and others in the past) made sure that the exclusive source for Tl signalling is from this gain of function Tl allele, and that the wildtype receptor, which would still be activated by the natural ligand in a graded pattern along the DV axis, does not confound the result.

      The Tl10B/+ combination creates a less ventralized phenotype which is not more similar to that of spn27Aex/def but in fact less similar.

      Page 12 - hard to follow the discussion of modeling (?) presented in Figure 6. The results (bottom of page 12 - #1 "most networks are enriched for cellular components associated with regulation of gene expression" and page 13 #2 - "cytoskeleton emerges as a major target of regulation") seem vague and unsubstantiated. Rhabdomere, P granule, micropyle, autophagosome?

      We agree with the reviewer that there are many cellular components that are enriched in the diffused network analyses, many of them unrelated to morphogenesis. We had highlighted this finding on page 12, paragraph 3. Nevertheless, we have rephrased the statements as ‘the heat maps illustrate that most of the enriched cellular components in both experiments were highly enriched with cellular components associated with DNA and RNA metabolism or the regulation of gene expression.’ and have now included numbers.

      We think ‘a major target’ for phosphorylation does in fact apply to the cytoskeleton, and we had already supplied the number to substantiate this in the manuscript (14/62).

      Readers will be able to evaluate these network analyses based on their own fields of interest or particular questions they may wish to address. We haven’t excluded any cellular component terms.

      Figure 7 seems like a separate study.

      Why were the phosphopeptides investigated to determine if they relate to phosphorylated proteins? Phosphoantibodies could have been generated for a subset. Instead the manuscript pivots to analysis of microtubules.

      We are reporting here one example of a proof-of-concept study that we carried out, chosen based on our own research interests and on available tools and reagents. There are clearly many other avenues that could have been explored and that others may want to explore, but that go well beyond this report. We have made this more explicit in the text.

      Page 14 - discussion first paragraph. Please cite ref[10] when discussing the "previous study" otherwise the reader will not understand which study you are referring to until the next paragraph.

      We have moved the reference from its current position to the one suggested by the reviewer.

      • In general, the study would benefit from more attention to references and citations of prior work. A comparison of this work to the Gong et al. Development 2004 study should be made earlier.

      This work is cited very early on, namely in the introduction.

      • The authors start off saying that no other study has looked at proteins from a spatial perspective.

      We are unsure what the reviewer refers to. We say precisely the opposite: we indicate that studies have been performed to look at differences in cell populations, including that by the lab of Jon Minden (Gong et al), a highly respected former co-author of one of the current authors (ML). We do state that the technologies at the time did not allow the same depth and temporal resolution as the methods that are available nowadays. For instance, Gong et al. used an excellent and original approach at the time, which however did not detect Snail and Twist in the ventralized mutants.

      The only time we say ‘no other study’ is about ‘region-specific post-translational regulation of proteins’ - though we do state in the discussion that Gong et al would have detected some of these cases because they used 2D gels.

      • Along these lines, there is another more recent proteomic study from Beati et al. Fly 2020 using similarly staged embryos. How do these other experiments compare to the current ones? As they apparently analyzed proteome and phosphopeptides from an identical sample, are the authors' new data using separate samples consistent?

      This study is actually about a later stage (stage 8 embryos, post-gastrulation). Again, an excellent study, but not directly relevant to our current analysis. It validates the use of SILAC in Drosophila, although it is not the first study to do this. Furthermore, it looks at a different question and biological process using a mutant, htl, to understand the effect of FGF signalling.

      • Furthermore, Adam Martin's lab has been studying microtubule action along the dorsoventral axis (Denk-Lobnig et al 2021) and this work is not cited.

      Denk-Lobnig et al 2021 is about spatial patterns of myosin and actin and how that is governed genetically on the ventral side of the embryo, pertaining primarily to ventral furrow formation. It does not analyse microtubules nor dorsal-ventral cell populations.

      It is possible there may be some confusion with another excellent study from Adam Martin’s lab, in which the role of microtubules is analysed. But this is exclusively in the ventral furrow, and the study did not look at the effect of microtubule depolymerisation on nuclear positioning nor membrane behaviour. We cite this work extensively (Ref.: 36, Ko et al. JCB 2019) and we compare our results to that paper. However, our work here goes beyond this study in that it looks at all cells along the DV axis.

      General comments:

      Typos throughout. For example, page .4 section heading "dorso-ventral cell..."

      We have scanned the entire document for typos.

      Font size extremely small - for example see Figure 1A gene names, and 1F magnified view.

      We have adjusted the fonts in the main figures.

      Scale bars not shown when showing magnified views. For example, see Fig 1E,

      We have added these.

      Reviewer #3 (Significance (Required)): This study by Gomez et al. uses a proteomic-centered approach to study proteomes associated with cell populations in the embryo that they argue relate to different positions along the dorso-ventral axis. They generate a proteomic resource, though it was unclear how anyone could use the data they produce. There is no searchable database and we have to trust that the authors will ultimately provide such a resource to the community.

      All proteomics and phosphoproteomics data have been uploaded to PRIDE. Also see responses to the other referees’ queries about this point.

      There is the potential for interesting insights but the work is not presented in a way that is accessible or useful. The presentation needs significant improvement.

      We have improved the presentation of the results, as suggested by all reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      This work (almost didactically) demonstrates how to develop, calibrate, validate and analyze a comprehensive, spatially resolved, dynamical, multicellular model. Testable model predictions of (also non-monotonic) emergent behaviors are derived and discussed. The computational model is based on a widely-used simulation platform and shared openly such that it can be further analyzed and refined by the community.

      Weaknesses:

      While the parameter estimation approach is sophisticated, this work does not address issues of structural and practical non-identifiability (Wieland et al., 2021, DOI:10.1016/j.coisb.2021.03.005) of parameter values, given just tissue-scale summary statistics, and does not address how model predictions might change if alternative parameter combinations were used. Here, the calibrated model represents one point estimate (column "Value" in Suppl. Table 1) but there is specific uncertainty of each individual parameter value and such uncertainties need to be propagated (which is computationally expensive) to the model predictions for treatment scenarios.

      We thank the reviewer for the excellent suggestions and observations. The CaliPro parameterization technique applied puts an emphasis on finding a robust parameter space instead of a global optimum. To address structural non-identifiability, we computed partial rank correlation coefficients with each iteration of the calibration process to ensure that the sensitivity of each parameter was relevant to model outputs. We also found that there were ranges of parameter values that would achieve passing criteria but that, when tested in replicate, resulted in inconsistent outcomes. This led us to further narrow the parameters into a single parameter set that still had stochastic variability but did not have such large variability between replicate runs that it would be unreliable. Additional discussion on this point has been added to lines 623-628. We acknowledge that there are likely other parameter sets or model rules that would produce similar outcomes, but the main purpose of the model was to utilize it to better understand the system and make new predictions, which our calibration scheme allowed us to accomplish.
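
      To illustrate what a partial rank correlation coefficient measures, a minimal, self-contained sketch is given here. It is not the CaliPro pipeline or the code used for the model: the toy output function, the parameter names and the sample size are hypothetical, and the coefficients are obtained from the inverse of the rank correlation matrix, which is one standard way of computing partial correlations.

```python
import random


def ranks(values):
    """Rank-transform values (1 = smallest); ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def prcc(parameter_samples, outputs):
    """Partial rank correlation of each parameter with the output.

    parameter_samples: dict mapping parameter name -> list of sampled values.
    Each coefficient controls for all other parameters in the dict.
    """
    names = list(parameter_samples)
    columns = [ranks(parameter_samples[name]) for name in names] + [ranks(outputs)]
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    precision = invert(corr)
    out = k - 1
    return {
        name: -precision[i][out] / (precision[i][i] * precision[out][out]) ** 0.5
        for i, name in enumerate(names)
    }


# Hypothetical toy model: the output rises with one parameter, falls with
# another, and ignores the third.
random.seed(0)
n_samples = 200
samples = {
    "growth_rate": [random.random() for _ in range(n_samples)],
    "decay_rate": [random.random() for _ in range(n_samples)],
    "inert": [random.random() for _ in range(n_samples)],
}
outputs = [
    g ** 2 - 2 * d + 0.05 * random.gauss(0, 1)
    for g, d in zip(samples["growth_rate"], samples["decay_rate"])
]
result = prcc(samples, outputs)
print({name: round(value, 2) for name, value in result.items()})
```

      In this toy case the coefficient should be clearly positive for the first parameter, clearly negative for the second, and close to zero for the third, which is the kind of pattern used to judge whether a parameter is relevant to a given model output.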

      Regarding practical non-identifiability, we acknowledge that there are some behaviors that are not captured in the model because those behaviors were not specifically captured in the calibration data. To ensure that the behaviors necessary to answer the aims of our paper were included, we used multiple different datasets and calibrated with multiple different output metrics. We believe we have identified the appropriate parameters to recapitulate the dominating mechanisms underlying muscle regeneration. We have added additional discussion on practical non-identifiability to lines 621-623.

      Suggested treatments (e.g. lines 484-486) are modeled as parameter changes of the endogenous cytokines (corresponding to genetic mutations!) whereas the administration of modified cytokines with changed parameter values would require a duplication of model components and interactions in the model such that cells interact with the superposition of endogenous and administered cytokine fields. Specifically, as the authors also aim at 'injections of exogenously delivered cytokines' (lines 578, 579) and propose altering decay rates or diffusion coefficients (Fig. 7), there needs to be a duplication of variables in the model to account for the coexistence of cytokine subtypes. One set of equations would have unaltered (endogenous) and another one have altered (exogenous or drugged) parameter values. Cells would interact with both of them.

      Our perturbations did not include the delivery of exogenous cytokines and instead focused on microenvironmental changes in cytokine diffusion and decay rates or specific cytokine concentration levels. For example, the purpose of the VEGF delivery perturbation was to test how an increase in VEGF concentrations would alter regeneration outcome metrics, with the assumption that the delivered VEGF would act in the same manner as the endogenous VEGF. We have clarified the purpose of the simulations on line 410. We agree that it would be worth exploring whether model predictions would be altered if endogenous and exogenous cytokines were represented separately; however, we did not explore this type of scenario.

      This work shows interesting emergent behavior from nonlinear cytokine interactions but the analysis does not provide insights into the underlying causes, e.g. which of the feedback loops dominates early versus late during a time course.

      Indeed, analyzing the model to fully understand the time-varying interactions between the multiple feedback loops is a challenge in and of itself, and we appreciate the opportunity to elaborate on our approach to addressing this challenge. First: the crosstalk/feedback between cytokines and the temporal nature was analyzed in the heatmap (Fig. 6) and lines 474-482. Second: the sensitivity of cytokine parameters to specific outputs was included in Table 9 and full time-course sensitivity is included in Supplemental Figure 2. Further correlation analysis was also included to demonstrate how cytokine concentrations influenced specific output metrics at various timepoints (Supplemental Fig. 3). We agree that further elaboration of these findings is required; therefore, we added lines 504-509 to discuss the specific mechanisms at play with the combined cytokine interactions. We also added more discussion (lines 637-638) regarding future work that could develop more analysis methods to further investigate the complex behaviors in the model.

      Reviewer #2 (Public Review):

      Strengths:

      The manuscript identified relevant model parameters from a long list of biological studies. This collation of a large amount of literature into one framework has the potential to be very useful to other authors. The mathematical methods used for parameterization and validation are transparent.

      Weaknesses:

      I have a few concerns which I believe need to be addressed fully.

      My main concerns are the following:

      (1) The model is compared to experimental data in multiple results figures. However, the actual experiments used in these figures are not described. To me as a reviewer, that makes it impossible to judge whether appropriate data was chosen, or whether the model is a suitable descriptor of the chosen experiments. Enough detail needs to be provided so that these judgements can be made.

      Thank you for raising this point. We created a new table (Supplemental table 6) that describes the techniques used for each experimental measurement.

      (2) Do I understand it correctly that all simulations are done using the same initial simulation geometry? Would it be possible to test the sensitivity of the paper results to this geometry? Perhaps another histological image could be chosen as the initial condition, or alternative initial conditions could be generated in silico? If changing initial conditions is an unreasonably large request, could the authors discuss this issue in the manuscript?

      We appreciate your insightful question regarding the initial simulation geometry in our model. The initial configuration of the fibers/ECM/microvascular structures was kept consistent but the location of the necrosis was randomly placed for each simulation. Future work will include an in-depth analysis of altered histology configuration on model predictions which has been added to lines 618-621. We did a preliminary example analysis by inputting a different initial simulation geometry, which predicted similar regeneration outcomes. We have added Supplemental Figure 5 that provides the results of that example analysis.

      (3) Cytokine knockdowns are simulated by 'adjusting the diffusion and decay parameters' (line 372). Is that the correct simulation of a knockdown? How are these knockdowns achieved experimentally? Wouldn't the correct implementation of a knockdown be that the production or secretion of the cytokine is reduced? I am not sure whether it's possible to design an experimental perturbation which affects both parameters.

      We appreciate that this important question has been posed. Yes, in order to simulate the knockout conditions, the cytokine secretion was reduced/eliminated. The diffusion and decay parameters were also adjusted to ensure that the concentration within the system was reduced. Lines 391-394 were added to clarify this assumption.

      (4) The premise of the model is to identify optimal treatment strategies for muscle injury (as per the first sentence of the abstract). I am a bit surprised that the implemented experimental perturbations don't seem to address this aim. In Figure 7 of the manuscript, cytokine alterations are explored which affect muscle recovery after injury. This is great, but I don't believe the chosen alterations can be done in experimental or clinical settings. Are there drugs that affect cytokine diffusion? If not, wouldn't it be better to select perturbations that are clinically or experimentally feasible for this analysis? A strength of the model is its versatility, so it seems counterintuitive to me to not use that versatility in a way that has practical relevance. - I may well misunderstand this though, maybe the investigated parameters are indeed possible drug targets.

      Thank you for your thoughtful feedback. The first sentence (lines 32-34) of the abstract was revised to focus on beneficial microenvironmental conditions to best reflect the purpose of the model. The clinical relevance of the cytokine modifications is included in the discussion (lines 547-558) with additional information added to lines 524-526. For example, two methods to alter diffusion experimentally are: antibodies that bind directly to the cytokine to prevent it from binding to its receptor on the cell surface and plasmins that induce the release of bound cytokines.

      (5) A similar comment applies to Figure 5 and 6: Should I think of these results as experimentally testable predictions? Are any of the results surprising or new, for example in the sense that one would not have expected other cytokines to be affected as described in Figure 6?

      We appreciate the opportunity to clarify the basis for these perturbations. The perturbations included in Figure 5 were designed to mimic the conditions of a published experiment that delivered VEGF in vivo (Arsic et al. 2004, DOI:10.1016/J.YMTHE.2004.08.007). The perturbation input conditions and experimental results are included in Table 8, and Supplemental Table 6 has been added to include experimental data and a method description of the perturbation. The results of this analysis provide both validation and new predictions, because some of the outputs were measured in the experiments while others were not. The additional output metrics and timepoints that were not collected in the experiment allow for a deeper understanding of the dynamics and mechanisms leading to the changes in muscle recovery (lines 437-454). These model outputs can provide the basis for future experiments; for example, they highlight which time points would be more important to measure and even provide predicted effect sizes that could be the basis for a power analysis (lines 639-640).

      Regarding Figure 6, the published experimental outcomes of cytokine KOs are included in Table 8. The model allowed comparison of different cytokine concentrations at various timepoints when other cytokines were removed from the system due to the KO condition. The experimental results did not provide data on the impact on other cytokine concentrations but by using the model we were able to predict temporally based feedback between cytokines (lines 474-482). These cytokine values could be collected experimentally but would be time consuming and expensive. The results of these perturbations revealed the complex nature of the relationship between cytokines and how removal of one cytokine from the system has a cascading temporal impact. Lines 533-534 have been added to incorporate this into the discussion.

      (6) In figure 4, there were differences between the experiments and the model in two of the rows. Are these differences discussed anywhere in the manuscript?

      We appreciate your keen observation and the opportunity to address these differences. The model did not match experimental results for CSA output in the TNF KO and anti-inflammatory nanoparticle perturbations, or for TGF levels with macrophage depletion. While it did align with the other experimental metrics from those studies, it is likely that there are other mechanisms at play in the experimental conditions that were not captured by simulating the downstream effects of the experimental perturbations. We have added discussion of the differences to lines 445-454.

      (7) The variation between experimental results is much higher than the variation of results in the model. For example, in Figure 3 the error bars around experimental results are an order of magnitude larger than the simulated confidence interval. Do the authors have any insights into why the model is less variable than the experimental data? Does this have to do with the chosen initial condition, i.e. do you think that the experimental variability is due to variation in the geometries of the measured samples?

      Thank you for your insightful observations and questions. The lower model variability is attributed to the larger sample size of model simulations compared to experimental subjects. Running 100 simulations narrows the confidence interval (average 2.4 and max 3.3) compared to the experiments, which typically had a sample size of less than 15. If the number of simulations is reduced to 15, the stochasticity within the model results in a larger confidence interval (average 7.1 and max 10). There are also several possible confounding variables in the experimental protocols (e.g., variations in injury, different animal subjects for each timepoint, etc.) that are kept constant in the model simulation. We have added discussion of this point to the manuscript (lines 517-519). Future work with the model will examine how variations in conditions, such as initial muscle geometry, injury, etc., alter regeneration outcomes and overall variability. This discussion has been incorporated into lines 640-643.
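
      The sample-size argument above can be illustrated with a minimal sketch. It assumes a normal-approximation 95% confidence interval for the mean; the helper names and the Gaussian stand-in for stochastic model runs are hypothetical and are not taken from the authors' code. Under this assumption the half-width scales as 1/sqrt(n), so going from 15 to 100 runs should shrink the interval by roughly sqrt(100/15), about 2.6.

```python
import math
import random
import statistics


def ci_half_width(samples, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean."""
    return z * statistics.stdev(samples) / math.sqrt(len(samples))


def simulate_outcomes(n, mean=100.0, sd=15.0, seed=0):
    """Hypothetical stand-in for n stochastic model runs (not the ABM itself)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]


if __name__ == "__main__":
    # Compare a typical experimental sample size with the number of model runs.
    for n in (15, 100):
        half_width = ci_half_width(simulate_outcomes(n))
        print(f"n={n:3d}  95% CI half-width ~ {half_width:.2f}")
```

      The sketch only addresses the sample-size effect; the experimental confounders mentioned in the response would add variability on top of it.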

      (8) Is figure 2B described anywhere in the text? I could not find its description.

      Thank you for pointing that out. We have added a reference for Fig. 2B on line 190.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The model code seems to be available from https://simtk.org/projects/muscle_regen but that website requests member status ("This is a private project. You must be a member to view its contents.") and applying for membership could violate eLife's blind review process. So, this reviewer would have liked to run the model her/himself but could not. To eLife: Can the authors upload their model to a neutral server that reviewers and editors can access anonymously?

      The code has been made publicly available on the following sites:

      SimTK: https://simtk.org/docman/?group_id=2635

      Zenodo: https://zenodo.org/records/10403014

      GitHub: https://github.com/mh2uk/ABM-of-Muscle-Regeneration-with-MicrovascularRemodeling

      Line 121 has been updated with the new link and the additional resources were added to lines 654-657.

      (2) The muscle regeneration field typically studies 2D cross-sections and the present model can be well compared to these other 2D models but cells as stochastic and localized sources of diffusible cytokines may yield different cytokine fields in 3D vs. 2D. I would expect more broadened and smoothened cytokine fields (from sources in neighboring cross-sections) than what the 2D model predicts based on sources just within the focus cross-section. Such relations of 2D to 3D should be discussed.

      We thank the reviewer for the excellent suggestions and observations. It has been reported for other CompuCell3D models (Sego et al. 2017, DOI:10.1088/1758-5090/aa6ed4) that diffusion solutions in 2D and 3D model configurations converged to similar outcomes, with the 3D simulations incurring excessive computational cost without contributing any noticeable additional accuracy. Similarly, other cell-based ABMs that incorporate diffusion mechanisms (Marino et al. 2018, DOI:10.3390/computation6040058) have found that 2D and 3D versions of the model both predict the same mechanisms and that the 2D resolution was sufficient for determining outcomes. Lines 615-618 were added to elaborate on this topic.

      (3) Since the model (and title) focuses on "nonlinear" cytokine interactions, what would change if cytokine decay would not be linear (as modeled here) but saturated (with nonlinear Michaelis-Menten kinetics as ligand binding and endocytosis mechanisms would call for)?

      Thank you for raising an intriguing point. The model includes a combination of cytokine decay as well as ligand binding and endocytosis mechanisms that can be saturated. For a cytokine-dependent model behavior to occur, the cytokines necessary to induce that action had to reach a minimum threshold. Once that threshold was reached, that amount of the cytokine would be removed at that location to simulate ligand-receptor binding and endocytosis. These ligand binding and endocytosis mechanisms behave in a saturated way, removing a set amount when above a certain threshold or a defined ratio when under the threshold. Lines 313-315 were revised to clarify this point. There were certain concentrations of cytokines where we saw a plateau in outputs, likely as a result of reaching a saturation threshold (Supplemental Fig. 3). In future work, more robust mathematical simulation of binding kinetics of cytokines (e.g., using ODEs) could be included.
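
      The threshold-based uptake rule described above can be sketched as a piecewise function. This is an illustration only: the function name, parameter names, and numeric values are hypothetical, and the response does not state how the set amount and the ratio are chosen in the actual model.

```python
def cytokine_uptake(concentration, threshold, set_amount, ratio):
    """Amount of cytokine removed at one lattice site in one step (illustrative).

    Above the threshold a fixed amount is removed (saturated uptake);
    at or below it, a fixed fraction of what is present is removed.
    """
    if concentration > threshold:
        removed = set_amount
    else:
        removed = ratio * concentration
    # Never remove more than is locally available.
    return min(removed, concentration)


if __name__ == "__main__":
    # Hypothetical values: threshold 5, fixed removal 2, fractional removal 40%.
    for c in (1.0, 4.0, 5.0, 10.0, 50.0):
        print(c, cytokine_uptake(c, threshold=5.0, set_amount=2.0, ratio=0.4))
```

      With these example values the removal grows proportionally up to the threshold and then stays flat, which is the qualitative shape a Michaelis-Menten term would also give, though without its smooth transition.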

      (4) Limitations of the model should be discussed together with an outlook for model refinement. For example, fiber alignment and ECM ultrastructure may require anisotropic diffusion. Many of the rate equations could be considered with saturation parameters etc. There are so many model assumptions. Please discuss which would be the most urgent model refinements and, to achieve these, which would be the most informative next experiments to perform.

      We appreciate your thoughtful consideration of the model's limitations and the need for a comprehensive discussion on model refinements and potential future experiments. The future direction section was expanded to discuss additional possible model refinements (lines 635-643) and additional possible experiments for model validation (lines 630-634).

      (5) It is not clear how the single spatial arrangement that is used affects the model predictions. E.g. now the damaged area surrounds the lymphatic vessel but what if the opposite corner was damaged and the lymphatic vessel is deep inside the healthy area?

      Thank you for highlighting the importance of considering different spatial arrangements in the model and its potential impact on predictions. We previously tested model perturbations that included specifying the injury surrounding the lymphatic vessel versus on the side opposite the vessel. Since this paper focuses more on cytokine dynamics, we plan to include this perturbation, along with other injury alterations, in a follow-on paper. We added more context about this in the future efforts section lines 640-643.

      (6) It seems that not only parameter values but also the initial values of most of the model components are unknown. The parameter estimation strategy does not seem to include the initial (spatial) distributions of collagen and cytokines and other model components. Please discuss how other (reasonable) initial values or spatial arrangements will affect model predictions.

      We appreciate your thoughtful consideration of unknown initial values/spatial arrangements and their potential influence on predictions. Initial cytokine levels prior to injury had a low relative concentration compared to levels post injury and were assumed to be negligible. Initial spatial distribution of cytokines was not defined as initial spatial inputs (except in knockout simulations) but are secreted from cells (with baseline resident cell counts defined from the literature). The distribution of cytokines is an emergent behavior that results from the cell behaviors within the model. The collagen distribution is altered in response to clearance of necrosis by the immune cells (decreased collagen with necrosis removal) and subsequent secretion of collagen by fibroblasts. The secretion of collagen from fibroblast was included in the parameter estimation sweep (Supplemental Table 1).

      We are working on further exploring the model sensitivity to altered spatial arrangements and have added this to the future directions section (lines 618-621), as well as provided Supplemental Figure 5 to demonstrate that model outcomes are similar with altered initial spatial arrangements.

      (7) Many details of the CC3D implementation are missing: overall lattice size, interaction neighborhood order, and "temperature" of the Metropolis algorithm. Are the typical adhesion energy terms used in the CPM Hamiltonian and if so, then how are these parameter values estimated?

      Thank you for bringing attention to the missing details regarding the CC3D implementation in our manuscript. We have included supplemental information providing greater detail for CPM implementation (Lines 808-854). We also added two additional supplemental tables for describing the requested CC3D implementation details (Supplemental Table 4) and adhesion energy terms (Supplemental Table 5).

      (8) Extending the model analysis of combinations of altered cytokine properties, which temporal schedules of administration would be of interest, and how could the timing of multiple interventions improve outcomes? Such a discussion or even analysis would further underscore the usefulness of the model.

      In response to your valuable suggestion, lines 558-562 were added to discuss the potential of using the model as a tool to perturb different cytokine combinations at varying timepoints throughout regeneration. This is also included as future work in lines 636-637.

      (9) The CPM is only weakly motivated, just one sentence on lines 142-145 which mentions diffusion in a misleading way as the CPM just provides cells with a shape and mechanical interactions. The diffusion part is a feature of the hybrid CompuCell3D framework, not the CPM.

      Thank you for bringing up this distinction. We removed the statement regarding diffusion and updated lines 143-146 to focus on CPM representation of cellular behavior and interactions. We also added a reference to supplemental text that includes additional details on CPM.

      (10) On lines 258-261 it does not become clear how the described springs can direct fibroblasts towards areas of low-density collagen ECM. Are the lambdas dependent on collagen density?

      Thank you for highlighting this area for clarification. The fibroblasts form links with low collagen density ECM and then are pulled towards those areas based on a constant lambda value. The links between the fibroblast and the ECM will only be made if the collagen is below a certain threshold. We added additional clarification to lines 260-264.

      (11) On line 281, what does the last part in "Fibers...were regenerating but not fully apoptotic cells" mean? Maybe rephrase this.

      The last part of that line indicates that there were some fibers surrounding the main injury site that were damaged but still had healthy portions, indicating that they were impacted by the injury and are regenerating but did not become fully apoptotic like the fiber cells at the main site of injury. We rephrased this line to indicate that the nearby fibers were damaged but not fully apoptotic.

      (12) Lines 290-293 describe interactions of cells and fields with localized structures (capillaries and lymphatic vessel). Please explain in more detail how "capillary agents...transport neutrophiles and monocytes" in the CPM model formalism. Are new cells added following rules? How is spatial crowding of the lattice around capillaries affecting these rules? Moreover, how can "lymphatic vessel...drain the nearby cytokines and cells"? How is this implemented in the CPM and how is "nearby" calculated?

      We appreciate your detailed inquiry into the interactions of cells and fields with localized structures. The neutrophils and monocytes are added to the simulation at the lattice sites above capillaries (within the cell layer, Fig. 2B) and undergo chemotaxis up their respective gradients. The recruited neutrophils and monocytes are randomly distributed among the healthy capillaries that do not have an immune cell at the capillary location (a modeling artifact that is a byproduct of only having one cell per lattice site). This approach helped to prevent an abundance of crowding at certain capillaries. Because immune cells in the simulation are sufficiently small, chemotactic gradients are sufficiently large, and the simulation space is sufficiently large, we do not see aggregation of recruited immune cells in the CPM.

      The lymphatic vessel uptakes cytokines at lattice locations corresponding to the lymphatic vessel and will remove cells located in lattice sites neighboring the lymphatic vessel. In addition, we have included a rule in our ABM to encourage cells to migrate towards the lymphatic vessel utilizing CompuCell3D External Potential Plugin. The influence of this rule is inversely proportional to the distance of the cells to the lymphatic vessel.

      We have updated lines 294-298 and 305-309 to include the above explanation.
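
      The two rules explained above (random recruitment at unoccupied healthy capillaries, and a pull toward the lymphatic vessel that weakens with distance) can be sketched in plain Python. This sketch deliberately does not use the CompuCell3D API; the dictionary layout, function names, and the `strength` parameter are hypothetical.

```python
import math
import random


def recruit_immune_cells(capillaries, n_recruits, rng):
    """Place up to n_recruits new cells at random healthy, unoccupied capillaries.

    Each capillary is a dict with keys "site", "healthy" and "occupied".
    Returns the lattice sites where cells were added.
    """
    eligible = [c for c in capillaries if c["healthy"] and not c["occupied"]]
    chosen = rng.sample(eligible, min(n_recruits, len(eligible)))
    for capillary in chosen:
        capillary["occupied"] = True  # one cell per lattice site
    return [capillary["site"] for capillary in chosen]


def lymphatic_pull(cell_xy, vessel_xy, strength=1.0):
    """Magnitude of the drift toward the vessel, inversely proportional to distance."""
    distance = math.dist(cell_xy, vessel_xy)
    return strength / distance if distance > 0 else 0.0


if __name__ == "__main__":
    capillaries = [
        {"site": (0, 0), "healthy": True, "occupied": False},
        {"site": (5, 0), "healthy": False, "occupied": False},
        {"site": (9, 3), "healthy": True, "occupied": True},
        {"site": (2, 7), "healthy": True, "occupied": False},
    ]
    print(recruit_immune_cells(capillaries, 1, random.Random(0)))
    print(lymphatic_pull((0, 0), (3, 4)))
```

      Restricting recruitment to unoccupied capillaries is what limits crowding in this sketch; how the authors schedule the number of recruits per step is not specified in the response.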

      (13) Tables 1-4 define migration speeds as agent rules but in the typical CPM, migration speed emerges from random displacements biased by chemotaxis and other effects (like the slope of the cytokine field). How was the speed implemented as a rule while it is typically observable in the model?

      We appreciate your inquiry regarding the implementation of migration speeds. To determine the lambda parameters (Table 7) for each cell type, we tested each in a simplified control simulation with a concentration gradient for the cell to move towards. We tuned the lambda parameters within this simulation until the model-output cell velocity aligned with the literature-reported cell velocity for each cell type (Tables 1-4). We have added clarification on this to lines 177-180.
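
      The tuning loop described above can be sketched as a one-dimensional search. The response does not say how the tuning was carried out, so bisection is a swapped-in technique here, and it assumes that the measured speed increases monotonically with lambda. `toy_speed` is a hypothetical stand-in for the simplified control simulation, not a CompuCell3D call.

```python
def tune_lambda(measure_speed, target_speed, lo, hi, tol=1e-3, max_iter=60):
    """Bisection on lambda until the measured speed is within tol of the target.

    Assumes measure_speed(lam) increases monotonically with lam on [lo, hi].
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        speed = measure_speed(mid)
        if abs(speed - target_speed) <= tol:
            return mid
        if speed < target_speed:
            lo = mid  # too slow: strengthen the chemotactic response
        else:
            hi = mid  # too fast: weaken it
    return 0.5 * (lo + hi)


def toy_speed(lam):
    """Hypothetical saturating speed response standing in for a control simulation."""
    return 10.0 * lam / (lam + 50.0)


if __name__ == "__main__":
    lam = tune_lambda(toy_speed, target_speed=4.0, lo=0.0, hi=1000.0)
    print(f"lambda ~ {lam:.2f}, speed ~ {toy_speed(lam):.3f}")
```

      In a stochastic CPM the measured speed would be noisy, so in practice each evaluation would likely need to average several control runs before comparing against the literature value.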

      (14) Line 312 shows the first equation with number (5), either add eqn. (1-4) or renumber.

      We have revised the equation number.

      (15) Typos: Line 456, "expect M1 cell" should read "except M1 cell".

      Line 452, "thresholds above that diminish fibroblast response (Supplemental Fig 3)." remains unclear, please rephrase.

      Line 473, "at 28." should read "at 28 days.".

      Line 474, is "additive" correct? Was the sum of the individual effects calculated and did that match?

      Line 534, "complexity our model" should read "complexity in our model".

      We have corrected the typos and clarified line 452 (updated line 594) to indicate that the TNF-α concentration threshold results in diminished fibroblast response. We updated terminology line 474 (updated line 512) to indicate that there was a synergistic effect with the combined perturbation.

      (16) Table 7 defines cell target volumes with the same value as their diameter. This enforces a strange cell shape. Should there be brackets to square the value of the cell diameter, e.g. Value=(12µm)^2 ?

      The target volume parameter values were selected to reflect the relative differences in average cell diameter as reported in the literature; however, there are no parameters that directly enforce a diameter for the cells in the CPM formalism separate from the volume. We have observed that these relative cell sizes allow the ABM to effectively reproduce cell behaviors described in the literature. Single cells that are too large in the ABM would be unable to migrate far enough per time step to carry out cell behaviors, and cells that are too small in the CPM would be unstable in the simulation environment and not persist in the simulation when they should. We removed the units for the cell shape values in Table 7 since the target volume is a relative parameter and does not directly represent µm.

      (17) Table 7 gives estimated diffusion constants but they appear to be too high. Please compare them to measured values in the literature, especially for MCP-1, TNF-alpha and IL-10, or relate these to their molecular mass and compare to other molecules like FGF8 (Yu et al. 2009, DOI:10.1038/nature08391).

      We utilized a previously published estimation method (Filion et al. 2004, DOI:10.1152/ajpheart.00205.2004) for estimating cytokine diffusivity within the ECM. This method incorporates the molecular masses and accounts for the combined effects of the collagen fibers and glycosaminoglycans. The paper acknowledged that the estimated value is faster than experimentally determined values, but that this was a result of the less-dense matrix composition which is more reflective of the tissue environment we are simulating in contrast to other reported measurements which were done in different environments. Using this estimation method also allowed us to more consistently define diffusion constants versus using values from the literature (which were often not recorded) that had varied experimental conditions and techniques (such as being in zebrafish embryo Yu et al. 2009, DOI:10.1038/nature08391 as opposed to muscle tissue). This also allowed for recalculation of the diffusivity throughout the simulation as the collagen density changed within the model. Lines 318-326 were updated to help clarify the estimation method.

      (18) Many DOIs in the bibliography (Refs. 7,17,20,31,40,47...153) are wrong and do not resolve because the appended directory names are not allowed in the DOI, just with a journal's URL after resolution.

      Thank you for bringing this to our attention. The incorrect DOIs have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      (9) On line 174, the authors say "We used the CC3D feature Flip2DimRatio to control the number of times the Cellular-Potts algorithm runs per mcs." What does this mean? Isn't one monte carlo timestep one iteration of the Cellular Potts model? How does this relate to physical timescales?

      We appreciate your attention to detail and thoughtful question regarding the statement about the use of the CC3D feature Flip2DimRatio. Lines 175-177 were revised to simplify the meaning of Flip2DimRatio. That parameter alters the number of times the Cellular-Potts algorithm is run, which is the limiting factor for cell movement. The physical timescale is kept to a 15-minute timestep but a high Flip2DimRatio allows more flexibility and stability to allow the cells to move faster in one timestep.

      (10) Has the custom MATLAB script to process histology images into initial conditions been made available?

      The Matlab script along with CC3D code for histology initialization with documentation has been made available with the source code on the following sites:

      SimTK: https://simtk.org/docman/?group_id=2635

      Zenodo: https://zenodo.org/records/10403014

      GitHub: https://github.com/mh2uk/ABM-of-Muscle-Regeneration-with-MicrovascularRemodeling

      (11) Equation 5 is provided without a reference or derivation. Where does it come from and what does it mean?

      Thank you for highlighting the diffusion equation and seeking clarification on its origin and significance. Lines 318-326 were revised to clarify where the equation comes from. This is a previously published estimation method that we applied to calculate the diffusivity of the cytokines considering both collagen and glycosaminoglycans.

      (12) Line 326: "For CSA, experimental fold-change from pre-injury was compared with fold-change in model-simulated CSA". Does this step rely on the assumption that the fold change will not depend on the CSA? If so, is this something that is experimentally known, or otherwise, can it be confirmed by simulations?

      We appreciate the opportunity to clarify our rationale. The fold change was used as a method to normalize the model and experiment so that they could be compared on the same scale. Yes, this step relies on the assumption that fold change does not depend on pre-injury CSA. Experimentally, it is difficult to determine the impact of initial fiber morphology on the regeneration time course. This fold change allows us to compare percent recovery, which is a common metric used to assess muscle regeneration outcomes experimentally. Lines 340-343 were revised to clarify.
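
      The normalization described above can be shown with a short sketch. The function name and the numbers are hypothetical; the point is only that two CSA time courses on different absolute scales become directly comparable once each is divided by its own pre-injury value.

```python
def fold_change(series, baseline):
    """Express each value in a time course relative to its pre-injury baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return [value / baseline for value in series]


if __name__ == "__main__":
    # Hypothetical CSA time courses in different units (e.g. um^2 vs. lattice sites).
    experiment = fold_change([1000.0, 1500.0, 2000.0], baseline=2000.0)
    model = fold_change([50.0, 75.0, 100.0], baseline=100.0)
    print(experiment)
    print(model)
```

      The comparison is only meaningful under the assumption stated in the response, namely that fold change does not itself depend on pre-injury CSA.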

      (13) Line 355: "The final passing criteria were set to be within 1 SD for CSA recovery and 2.5 SD for SSC and fibroblast count" Does this refer to the experimental or the simulated SD?

      The model had to fit within those experimental SDs. Lines 371-372 were edited to specify that we are referring to the experimental SD.
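
      The passing criteria quoted above (within 1 experimental SD for CSA recovery and 2.5 experimental SD for SSC and fibroblast count) can be sketched as a simple check. The dictionary keys, function names, and example values are hypothetical, and the sketch checks a single timepoint per output for brevity.

```python
# Allowed deviation, in experimental standard deviations, per output metric.
CRITERIA = {"csa": 1.0, "ssc": 2.5, "fibroblast": 2.5}


def within_sd(sim_value, exp_mean, exp_sd, n_sd):
    """True if the simulated value lies within n_sd experimental SDs of the mean."""
    return abs(sim_value - exp_mean) <= n_sd * exp_sd


def run_passes(simulated, experimental):
    """A run passes only if every metric meets its own tolerance.

    simulated: {metric: value}; experimental: {metric: (mean, sd)}.
    """
    return all(
        within_sd(simulated[m], experimental[m][0], experimental[m][1], n_sd)
        for m, n_sd in CRITERIA.items()
    )


if __name__ == "__main__":
    experimental = {"csa": (1.0, 0.1), "ssc": (40.0, 4.0), "fibroblast": (20.0, 2.0)}
    print(run_passes({"csa": 1.05, "ssc": 49.0, "fibroblast": 16.0}, experimental))
    print(run_passes({"csa": 1.25, "ssc": 40.0, "fibroblast": 20.0}, experimental))
```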

      (14) "Following 8 iterations of narrowing the parameter space with CaliPro, we reached a set that had fewer passing runs than the previous iteration". Wouldn't one expect fewer passing runs with any narrowing of the parameter space? Why was this chosen as the stopping criterion for further narrowing?

      We appreciate your observation regarding the statement about narrowing the parameter space with CaliPro. We started with a wide parameter space, expecting that certain parameters would give outputs that fall outside of the comparable data. So, when the parameter space was narrowed to enrich parts that give passing output, initially the number of passing simulations increased.

      Once we have narrowed the set of possible parameters into an ideal parameter space, further narrowing will cut out viable parameters resulting in fewer passing runs. Therefore, we stopped narrowing once any fewer simulations passed the criteria that they had previously passed with the wider parameter set. Lines 375-379 have been updated to clarify this point.
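
      The stopping rule described above can be sketched as follows. The pass counts are invented for illustration, and the function does not call CaliPro; it only encodes the logic "keep narrowing while the number of passing runs does not fall".

```python
def first_drop(pass_counts):
    """Index of the first iteration whose pass count is lower than the previous one.

    Returns None if the count never decreases.
    """
    for i in range(1, len(pass_counts)):
        if pass_counts[i] < pass_counts[i - 1]:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical pass counts per narrowing iteration.
    counts = [12, 30, 41, 55, 54]
    print(first_drop(counts))
```

      Early iterations raise the count because the sampled space is enriched for passing regions; the first decrease signals that viable parameter values are being cut away.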

      (15) Line 516: 'Our model could test and optimize combinations of cytokines, guiding future experiments and treatments." It is my understanding that this is communicated as a main strength of the model. Would it be possible to demonstrate that the sentence is true by using the model to make actual predictions for experiments or treatments?

      This is demonstrated by the combined cytokine alterations in Figure 7 and discussed in lines 509-513. We have also added in a suggested experiment to test the model prediction in lines 691-695.

      (16) Line 456, typo: I think 'expect' should be 'except'.

      Thank you for pointing that out. The typo has been corrected.

    1. All roads lead to progress.

      for - key insight - all roads lead to progress - progress trap - Prometheus complex - impulsive urge to invent

      Comment - This is fleshed out in the final three paragraphs of this article - I disagree with the closing sentence, however

      • “It’s not possible [to avoid invention],
        • because all knowledge is interconnected like a web,” Carlin told Big Think.
      • “If you walled off a certain part of it because you saw the potential downside,
        • you would get to the same outcome sort of in a roundabout way, right?
      • The connections might not be direct, like saying, ‘Oh, I see nuclear weapons in the distance; let’s go there,’

        • but we would go through the back door, and eventually we would discover everything around that thing.”
      • To bring Carlin’s analogy home,

        • we can think about the idea of artificial general intelligence, or AGI.
      • AGI is the point at which AI can perform a wide variety of tasks so competently
        • that it matches or exceeds human intelligence and performance.
      • Some people might see AGI as dangerous.
      • Others may see AGI as the savior of humanity.
      • But while we have debates and conversations,
        • we’re still marching toward AGI.
      • Scientists and programmers behind their computers are
        • solving “everything around that thing.”
      • Our hands and our brains will,
        • perhaps unconsciously,
      • drift toward the very thing we’re debating if we should do.

      • The Prometheus complex can be seen over and over again

        • in the history of science.
      • It is not simply that Edenic urge to eat the fruit or push the red button.
      • It’s the fact that
        • as the rational, intellectual part of ourselves wrestles with the decision,
        • a deeper, Promethean part of ourselves has pressed it already.
      • Thankfully, it usually turns out okay.

      comment - I disagree with the last line - If the meta-poly-perma-crisis is what is meant by "OK", then it is a very distorted use of that word. - Rather, this Promethean way of thinking and acting - compounded over the lifetime of human civilization - is EXACTLY what has brought us to the brink of civilizational disaster - and it may not turn out to be "ok"!

    1. We often think of software development as a ticket-in-code-out business but this is really only a very small portion of the entire thing. Completely independently of the work done as a programmer, there exist users with different jobs they are trying to perform, and they may or may not find it convenient to slot our software into that job. A manager is not necessarily the right person to evaluate how good a job we are doing because they also exist independently of the user–software–programmer network, and have their own sets of priorities which may or may not align with the rest of the system.

      Software development as a conversation

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors collected genomic information from public sources covering 423 eukaryote genomes and around 650 prokaryote genomes. Based on pre-computed CDS annotation, they estimated the frequency of alternative splicing (AS) as a single average measure for each genome and computed correlations with this measure and other genomic properties such as genome size, percentage of coding DNA, gene and intergenic span, etc. They conclude that AS frequency increases with genome complexity in a somewhat directional trend from "lower" organisms to "higher" organisms.

      Strengths:

      The study covers a wide range of taxonomic groups, both in prokaryotes and eukaryotes.

      Weaknesses:

      The study is weak both methodologically and conceptually. Current high throughput sequencing technologies, coupled with highly heterogeneous annotation methods, can observe cases of AS with great sensitivity, and one should be extremely cautious of the biases and rates of false positives associated with these methods. These issues are not addressed in the manuscript. Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      We are aware of the bias that may exist in annotation files. Since the source of noise can be highly variable, we have assumed that most of the data has a similar bias. However, we agree with the reviewer that we could perform some analysis to test for these biases and their association with different methodologies. Thus, we will measure the uncertainty present in the data. On the one hand, we will be more explicit about the data limitations and the biases they can generate in the results. On the other hand, while analyzing the false positives in the data is beyond our scope, we will perform a statistical test to detect possible biases regarding different methods of sequencing and annotation, and types of organisms (model or non-model organisms). If the test is positive, we will proceed, as far as possible, to normalize the data or to estimate a confidence interval.

      Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      Beyond taking into account the differential bias that may exist in the data, we do not consider our AS measure to be problematic. The NCBI database is one of the most reliable databases available to date and is continuously updated by the whole scientific community. So, the use of this data and the corresponding procedures for deriving the AS measure are perfectly acceptable for a comparative analysis on such a large global scale. Furthermore, the proposal of a new genome-level measure of AS that allows comparison of species spanning the whole tree of life is part of the novelty of the study. We understand that small-scale studies require high specificity about the molecular processes involved. However, that is not the case here, where we are dealing with a large-scale problem. On the other hand, as we have previously mentioned, we agree with the reviewer that we should analyze the degree of uncertainty in the data to better interpret the results.

      There is no mention of the possibility that AS could be largely caused by random splicing errors, a possibility that could very well fit with the manuscript's data. Instead, the authors adopt early on the view that AS is regulated and functional, generally citing outdated literature.

      There is no question that some AS events are functional, as evidenced by strongly supported studies. However, whether all AS events are functional is questionable, and the relative fractions of functional and non-functional AS are unknown. With this in mind, the authors should be more cautious in interpreting their data.

      Many studies suggest that most of the AS events observed are the result of splicing errors and are therefore neither functional nor conserved. However, we still have limited knowledge about the functionality of AS. The fact that we don’t have a complete understanding of its functionality doesn’t mean there isn’t a fundamental cause behind these events. AS is a highly dynamic process that can be associated with processes of a stochastic nature that are fundamental for phenotypic diversity and innovation. This is one of the reasons why we do not get into a discussion about the functionality of AS and instead consider it as a potential measure of biological innovation. Nevertheless, we agree with the reviewer’s comments, so we will add a discussion of this issue with updated literature and review any possible misinterpretation of the results.

      The "complexity" of organisms also correlates well (negatively) with effective population size. The power of selection to eliminate (slightly) deleterious mutations or errors decreases with effective population size. The correlation observed by the authors could thus easily be explained by a non-adaptive interpretation based on simple population genetics principles.

      We appreciate the reviewer’s observation. We are well aware of M. Lynch’s theory on the role of the effective population size and its eventual correlation with genomic parameters, but we want to emphasize that our objective is not to find an adaptive or non-adaptive explanation of the evolution of AS, but rather to reveal it. Nevertheless, as the reviewer suggests, we will look at the correlation between AS and the effective population size and discuss a possible non-adaptive interpretation.

      The manuscript contains evidence that the authors might benefit from adopting a more modern view of how evolution proceeds. Sentences such as "... suggests that only sophisticated organisms optimize alternative splicing by increasing..." (L113), or "especially in highly evolved groups such as mammals" (L130), or the repeated use of "higher" and "lower" organisms need revising.

      As the reviewer suggests, we will proceed with the corresponding linguistic corrections.

      Because of the lack of controls mentioned above, and because of the absence of discussion regarding an alternative non-adaptive interpretation, the analyses presented in the manuscript are of very limited use to other researchers in the field. In conclusion, the study does not present solid conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this contribution, the authors investigate the degree of alternative splicing across the evolutionary tree and identify a trend of increasing alternative splicing as you move from the base of the tree (here, only prokaryotes are considered) towards the tips of the tree. In particular, the authors investigate how the degree of alternative splicing (roughly speaking, the number of different proteins made from a single ORF (open reading frame) via alternative splicing) relates to three genomic variables: the genome size, the gene content (meaning the fraction of the genome composed of ORFs), and finally, the coding percentage of ORFs, meaning the ratio between exons and total DNA in the ORF. When correlating the degree of alternative splicing with these three variables, they find that the different taxonomic groups have a different correlation coefficient, and identify a "progressive pattern" among metazoan groups, namely that the correlation coefficient mostly increases when moving from flowering plants to arthropods, fish, birds, and finally mammals. They conclude that therefore the amount of splicing that is performed by an organismal group could be used as a measure of its complexity.

      Weaknesses:

      While I find the analysis of alternative splicing interesting, I also find that it is a very imperfect measure of organismal complexity and that the manuscript as a whole is filled with unsupported statements. First, I think it is clear to anyone studying evolution over the tree of life that it is the complexity of gene regulation that is at the origin of much of organismal structural and behavioral complexity. Arguably, creating different isoforms out of a single ORF is just one example of complex gene regulation. However, the complexity of gene regulation is barely mentioned by the authors.

      We disagree with the reviewer that our measure of AS is imperfect. As in our response to the first reviewer, we will quantify the uncertainty in the data and correct for differential biases caused by annotation and sequencing methods. Thus, beyond correcting relevant biases in the data, we consider that our measure is adequate for a comparative analysis at a global scale. A novelty of our study is the proposal of a genome-level measure of AS that takes into account data from the entire scientific community.

      We also want to emphasize that we assume from the beginning that AS may reflect some kind of biological complexity; it is not a conclusion drawn from the results. An argument in favor of such an assumption is that AS is associated with stochastic processes that are fundamental for phenotypic diversity and innovation. Of course, we agree with the reviewer that it is not the only mechanism behind biological complexity, so we will emphasize this in the manuscript. In addition, we will be more explicit about the assumptions and objectives, and will correct any unsupported statements.

      Further, it is clear that none of their correlation coefficients actually show a simple trend (see Table 3). According to these coefficients, birds are more complex than mammals for 3 out of 4 measures.

      An evolutionary trend is broadly defined as the gradual change in some characteristic of organisms as they evolve or adapt to a specific environment. In our context, we define an evolutionary trend as the gradual change in genome composition and its association with AS across the main taxonomic groups. If we look at Figure 4 and Table 3 we can conclude that there is a progressive trend. We will be more precise about how we define an evolutionary trend and correct any possible misinterpretation of the results. On the other hand, we do not assume that mammals should be more complex than birds. First, we will emphasize that our results show that birds have the highest values of such a trend. Second, after reading the reviewer’s comments, we have decided to perform an additional analysis to correct for differences in the taxonomic group sizes, which will allow us to have more confidence in the results.

      It is also not clear why the correlation coefficient between alternative splicing ratio and genome length, gene content, and coding percentage should display such a trend, rather than the absolute value. There are only vague mechanistic arguments.

      The study analyzes the relationship of AS with genomic composition for the large taxonomic groups. We assume that significant differences in these relationships are indicators of the presence of different mechanisms of genome evolution. However, we agree with the reviewer that a correlation does not imply a causal relation, so we will be more cautious when interpreting the results.

      To quantify the relationships we use correlation coefficients, the slopes of such correlations, and the relation of variability. Although the absolute values of AS are also shown in Table 4, we consider them less informative on their own than when we include how AS relates to the genomic composition. For example, we observe that plants have a different genome composition and relation with AS compared to animals, which suggests that they follow different mechanisms of genome evolution. On the other hand, we observe a trend in animals, where high values of AS are associated with a large percentage of introns and a percentage of intergenic DNA of about 50% of the genome.

      Much more troubling, however, is the statement that the data supports "lineage-specific trends" (lines 299-300). Either this is just an ambiguous formulation, or the authors claim that you can see trends *within* lineages.

      We agree with the reviewer that this statement is not correct, so we will proceed to correct it.

      The latter is clearly not the case. In fact, within each lineage, there is a tremendous amount of variation, to such an extent that many of the coefficients given in Table 3 are close to meaningless. Note that no error bars or p-values are presented for the values shown in Table 3. Figure 2 shows the actual correlation, and the coefficient for flowering plants there is given as 0.151, with a p-value of 0.193. Table 3 seems to quote r=0.174 instead. It should be clear that a correlation within a lineage or species is not a sign of a trend.

      The reviewer has misunderstood the results in Table 3. It is precisely the variation of the genome variables that we are measuring. Having standardized these values by the mean values, we compared the variability between groups, which is the result shown in Table 3. In this case there are no associated error bars or p-values. On the other hand, we agree that a correlation is not a sign of a trend. But the relations of variability, together with the results obtained in Figure 3, are indicators of a trend. As we mentioned before, we will analyze whether the variation in the group sizes is causing a bias in the results.
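
      To illustrate the kind of uncertainty estimate discussed in this exchange, the following is a minimal sketch of how a Pearson correlation coefficient can be given an approximate confidence interval using the Fisher z-transformation. It is not the authors' method. The function names and the sample size n = 76 are hypothetical; only the coefficient 0.151 is taken from the review, and the manuscript's actual group sizes are not stated here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for a correlation r from n pairs.

    Uses the Fisher z-transformation; z_crit defaults to the normal
    quantile for a 95% interval. Requires n > 3.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)                 # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r = 0.151 is the value quoted in the review; n = 76 is a placeholder
# (hypothetical), NOT the number of species in the manuscript.
low, high = fisher_ci(0.151, 76)
print(f"95% CI for r: [{low:.3f}, {high:.3f}]")
```

      Under these assumed inputs the interval includes zero, which is the sort of information an error bar on each coefficient would convey.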

      There are several wrong or unsupported statements in the manuscript. Early on, the authors state that the alternative splicing ratio (a number greater or equal to one that can be roughly understood as the number of different isoforms per ORF) "quantifies the number of different isoforms that can be transcribed using the same amount of information" (lines 51-52). But in many cases, this is incorrect, because the same sequence can represent different amounts of information depending on the context. So, if a changed context gives rise to a different alternative splice, it is because the genetic sequence has a different meaning in the changed context: the information has changed.

      We agree that some statements are not well supported, so we will proceed to revise them.

      In line 149, the authors state that "the energetic cost of having large genomes is high". No citation is given, and while such a statement seems logical, it does not have very solid support.

      We will also revise the bibliography and support our statements with updated references.

      If there was indeed a strong selective force to reduce genome size, we would not see the stunning diversity of genome sizes even within lineages. This statement is repeated (without support) several times in the manuscript, apparently in support of the idea that mammals had "no choice" to increase complexity via alternative splicing because they can't increase it by having longer genomes. I don't think this reasoning can be supported.

      We agree with the reviewer on this issue, so we will carefully revise the statements that indirectly (or directly) assume the action of selective forces on the genome composition.

      Even more problematic is the statement that "the amount of protein-coding DNA seems to be limited to a size of about 10MB" (line 219). There is no evidence whatsoever for this statement.

      In Figure 1A we observe a one-to-one relationship between the genome size and the amount of coding DNA. However, in multicellular organisms, although the genome size increases, we observe that the amount of coding DNA does not increase by more than 10Mb, which suggests the presence of some genomic limitation. Of course, this is not an absolute or general statement, but rather a suggestion. We are only describing our results.

      The reference that is cited (Choi et al 2020) suggests that there is a maximum of 150GB in total genome size due to physiological constraints. In lines 257-258, the authors write that "plants are less restricted in terms of storing DNA sequences compared to animals" (without providing evidence or a citation).

      We will revise the bibliography and add updated references.

      I believe this statement is made due to the observation that plants tend to have large intergenic regions. But without examining the functionality of these intergenic regions (they might host long non-coding RNA stretches that are used to regulate the expression of other genes, for example) it is quite adventurous to use such a simple measure as being evidence that plants "are less restricted in terms of storing DNA sequences", whatever that even means. I do not think the authors mean that plants have better access to -80 freezers. The authors conclude that "plant's primary mechanism of genome evolution is by expanding their genome". This statement itself is empty: we know that plants are prone to whole genome duplication, but this duplication is not, as far as we understand, contributing to complexity. It is not a "primary mechanism of genome evolution".

      We will revise these statements.

      In lines 293-294, the authors claim that "alternative splicing is maximized in mammalian genomes". There is no evidence that this ratio cannot be increased. So, to conclude (on lines 302-303) that alternative splicing ratios are "a potential candidate to quantify organismal complexity" seems, based on this evidence, both far-fetched and weak at the same time.

      Our results show the highest values of AS in mammals, but we understand that the results are limited by the availability and accuracy of data, which we will emphasize in the manuscript. As we previously mentioned, we will also analyze the uncertainty in the data and carry out the appropriate corrections.

      I am also not very comfortable with the data analysis. The authors, for example, say that they have eliminated from their analysis a number of "outlier species". They mention one: Emmer wheat because it has a genome size of 900 Mb (line 367). Since 900MB does not appear to be extreme, perhaps the authors meant to write 900 Gb. When I consulted the paper that sequenced Triticum dicoccoides, they noted that 14 chromosomes are about 10GB. Even a tetraploid species would then not be near 900Gb. But more importantly, such a study needs to state precisely which species were left out, and what the criteria are for leaving out data, lest they be accused of selecting data to fit their hypothesis.

      The reviewer is right; we meant to say 900Mb, which is approximately 7.2Gb. We made a mistake of nomenclature. This value is extreme compared to the typical values, so it generates large deviations when applying measures of central tendency and dispersion. We want to obtain mean values that are representative of most of the species composing the taxonomic groups, so we find it appropriate to exclude all outlier values from the study. Nevertheless, we will rigorously specify the criteria that we have used to select the data.

      I understand that Methods are often put at the end of a manuscript, but the measures discussed here are so fundamental to the analysis that a brief description of what the different measures are (in particular, the "alternative splicing ratio") should be in the main text, even when the mathematical definition can remain in the Methods.

      We agree with the reviewer, so we will add a brief description of the genomic variables at the beginning of the Results section.

      Finally, a few words on presentation. I understand that the following comments might read differently after the authors change their presentation. This manuscript was at the border of being comprehensible. In many cases, I could discern the meaning of words and sentences in contexts but sometimes even that failed (as an example above, about "species-specific trends", illustrates). The authors introduced jargon that does not have any meaning in the English language, and they do this over and over again.

      Note that I completely agree with all the comments by the other reviewer, who alerted me to problems I did not catch, including the possible correlation with effective population size: a possible non-adaptive explanation for the results.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors collected genomic information from public sources covering 423 eukaryote genomes and around 650 prokaryote genomes. Based on pre-computed CDS annotation, they estimated the frequency of alternative splicing (AS) as a single average measure for each genome and computed correlations with this measure and other genomic properties such as genome size, percentage of coding DNA, gene and intergenic span, etc. They conclude that AS frequency increases with genome complexity in a somewhat directional trend from "lower" organisms to "higher" organisms.

      Strengths:

      The study covers a wide range of taxonomic groups, both in prokaryotes and eukaryotes.

      Weaknesses:

      The study is weak both methodologically and conceptually. Current high throughput sequencing technologies, coupled with highly heterogeneous annotation methods, can observe cases of AS with great sensitivity, and one should be extremely cautious of the biases and rates of false positives associated with these methods. These issues are not addressed in the manuscript.

      We are aware of the bias that may exist in annotation files. Since the source of noise can be highly variable, we have assumed that most of the data has a similar bias. However, we agree with the reviewer that we could perform some analysis to test for these biases and their association with different methodologies. Thus, we will measure the uncertainty present in the data. On the one hand, we will be more explicit about the data limitations and the biases they can generate in the results. On the other hand, while analyzing the false positives in the data is beyond our scope, we will perform a statistical test to detect possible biases regarding different methods of sequencing and annotation, and types of organisms (model or non-model organisms). If the test is positive, we will proceed, as far as possible, to normalize the data or to estimate a confidence interval.

      Here, AS measures seem to be derived directly from CDS annotations downloaded from public databases, and do not account for differing annotation methods or RNA sequencing depth and tissue sample diversity.

      Beyond taking into account the differential bias that may exist in the data, we do not consider our AS measure to be problematic. The NCBI database is one of the most reliable databases available to date and is continuously updated by the whole scientific community. So, the use of this data and the corresponding procedures for deriving the AS measure are perfectly acceptable for a comparative analysis on such a large global scale. Furthermore, the proposal of a new genome-level measure of AS that allows comparison of species spanning the whole tree of life is part of the novelty of the study. We understand that small-scale studies require high specificity about the molecular processes involved. However, that is not the case here, where we are dealing with a large-scale problem. On the other hand, as we have previously mentioned, we agree with the reviewer that we should analyze the degree of uncertainty in the data to better interpret the results.

      There is no mention of the possibility that AS could be largely caused by random splicing errors, a possibility that could very well fit with the manuscript's data. Instead, the authors adopt early on the view that AS is regulated and functional, generally citing outdated literature.

      There is no question that some AS events are functional, as evidenced by strongly supported studies. However, whether all AS events are functional is questionable, and the relative fractions of functional and non-functional AS are unknown. With this in mind, the authors should be more cautious in interpreting their data.

      Many studies suggest that most of the AS events observed are the result of splicing errors and are therefore neither functional nor conserved. However, we still have limited knowledge about the functionality of AS. The fact that we don’t have a complete understanding of its functionality doesn’t mean there isn’t a fundamental cause behind these events. AS is a highly dynamic process that can be associated with processes of a stochastic nature that are fundamental for phenotypic diversity and innovation. This is one of the reasons why we do not get into a discussion about the functionality of AS and instead consider it as a potential measure of biological innovation. Nevertheless, we agree with the reviewer’s comments, so we will add a discussion of this issue with updated literature and review any possible misinterpretation of the results.

      The "complexity" of organisms also correlates well (negatively) with effective population size. The power of selection to eliminate (slightly) deleterious mutations or errors decreases with effective population size. The correlation observed by the authors could thus easily be explained by a non-adaptive interpretation based on simple population genetics principles.

      We appreciate the reviewer’s observation. We are well aware of M. Lynch’s theory on the role of the effective population size and its eventual correlation with genomic parameters, but we want to emphasize that our objective is not to find an adaptive or non-adaptive explanation of the evolution of AS, but rather to reveal it. Nevertheless, as the reviewer suggests, we will look at the correlation between AS and the effective population size and discuss a possible non-adaptive interpretation.

      The manuscript contains evidence that the authors might benefit from adopting a more modern view of how evolution proceeds. Sentences such as "... suggests that only sophisticated organisms optimize alternative splicing by increasing..." (L113), or "especially in highly evolved groups such as mammals" (L130), or the repeated use of "higher" and "lower" organisms need revising.

      As the reviewer suggests, we will proceed with the corresponding linguistic corrections.

      Because of the lack of controls mentioned above, and because of the absence of discussion regarding an alternative non-adaptive interpretation, the analyses presented in the manuscript are of very limited use to other researchers in the field. In conclusion, the study does not present solid conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this contribution, the authors investigate the degree of alternative splicing across the evolutionary tree and identify a trend of increasing alternative splicing as you move from the base of the tree (here, only prokaryotes are considered) towards the tips of the tree. In particular, the authors investigate how the degree of alternative splicing (roughly speaking, the number of different proteins made from a single ORF (open reading frame) via alternative splicing) relates to three genomic variables: the genome size, the gene content (meaning the fraction of the genome composed of ORFs), and finally, the coding percentage of ORFs, meaning the ratio between exons and total DNA in the ORF. When correlating the degree of alternative splicing with these three variables, they find that the different taxonomic groups have a different correlation coefficient, and identify a "progressive pattern" among metazoan groups, namely that the correlation coefficient mostly increases when moving from flowering plants to arthropods, fish, birds, and finally mammals. They conclude that therefore the amount of splicing that is performed by an organismal group could be used as a measure of its complexity.

      Weaknesses:

      While I find the analysis of alternative splicing interesting, I also find that it is a very imperfect measure of organismal complexity and that the manuscript as a whole is filled with unsupported statements. First, I think it is clear to anyone studying evolution over the tree of life that it is the complexity of gene regulation that is at the origin of much of organismal structural and behavioral complexity. Arguably, creating different isoforms out of a single ORF is just one example of complex gene regulation. However, the complexity of gene regulation is barely mentioned by the authors.

      We disagree with the reviewer that our measure of AS is imperfect. As in our response to the first reviewer, we will quantify the uncertainty in the data and correct for differential biases caused by annotation and sequencing methods. Thus, beyond correcting relevant biases in the data, we consider that our measure is adequate for a comparative analysis at a global scale. A novelty of our study is the proposal of a genome-level measure of AS that takes into account data from the entire scientific community.

      We also want to emphasize that we assume from the beginning that AS may reflect some kind of biological complexity; it is not a conclusion drawn from the results. An argument in favor of such an assumption is that AS is associated with stochastic processes that are fundamental for phenotypic diversity and innovation. Of course, we agree with the reviewer that it is not the only mechanism behind biological complexity, so we will emphasize this in the manuscript. In addition, we will be more explicit about the assumptions and objectives, and will correct any unsupported statements.

      Further, it is clear that none of their correlation coefficients actually show a simple trend (see Table 3). According to these coefficients, birds are more complex than mammals for 3 out of 4 measures.

      An evolutionary trend is broadly defined as the gradual change in some characteristic of organisms as they evolve or adapt to a specific environment. In our context, we define an evolutionary trend as the gradual change in genome composition and its association with AS across the main taxonomic groups. If we look at Figure 4 and Table 3 we can conclude that there is a progressive trend. We will be more precise about how we define an evolutionary trend and correct any possible misinterpretation of the results. On the other hand, we do not assume that mammals should be more complex than birds. First, we will emphasize that our results show that birds have the highest values of such a trend. Second, after reading the reviewer’s comments, we have decided to perform an additional analysis to correct for differences in the taxonomic group sizes, which will allow us to have more confidence in the results.

      It is also not clear why the correlation coefficient between alternative splicing ratio and genome length, gene content, and coding percentage should display such a trend, rather than the absolute value. There are only vague mechanistic arguments.

      The study analyzes the relationship of AS with genomic composition for the large taxonomic groups. We assume that significant differences in these relationships are indicators of the presence of different mechanisms of genome evolution. However, we agree with the reviewer that a correlation does not imply a causal relation, so we will be more cautious when interpreting the results.

      To quantify the relationships, we use correlation coefficients, the slopes of these correlations, and the relations of variability. Although the absolute values of AS are also shown in Table 4, we consider them less informative than how AS relates to genomic composition. For example, we observe that plants have a different genome composition and a different relation with AS compared to animals, which suggests that they follow different mechanisms of genome evolution. On the other hand, we observe a trend in animals, where high values of AS are associated with a large percentage of introns and a percentage of intergenic DNA of about 50% of the genome.

      Much more troubling, however, is the statement that the data supports "lineage-specific trends" (lines 299-300). Either this is just an ambiguous formulation, or the authors claim that you can see trends within lineages.

      We agree with the reviewer that this statement is not correct, so we will proceed to correct it.

      The latter is clearly not the case. In fact, within each lineage, there is a tremendous amount of variation, to such an extent that many of the coefficients given in Table 3 are close to meaningless. Note that no error bars or p-values are presented for the values shown in Table 3. Figure 2 shows the actual correlation, and the coefficient for flowering plants there is given as 0.151, with a p-value of 0.193. Table 3 seems to quote r=0.174 instead. It should be clear that a correlation within a lineage or species is not a sign of a trend.

      The reviewer has misunderstood the results in Table 3. It is precisely the variation of the genome variables that we are measuring. Having standardized these values by their means, we compared the variability between groups, which is the result shown in Table 3. In this case, there are no associated error bars or p-values. On the other hand, we agree that a correlation is not a sign of a trend. However, the relations of variability, together with the results obtained in Figure 3, are indicators of a trend. As we mentioned before, we will analyze whether the variation in the group sizes is causing a bias in the results.

      There are several wrong or unsupported statements in the manuscript. Early on, the authors state that the alternative splicing ratio (a number greater than or equal to one that can be roughly understood as the number of different isoforms per ORF) "quantifies the number of different isoforms that can be transcribed using the same amount of information" (lines 51-52). But in many cases, this is incorrect, because the same sequence can represent different amounts of information depending on the context. So, if a changed context gives rise to a different alternative splice, it is because the genetic sequence has a different meaning in the changed context: the information has changed.

      We agree that some statements are not well supported, so we will revise them.

      In line 149, the authors state that "the energetic cost of having large genomes is high". No citation is given, and while such a statement seems logical, it does not have very solid support.

      We will also revise the bibliography and support our statements with updated references.

      If there was indeed a strong selective force to reduce genome size, we would not see the stunning diversity of genome sizes even within lineages. This statement is repeated (without support) several times in the manuscript, apparently in support of the idea that mammals had "no choice" to increase complexity via alternative splicing because they can't increase it by having longer genomes. I don't think this reasoning can be supported.

      We agree with the reviewer on this issue, so we will carefully revise the statements that indirectly (or directly) assume the action of selective forces on the genome composition.

      Even more problematic is the statement that "the amount of protein-coding DNA seems to be limited to a size of about 10MB" (line 219). There is no evidence whatsoever for this statement.

      In Figure 1A we observe a one-to-one relationship between the genome size and the amount of coding DNA. However, in multicellular organisms, although the genome size increases, we observe that the amount of coding DNA does not increase by more than 10Mb, which suggests the presence of some genomic limitation. Of course, this is not an absolute or general statement, but rather a suggestion. We are only describing our results.

      The reference that is cited (Choi et al 2020) suggests that there is a maximum of 150GB in total genome size due to physiological constraints. In lines 257-258, the authors write that "plants are less restricted in terms of storing DNA sequences compared to animals" (without providing evidence or a citation).

      We will revise the bibliography and add updated references.

      I believe this statement is made due to the observation that plants tend to have large intergenic regions. But without examining the functionality of these intergenic regions (they might host long non-coding RNA stretches that are used to regulate the expression of other genes, for example) it is quite adventurous to use such a simple measure as being evidence that plants "are less restricted in terms of storing DNA sequences", whatever that even means. I do not think the authors mean that plants have better access to -80 freezers. The authors conclude that "plant's primary mechanism of genome evolution is by expanding their genome". This statement itself is empty: we know that plants are prone to whole genome duplication, but this duplication is not, as far as we understand, contributing to complexity. It is not a "primary mechanism of genome evolution".

      We will revise these statements.

      In lines 293-294, the authors claim that "alternative splicing is maximized in mammalian genomes". There is no evidence that this ratio cannot be increased. So, to conclude (on lines 302-303) that alternative splicing ratios are "a potential candidate to quantify organismal complexity" seems, based on this evidence, both far-fetched and weak at the same time.

      Our results show the highest values of AS in mammals, but we understand that the results are limited by the availability and accuracy of the data, which we will emphasize in the manuscript. As we previously mentioned, we will also analyze the uncertainty in the data and carry out the appropriate corrections.

      I am also not very comfortable with the data analysis. The authors, for example, say that they have eliminated from their analysis a number of "outlier species". They mention one: Emmer wheat because it has a genome size of 900 Mb (line 367). Since 900MB does not appear to be extreme, perhaps the authors meant to write 900 Gb. When I consulted the paper that sequenced Triticum dicoccoides, they noted that 14 chromosomes are about 10GB. Even a tetraploid species would then not be near 900Gb. But more importantly, such a study needs to state precisely which species were left out, and what the criteria are for leaving out data, lest they be accused of selecting data to fit their hypothesis.

      The reviewer is right; we wanted to say 900Mb, which is approximately 7.2Gb. We made a nomenclature error. This value is extreme compared to the typical values, so it generates large deviations when applying measures of central tendency and dispersion. We want to obtain mean values that are representative of most species in the taxonomic groups, so we find it appropriate to exclude all outlier values from the study. Nevertheless, we will rigorously specify the criteria that we have used to select the data.

      I understand that Methods are often put at the end of a manuscript, but the measures discussed here are so fundamental to the analysis that a brief description of what the different measures are (in particular, the "alternative splicing ratio") should be in the main text, even when the mathematical definition can remain in the Methods.

      We agree with the reviewer, so we will add a brief description of the genomic variables at the beginning of the Results section.

      Finally, a few words on presentation. I understand that the following comments might read differently after the authors change their presentation. This manuscript was at the border of being comprehensible. In many cases, I could discern the meaning of words and sentences in contexts but sometimes even that failed (as an example above, about "species-specific trends", illustrates). The authors introduced jargon that does not have any meaning in the English language, and they do this over and over again.

      Note that I completely agree with all the comments by the other reviewer, who alerted me to problems I did not catch, including the possible correlation with effective population size: a possible non-adaptive explanation for the results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments to improve the quality of the work:

      (1) The choice of subunits to tag is really not ideal. In the available structures of the human proteasome, the C-terminus of Rpn3/PSMD3 points directly toward the ATPase pore and is likely to disrupt the structure and/or dynamics of the proteasome during proteolysis (see comments regarding controls for functionality below). Similarly, the C-terminal tail of Rpt1/PSMC2 has a key role in the opening of the 20S core particle gate for substrate translocation and processing (see 2018 Nature Communications, 9:1360 and 2018 Cell Reports 24:1301-1315), and Alpha3/PSMA4 can be substituted by a second copy of Alpha4/PSMA7 in some conditions (although tagging Alpha3/PSMA4 would admittedly provide a picture of the canonical proteasome interactome while actively excluding the interactome of the non-canonical proteasomes that form via replacement of Alpha3/PSMA4). Comparison of these cell lines with lines harboring tags on subunits that are commonly used for tagging in the field because of a lack of impacts, such as the N-terminus of Rpn1/PSMD2, the C-terminus of Rpn11/PSMD14, and the C-terminus of Beta4/PSMB2 would help instill confidence that the interactome reported largely arises from mature, functional proteasomes rather than subcomplexes, defective proteasomes, or other species that may occur due to tagging at these positions.

      We thank the reviewer for pointing this out. The original purpose of our strategy was to establish proximity labeling of proteasomes to enable applications both in cell culture and in vivo. The choice of PSMA4 and PSMC2 was dictated by previous successful tagging with GFP in mammalian cells (Salomons et al., Exp Cell Res 2010; Bingol and Schuman, Nature 2006). However, tagging the C-terminus of PSMC2 might not have been optimal. HEK293 cells overexpressing PSMC2-BirA show slower growth, and the BioID data show higher enrichment of assembly factors, suggesting slower assembly of this fusion protein into proteasomes. Nevertheless, we did not observe a negative impact on overall proteasome activity, and PSMC2-BirA was (at least in part) incorporated into fully assembled proteasomes, as indicated by the enrichment of 20S proteins. We apologize for not making it clear that we labeled the N-terminus of PSMD3/Rpn3 and not the C-terminus (Figure 1a and S1a). Therefore, we included in Figure S1a of the revised manuscript structures of the proteasome in which the tagged subunit termini are highlighted: the C-terminus for PSMA4 and PSMC2 and the N-terminus for PSMD3. Additionally, we would like to point out that, unlike PSMC2-BirA, cells expressing BirA-PSMD3 did not show slower growth, and BioID data showed a more homogeneous enrichment of both 19S and 20S proteins compared to PSMC2-BirA (Figure 1D and 1E). However, the overall level of enrichment of proteasome subunits was not comparable to PSMA4-BirA and, therefore, we opted to focus the rest of the manuscript on this construct.

      In support of this point, the data provided in Figure 1E, which show the change in the abundance of each proteasome subunit in the tagged line vs. the BirA control line, demonstrate substantial enrichment of the subcomplexes of the proteasome that are tagged in each case; this effect may represent the known feedback-mediated upregulation of new proteasome subunit synthesis that occurs when proteasomal proteolysis is impaired, or alternatively, the accumulation of subcomplexes containing the tagged subunit that cannot readily incorporate into mature proteasomes. Acknowledging this limitation in the text would be valuable to readers who are less familiar with the proteasome.

      We would like to clarify that the data shown in Figure 1E do not represent whole proteome data, but rather log2 fold changes vs. BirA* control calculated on streptavidin enrichment samples. The differences in the enrichment of the various subcomplexes between cell lines derive from the fact that the effect size of the enrichment depends both on protein abundance in the isolated complexes and on the efficiency of biotinylation. The latter will be higher for proteins located in closer proximity to the bait. A similar observation was made in a recent publication (PMID:36410438) that compared BioID and Co-IP for the same bait. When a component of the nuclear pore complex (Nup158) was analyzed by BioID, only the more proximal proteins were enriched, whereas the whole complex was enriched in the Co-IP data (Author response image 1):

      Author response image 1.

      Proteins identified in the NUP158 BioID or pulldown experiments are filled in red or light red for significance intervals A or B, respectively. The bait protein NUP158 is filled in yellow. Proteins enriched in the pulldown falling outside the SigA/B cutoff are filled in gray. NPC, nuclear pore complex. SigA, significant class A; SigB, significant class B. Reproduced from Figure 6 of PMID: 36410438.

      However, we would like to point out that despite quantitative differences between different proteasome subunits, both 19S and 20S proteins were found to be strongly enriched (typically >2 fold) in all the constructs compared to BirA* control line (Figure 1E). This indicates that at least a fraction of all the tagged subunits are incorporated into fully assembled proteasomes.
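
      As an illustration only (this is not the authors' analysis code), the relation between a fold enrichment and its log2 fold change can be sketched as follows. The function name, the pseudocount, and the intensity values are hypothetical; the pseudocount is an assumption added here to avoid division by zero when a protein is absent from the control.

```python
import math


def log2_fold_change(bait_intensity, control_intensity, pseudocount=1.0):
    """Log2 ratio of bait over control intensity.

    The pseudocount is an illustrative assumption, not a value from the
    manuscript; it keeps the ratio defined when the control intensity is 0.
    """
    return math.log2((bait_intensity + pseudocount) / (control_intensity + pseudocount))


# Hypothetical intensities: (7 + 1) / (1 + 1) = 4-fold, i.e. log2 fold change of 2.
example = log2_fold_change(7.0, 1.0)
```

      Under this convention, a more than 2-fold enrichment corresponds to a log2 fold change above 1.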

      Regarding the upregulation of proteasome subunits as a consequence of proteasome dysfunction, we did not find evidence of this, at least in the case of PSMA4. The immunoblot shown in Figure 2A and its quantification in S3A indicate no increased abundance of endogenous PSMA4 upon tetracycline induction of PSMA4-BirA*.

      (2) The use of myc as a substrate of the proteasome for demonstration that proteolysis is unaffected is perhaps not ideal. Myc is known to be degraded via both ubiquitin-dependent and ubiquitin-independent mechanisms, such that disruption of one means of degradation (e.g., ubiquitin-dependent degradation) via a given tag could potentially be compensated by another. A good example of this is that the C-terminal tagging of PSMC2/Rpt1 is likely to disrupt interaction between the core particle and the regulatory particle (as suggested in Fig. 1D); this may free up the core particle for ubiquitin-independent degradation of myc.

      Aside from using specific reporters for ubiquitin-dependent vs. independent degradation or a larger panel of known substrates, analysis of the abundance of K48-ubiquitinated proteins in the control vs. tag lines would provide additional evidence as to whether or not proteolysis is generally perturbed in the tag lines.

      We thank the reviewer for this suggestion. We have included an immunoblot analysis showing that the levels of K48 ubiquitylation (Figure S3d) are not affected by the expression of tagged PSMA4.

      (3) On pg. 8 near the bottom, the authors accidentally refer to ARMC6 as ARMC1 in one instance.

      We have corrected the mistake.

      (4) On pg. 10, the authors explain that they analyzed the interactome for all major mouse organs except the brain; although they explain in the discussion section why the brain was excluded, pg. 10 might be a better place for this explanation than the discussion.

      We moved the explanation from the Discussion to the Results section.

      Reviewer #2 (Recommendations For The Authors):

      (1) Perhaps the authors can quantify the fraction of unassembled PSMA4-BirA* from the SEC experiment (Fig. 2b) to give the readers a feeling for how large a problem this could be.

      The percentages based on Area Under the Curve calculations have been added to Figure S3b.
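
      A minimal sketch of how such area-under-the-curve percentages can be computed from a chromatogram is given below. It assumes trapezoidal integration and elution windows whose boundaries coincide with sampled points; the function names and the toy trace are hypothetical and do not reproduce the data in Figure S3b.

```python
def trapezoid_area(x, y):
    """Area under a sampled curve using the trapezoidal rule."""
    return sum(
        (x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2.0
        for i in range(len(x) - 1)
    )


def percent_in_window(x, y, start, end):
    """Percentage of the total area that lies between start and end.

    Assumes start and end coincide with sampled x values (no interpolation).
    """
    total = trapezoid_area(x, y)
    inside = [(xi, yi) for xi, yi in zip(x, y) if start <= xi <= end]
    window_x = [p[0] for p in inside]
    window_y = [p[1] for p in inside]
    return 100.0 * trapezoid_area(window_x, window_y) / total


# Toy trace: elution volume (x) and band intensity (y), hypothetical values.
volume = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.0, 2.0, 2.0, 0.0, 0.0]
early_fraction = percent_in_window(volume, signal, 0.0, 2.0)
late_fraction = percent_in_window(volume, signal, 2.0, 4.0)
```

      In practice, the windows would be chosen to match the fractions containing assembled and unassembled species.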

      (2) Do the authors observe any difference in the enrichment scores between proteins that are known to interact with the proteasome vs proteins that the authors can justify as "interactors of interactors" vs the completely new potential interactors? This could be an interesting way to show that the potential new interactors are not simply because of poor false positive rate calibration, but that they behave in the same way as the other populations.

      We thank the reviewer for this suggestion. We analyzed the enrichment scores for 20S proteasome subunits, known PIPs, first neighbors and the remaining enriched proteins. The remaining proteins (potential new interactors) have very similar scores as the first neighbors of known interactors. This plot has been added to Figure S3g.

      (3) Did the authors try to train a logistic model for the miniTurbo experiments, like it was done for the BirA* experiments? Perhaps combining the results of both experiments would yield higher confidence on the proteasome interactors.

      Following the reviewer's suggestion, we applied the classifier to the dataset comparing miniTurbo and PSMA-miniTurbo. We found a clear separation between the FPR and the TPR with 136 protein groups enriched in PSMA-miniTurbo. We have added the classifier and corresponding ROC curve to Figure S4f and S4g.
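
      For readers less familiar with this kind of evaluation, the sketch below shows how FPR and TPR values along a ROC curve can be derived from classifier scores. It is an illustration under stated assumptions, not the authors' classifier: the scores and labels are hypothetical, a label of 1 stands for a known true positive (for example, a proteasome subunit) and 0 for a presumed negative, and tied scores are not treated specially.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs obtained by lowering the score threshold
    one protein at a time, starting from the highest score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _score, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (points[i + 1][0] - points[i][0]) * (points[i][1] + points[i + 1][1]) / 2.0
        for i in range(len(points) - 1)
    )


# Hypothetical scores: the two positives rank above the two negatives.
curve = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
area = roc_auc(curve)
```

      A score cutoff can then be chosen as the lowest threshold at which the FPR stays below a chosen level, such as 0.05.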

      75 protein groups were found to be enriched for both PSMA4-BirA* and PSMA4-miniTurbo (Author response image 2), including the proteasome core particles, regulatory particles, known interactors and potential new interactors. As we focused more on the identification of substrates with PSMA4-miniTurbo, we did not pursue these overlapping protein groups further, but rather used the comparison to the mouse model to identify potential new interactors.

      Author response image 2.

      Overlap of ProteasomeID-enriched proteins (FPR < 0.05) between PSMA4-BirA* and PSMA4-miniTurbo.

      (4) Perhaps this is already known, but did the authors check if MG132 affects proteasome assembly? The authors could for example repeat their SEC experiments in the presence of MG132.

      We thank the reviewer for the suggestion; however, to our knowledge, there are no reports that MG132 has an effect on the assembly of the proteasome. MG132 is one of the most widely used proteasome inhibitors in basic research and as such has been extensively characterized over the last three decades. The small peptide aldehyde acts as a substrate analogue and binds directly to the active site of the protease PSMB5/β5. We therefore think it is unlikely that MG132 interferes with the assembly of the proteasome.

      (5) Minor comment: at the bottom of page 8, the authors probably mean ARMC6 and not ARMC1.

      We have corrected the mistake.

      (6) It would be interesting to expand the analysis of the already acquired in vivo data to try to identify tissue-specific proteasome interactors. Can the authors draw a four-way Venn diagram with the interactors of each tissue?

      We thank the reviewer for this suggestion. We have generated an UpSet plot showing the overlap of ProteasomeID-enriched proteins in the four tissues that gave us meaningful results (Author response image 3). To investigate whether the observed differences in ProteasomeID-enriched proteins could be meaningful in terms of proteasome biology, we have highlighted proteins belonging to the UPS that show tissue-specific enrichment. We found proteasome activators such as PSME1/PA28alpha and PSME2/PA28beta to be preferentially enriched in kidney and liver, respectively, and multiple deubiquitinases to be preferentially enriched in the heart. These differences might be related to the specific cellular composition of the different tissues, e.g., the number of immune cells present, or to the tissue-specific interaction of proteasomes with enzymes involved in the ubiquitin cycle. Given the rather preliminary nature of these findings, we have opted not to include this figure in the main manuscript, but to include it only in this rebuttal letter.

      Author response image 3.

      UpSet plot showing overlap between ProteasomeID-enriched proteins in different mouse organs.
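
      The quantity an UpSet plot displays, namely the proteins found in exactly one combination of tissues and in no other, can be sketched as below. This is an illustrative computation only; the tissue and protein names are placeholders and do not correspond to the data in Author response image 3.

```python
from itertools import combinations


def exclusive_intersections(sets_by_name):
    """Map each combination of set names to the members found in all of
    those sets and in none of the remaining sets (empty groups omitted)."""
    names = list(sets_by_name)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            shared = set.intersection(*(sets_by_name[n] for n in combo))
            others = [sets_by_name[n] for n in names if n not in combo]
            exclusive = shared - set().union(*others)
            if exclusive:
                result[combo] = exclusive
    return result


# Placeholder data: three hypothetical tissues and three hypothetical proteins.
enriched = {
    "tissue_1": {"protein_a", "protein_b"},
    "tissue_2": {"protein_b", "protein_c"},
    "tissue_3": {"protein_b"},
}
groups = exclusive_intersections(enriched)
```

      Each key of the returned mapping corresponds to one column of the UpSet plot, and the size of its value gives the bar height.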

      Reviewer #3 (Recommendations For The Authors):

      (1) In the first paragraph of the Introduction, the authors link cellular senescence caused by partial proteasome inhibition with the efficacy of proteasome inhibitors in cancer therapy. Although this is an interesting hypothesis, I am not aware of any direct evidence for this; rather, I believe the efficacy of bortezomib/carfilzomib in haematological malignancies is most commonly attributed to these cells having adapted to high levels of proteotoxic stress (e.g., chronic unfolded protein response activation). I would suggest rephrasing this sentence.

      We thank the reviewer for the comment and have amended the introduction.

      (2) For the initial validation experiments (e.g., Fig. 1B), have the authors checked what level of Streptavidin signal is obtained with "+ bio, - tet" ? Although I accept that the induction of PSMA4-BirA* upon doxycycline addition is clear from the anti-Flag blots, it would still be informative to ascertain what level of background labelling is obtained without induction (but in the presence of exogenous biotin).

      We tested four different conditions +/- tet and +/- biotin (24h) in PSMA4-BirA* cell lines (Author response image 4). As expected, biotinylation was most pronounced when tet and biotin were added. When biotin was omitted, streptavidin signal was the lowest regardless of the addition of tet. Compared to the -biotin conditions, a slight increase of streptavidin signal could be observed when biotin was added but tet was not added. This could be either due to the promoter leaking (PMID: 12869186) or traces of tetracycline in the FBS we used, as we did not specifically use tet-free FBS for our experiments.

      Author response image 4.

      Streptavidin-HRP immunoblot following induction of BirA fusion proteins with tetracycline (+tet) and supplementation of biotin (+bio). For the sample used as expression control tetracycline was omitted (-tet). To test background biotinylation, biotin supplementation was omitted (-bio). Immunoblot against BirA and PSMA was used to verify induction of fusion proteins, while GAPDH was used as loading control.

      (3) For the proteasome structure models in Fig. 1D, a scale bar would be useful to inform the reader of the expected 10 nm labelling radius (as the authors have done later, in Fig. 2D).

      We have added 10 nm scale bars to Figure 1d.

      (4) In the "Identification of proteasome substrates by ProteasomeID" Results subsection, I believe there is a typo where the authors refer to ARMC1 instead of ARMC6.

      We have corrected the mistake.

      (5) I think Fig. S5 was one of the most compelling in the manuscript. Given the interest in confirming on-target efficacy of targeted degradation modalities, as well as identifying potential off-target effects early-on in development, I would consider promoting this out of the supplement.

      We thank the reviewer for the comment and share the excitement about using ProteasomeID for targeted degradation screening. We have moved the data on PROTACs (Figure S5) into a new main Figure 5.

      In addition, in relation to the comment of this reviewer regarding the detection of endogenous substrates, we have now included validation for one more hit emerging from our analysis (TIGD5) and included the results in Figure 4f, 4g and S4j.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors examined the role of IBTK, a substrate-binding adaptor of the CRL3 ubiquitin ligase complex, in modulating the activity of the eIF4F translation initiation complex. They find that IBTK mediates the non-degradative ubiquitination of eIF4A1 and promotes cap-dependent translational initiation, nascent protein synthesis, oncogene expression, and tumor cell growth. Correspondingly, phosphorylation of IBTK by mTORC1/S6K1 increases eIF4A1 ubiquitination and sustains oncogenic translation.

      Strengths:

      This study utilizes multiple biochemical, proteomic, functional, and cell biology assays to substantiate its results. Importantly, the work nominates IBTK as a unique substrate of mTORC1, and further validates eIF4A1 (a crucial subunit of the eIF4F complex) as a promising therapeutic target in cancer. Since IBTK interacts broadly with multiple members of the translation initiation complex, it will be interesting to examine its role in eIF2alpha-mediated ER stress as well as eIF3-mediated translation. Additionally, since IBTK exerts pro-survival effects in multiple cell types, it will be of relevance to characterize the role of IBTK in mediating increased mTORC1-mediated translation in other tumor types, thus potentially impacting their treatment with eIF4F inhibitors.

      Limitations/Weaknesses:

      The findings are mostly well supported by data, but some areas need clarification and could potentially be enhanced with further experiments:

      (1) Since eIF4A1 appears to function downstream of IBTK, can the effects of IBTK KO/KD in reducing puromycin incorporation (in Fig 3A), cap-dependent luciferase reporter activity (Fig 3G), reduced oncogene expression (Fig 4A) or 2D growth/invasion assays (Fig 4) be overcome or bypassed by overexpressing eIF4A1? These could potentially be tested in future studies.

      We appreciate the reviewer for bringing up this crucial point. As per the reviewer's suggestion, we conducted experiments in which we overexpressed Myc-eIF4A1 in IBTK-KO SiHa cells. Our findings indicate that increasing eIF4A1 levels through ectopic overexpression was unable to reverse the decrease in puromycin incorporation (Fig. S3C) and in protein expression of eIF4A1 targets caused by IBTK ablation (Fig. S4E). These results clearly demonstrate that the eIF4A1 dysfunction induced by IBTK ablation cannot be rescued by simply elevating eIF4A1 protein levels. Given that these results were negative, the impact of eIF4A1 overexpression on the 2D growth/invasion capacities of IBTK-KO cells was not examined further. We sincerely appreciate the reviewer's understanding regarding this matter.

      (2) The decrease in nascent protein synthesis in puromycin incorporation assays in Figure 3A suggests that the effects of IBTK KO are comparable to and additive with silvestrol. It would be of interest to examine whether silvestrol decreases nascent protein synthesis or increases stress granules in the IBTK KO cells stably expressing IBTK as well.

      We appreciate the reviewer for bringing up this crucial point. We have shown that silvestrol treatment still decreased nascent protein synthesis in IBTK-KO cells overexpressing FLAG-IBTK as well (Fig. S3B).

      (3) The data presented in Figure 5 regarding the role of mTORC1 in IBTK-mediated eIF4A1 ubiquitination needs further clarification on several points:

      • It is not clear if the experiments in Figure 5F with Phos-tag gels are using the FLAG-IBTK deletion mutant or the peptide containing the mTOR sites as it is mentioned on line 517, page 19 "To do so, we generated an IBTK deletion mutant (900-1150 aa) spanning the potential mTORC1-regulated phosphorylation sites" This needs further clarification.

      We appreciate the reviewer for bringing up this crucial point. The IBTK deletion mutant used in Fig. 5F is FLAG-IBTK900-1150aa. We have annotated it with smaller font size in the panel (red box) in Author response image 1.

      Author response image 1.

      • It may be of benefit to repeat the Phos tag experiments with full-length FLAG-IBTK and/or endogenous IBTK with molecular weight markers indicating the size of migrated bands.

      We appreciate the reviewer for bringing up this crucial point. We attempted to perform Phos-tag assays to detect the overexpressed full-length FLAG-IBTK or endogenous IBTK. However, we encountered difficulties in successfully transferring the full-length FLAG-IBTK or endogenous IBTK onto the nitrocellulose membrane during Phos-tag WB analysis. This is likely due to the limitations of this technique. Based on our experience, Phos-tag gels are less efficient in detecting mobility shifts of proteins with large molecular weights. As the molecular weight of IBTK protein is approximately 160 kDa, it falls within this category. Considering these technical constraints, we did not include Phos-tag assay results for full-length IBTK in our study. We sincerely appreciate the reviewer's understanding regarding this matter.

      The binding of Phos-tag to phosphorylated proteins induces a mobility shift during gel electrophoresis or protein separation techniques. This shift allows for the visualization and quantification of phosphorylated proteins separately from non-phosphorylated proteins. It's important to note that these mobility shifts indicate phosphorylation status, rather than actual molecular weights. Pre-stained protein markers are typically used as a reference to assess the efficiency of protein transfer onto the membrane [Ref: 1]. Considering the aforementioned reasons, we did not add molecular weights to the WB images.

      Reference [1]. FUJIFILM Wako Pure Chemical Corporation, https://www.wako-chemicals.de/media/pdf/c7/5e/20/FUJIFILM-Wako_Phos-tag-R.pdf

      • Additionally, torin or Lambda phosphatase treatment may be used to confirm the specificity of the band in separate experiments.

      We appreciate the reviewer for bringing up this crucial point. Torin1 is a synthetic mTOR inhibitor that prevents the binding of ATP to mTOR, leading to the inactivation of both mTORC1 and mTORC2, whereas rapamycin primarily targets mTORC1 activity and may inhibit mTORC2 in certain cell types after prolonged treatment. We have identified the mTORC1/S6K1 complex as the predominant mediator of IBTK phosphorylation. Therefore, in this context, we think that rapamycin is sufficient to inactivate the mTORC1/S6K1 pathway. As shown in Fig. 5F, the phosphorylated IBTK900-1150aa was markedly decreased while the non-phosphorylated form was simultaneously increased in rapamycin-treated cells. As per the reviewer's suggestion, we treated cells overexpressing FLAG-IBTK900-1150aa with lambda phosphatase. As shown in Fig. 5G, lambda phosphatase treatment completely abolished the mobility shifts of phosphorylated FLAG-IBTK900-1150aa. Additionally, the lowest band displayed an abundant accumulation of the non-phosphorylated form of FLAG-IBTK900-1150aa. These findings confirm that the mobility shifts observed in WB analysis correspond to the phosphorylated forms of FLAG-IBTK900-1150aa.

      • Phos-tag gels with the IBTK CRISPR KO line would also help confirm that the non-phosphorylated band is indeed IBTK.

      We appreciate the reviewer for bringing up this crucial point. As stated above, we performed Phos-tag assays to detect the mobility shifts of phosphorylated FLAG-IBTK900-1150aa. An anti-FLAG antibody, rather than an anti-IBTK antibody, was used for WB detection. This antibody does not exhibit cross-reactivity with endogenous IBTK.

      • It is unclear why the lower, phosphorylated bands seem to be increasing (rather than decreasing) with AA starvation/ Rapa in Fig 5H.

      We appreciate the reviewer for bringing up this crucial point. We think the panel the reviewer mentioned is Fig. 5F. According to the principle of Phos-tag assays, proteins with higher phosphorylation levels have slower migration rates on SDS-PAGE, while proteins with lower phosphorylation levels have faster migration rates.

      As shown in Author response image 2, the green box indicates the most phosphorylated forms of FLAG-IBTK900-1150aa, the red box indicates the moderately phosphorylated forms of FLAG-IBTK900-1150aa, and the yellow box indicates the non-phosphorylated forms of FLAG-IBTK900-1150aa. AA starvation or Rapamycin treatment reduced the hyperphosphorylated forms of FLAG-IBTK900-1150aa (green box), while simultaneously increasing the hypophosphorylated (red box) and non-phosphorylated (yellow box) forms of FLAG-IBTK900-1150aa. Thus, we conclude that AA starvation or Rapamycin treatment leads to a marked decrease in the phosphorylation levels of FLAG-IBTK900-1150aa.

      Author response image 2.

      Reviewer #2 (Public Review):

      Summary:

      This study by Sun et al. identifies a novel role for IBTK in promoting cancer protein translation, through regulation of the translational helicase eIF4A1. Using a multifaceted approach, the authors demonstrate that IBTK interacts with and ubiquitinates eIF4A1 in a non-degradative manner, enhancing its activation downstream of mTORC1/S6K1 signaling. This represents a significant advance in elucidating the complex layers of dysregulated translational control in cancer.

      Strengths:

      A major strength of this work is the convincing biochemical evidence for a direct regulatory relationship between IBTK and eIF4A1. The authors utilize affinity purification and proximity labeling methods to comprehensively map the IBTK interactome, identifying eIF4A1 as a top hit. Importantly, they validate this interaction and the specificity for eIF4A1 over other eIF4 isoforms by co-immunoprecipitation in multiple cell lines. Building on this, they demonstrate that IBTK catalyzes non-degradative ubiquitination of eIF4A1 both in cells and in vitro through the E3 ligase activity of the CRL3-IBTK complex. Mapping IBTK phosphorylation sites and showing mTORC1/S6K1-dependent regulation provides mechanistic insight. The reduction in global translation and eIF4A1-dependent oncoproteins upon IBTK loss, along with clinical data linking IBTK to poor prognosis, support the functional importance.

      Weaknesses:

      While these data compellingly establish IBTK as a binding partner and modifier of eIF4A1, a remaining weakness is the lack of direct measurements showing IBTK regulates eIF4A1 helicase activity and translation of target mRNAs. While the effects of IBTK knockout/overexpression on bulk protein synthesis are shown, the expression of multiple eIF4A1 target oncogenes remains unchanged.

      Summary:

      Overall, this study significantly advances our understanding of how aberrant mTORC1/S6K1 signaling promotes cancer pathogenic translation via IBTK and eIF4A1. The proteomic, biochemical, and phosphorylation mapping approaches established here provide a blueprint for interrogating IBTK function. These data should galvanize future efforts to target the mTORC1/S6K1-IBTK-eIF4A1 axis as an avenue for cancer therapy, particularly in combination with eIF4A inhibitors.

      Reviewer #1 (Recommendations For The Authors):

      (1) Certain references should be provided for clarity. For example: Page 15, line 418 "The C-terminal glycine glycine (GG) amino acid residues are essential for Ub conjugation to targeted proteins".

      We appreciate the reviewer for bringing up this crucial point. We have cited two fundamental review papers on the ubiquitin system (PMID: 22524316, 9759494) as references for this sentence.

      (2) Please describe the properties of the ΔBTB mutant on page 15 when first describing it. What motifs does it lack and has it been described before in functional studies?

      We appreciate the reviewer for bringing up this crucial point. We added a sentence to describe the properties of the ΔBTB mutant. This mutant lacks the BTB1 and BTB2 domains (deletion of aa 554–871), which have been previously demonstrated to be essential for binding to CUL3. The original reference has been added to the revised manuscript.

      (3) In Figure 2G how do the authors explain the fact that co-expression of the Ub K-ALLR mutant, which is unable to form polyubiquitin chains, formed only a moderate reduction in IBTK-mediated eIF4A1 ubiquitination?

      We appreciate the reviewer for bringing up this crucial point. The Ub K-ALLR mutant can indeed conjugate to substrate proteins, but it cannot form chains because it lacks lysine residues, resulting in mono-ubiquitination. Multi-mono-ubiquitination refers to the attachment of single ubiquitin molecules to multiple lysine residues on a substrate protein. It's worth noting that a poly-ubiquitinated protein and a multi-mono-ubiquitinated protein appear strikingly similar in Western blot. Our findings demonstrated that the co-expression of the Ub K-ALLR mutant resulted in only a modest reduction in IBTK-mediated eIF4A1 ubiquitination (Fig. 2G), and that eIF4A1 was ubiquitinated at twelve lysine residues when co-expressed with IBTK (Fig. S2F). As such, we conclude that the CRL3-IBTK complex primarily catalyzes multi-mono-ubiquitination on eIF4A1.

      (4) In Figure 5, The identity of the seven sites in the IBTK 7ST A mutants should be specified.

      We appreciate the reviewer for bringing up this crucial point. We have specified the seven mutation sites in the IBTK-7ST A mutant (Fig. 6A).

      (5) In Figure 5, the rationale for generating antibodies only to S990/992/993, as opposed to the other mTORC1/S6K motifs should be specified.

      We appreciate the reviewer for bringing up this crucial point. Upon demonstrating that IBTK can be phosphorylated (with evidence from positive Phos-tag and in vitro phosphorylation assays), we sought to directly detect changes in the phosphorylation levels using an antibody specific to IBTK phosphorylation. However, the expense of generating a phosphorylation-specific antibody for each of the seven sites is significant. Recognizing that S990/992/993 are three adjacent sites, we deemed it appropriate to generate a single antibody to recognize the phospho-S990/992/993 epitope. Moreover, out of the seven phosphorylation sites, S992 perfectly matches the consensus motif for S6K1 phosphorylation (RXRXXS). Utilizing this antibody allowed us to observe a substantial decrease in the phosphorylation levels of these three adjacent Ser residues in IBTK following either AA deprivation or Rapamycin treatment (Fig. 5L). We have specified these points in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The following suggestions would strengthen the study:

      (1) Directly examine the effects of IBTK modulation (knockdown/knockout/overexpression) on eIF4A1 helicase activity.

      We appreciate the reviewer for bringing up this crucial point. We agree with the reviewer's suggestion that evaluating IBTK's influence on eIF4A1 helicase activity directly would enhance the strength of our conclusion. However, the current eIF4A1 helicase assays, as described in previous publications [Ref: 1, 2], can only be conducted using in vitro purified recombinant proteins. For instance, it is feasible to assess the varying levels of helicase activity exhibited by recombinant wild-type or mutant eIF4A1 proteins [Ref: 2]. Importantly, there is currently no reported methodology for evaluating the helicase activity of eIF4A1 in vivo, that is, in the gene knockdown, knockout, or overexpression cellular contexts mentioned by the reviewer. Therefore, we have not performed these assays, and we sincerely appreciate the reviewer's understanding in this regard.

      Reference:

      [1] Chu J, Galicia-Vázquez G, Cencic R, Mills JR, Katigbak A, Porco JA, Pelletier J. CRISPR-mediated drug-target validation reveals selective pharmacological inhibition of the RNA helicase, eIF4A. Cell reports. 2016 Jun 14;15(11):2340-7.

      [2] Chu J, Galicia-Vázquez G, Cencic R, Mills JR, Katigbak A, Porco JA, Pelletier J. CRISPR-mediated drug-target validation reveals selective pharmacological inhibition of the RNA helicase, eIF4A. Cell reports. 2016 Jun 14;15(11):2340-7.

      (2) Justify why the expression of some but not all eIF4A1 target oncogenes is affected in IBTK-depleted/overexpressing cells. This is important if IBTK should be considered as a therapeutic target. The authors should consider which of the eIF4A1 targets are most impacted by IBTK KO. This would provide a more focused therapeutic approach in the future.

      We appreciate the reviewer for bringing up this crucial point. As the reviewer has pointed out, we assessed the protein levels of ten reported eIF4A1 target genes across three cancer cell lines (Fig. 4, Fig. S4A, C). We observed that IBTK depletion led to a substantial reduction in the protein levels of most eIF4A1-regulated oncogenes, although there were some exceptions. For instance, IBTK KO in H1299 cells exerted minimal influence on the protein levels of ROCK1 (Fig. S4A). Several possible explanations might account for this observation. Firstly, given that our list of eIF4A1 target genes was collected from previous studies conducted using distinct cell lines, it is not unexpected for different lines to exhibit subtle differences in the regulation of eIF4A1 target genes. Secondly, as a CRL3 adaptor, IBTK potentially performs other biological functions via ubiquitination of specific substrates; dysregulation of these could buffer the impact of IBTK KO on the protein expression of some eIF4A1 target genes. We added these comments to the Discussion section of the revised manuscript.

      (3) Expand mTOR manipulation experiments (inhibition, Raptor knockout, activation) and evaluate impacts on IBTK phosphorylation, eIF4A1 ubiquitination, and translation.

      The mTORC1 signaling pathway is constitutively active under normal culture conditions. In order to inhibit mTORC1 activation, we employed several approaches including AA starvation, Rapamycin treatment, or Raptor knockout. Our results have demonstrated that both AA starvation and rapamycin treatment led to a reduction in eIF4A1 ubiquitination (Fig. 5M). Moreover, we have included new findings in the revised manuscript, which highlight that Raptor knockout specifically decreases eIF4A1 ubiquitination (Fig. 5N). It is worth mentioning that the impacts of mTOR inhibition or activation on protein translation have been extensively investigated and documented in numerous studies. Therefore, in our study, we did not feel it necessary to examine these treatments further.

      (4) Although not absolutely necessary, it would be nice to see if some of these findings are true in other cancer cell types.

      We appreciate the reviewer for bringing up this crucial point. We concur with the reviewer's suggestion that including data from other cancer cell types would enhance the strength of our conclusion. While the majority of our data is derived from two cervical cancer cell lines, we have corroborated certain key findings, such as the impact of IBTK on eIF4A1 and its target gene expression, in H1299 cells (human lung cancer) (Fig. 2C, Fig. S4A, B) and in CT26 cells (murine colon adenocarcinoma) (Fig. S4C, D). Additionally, we demonstrated that IBTK promotes IFN-γ-induced PD-L1 expression and tumor immune escape in both the H1299 and CT26 cells (Fig. S6A-K).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2023-02154

      Corresponding author(s): Marco, Galardini

      1. General Statements

      We have carefully read the comments put forward by the two reviewers and we have produced a revised version of the manuscript that we believe addresses all the concerns expressed by the reviewers. In short, we have validated our approach against experimentally derived epistatic coefficients, compared our mutual information (MI) method against one that uses direct coupling analysis (DCA), and experimentally tested three interactions in the spike RBD that we have predicted and which emerged only in summer 2023, thus demonstrating the potential predictive power of this approach. We have also carefully reworded the manuscript to acknowledge the inherent limitation of a method based on MI to identify epistatic interactions. We believe that the revised manuscript is now more robust with these new in-silico and in-vitro validations, and more direct in exposing the advantages (speed) and caveats (higher false-positives) of this approach.

      Note: the line numbers referenced in the responses to reviewers below refer to the document in which the changes are highlighted.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors inferred the pairwise epistasis through the Mutual Information provided by the spydrpick algorithm. They claim that the MIs could serve as a real-time identification of the epistatic interactions with the SARS-CoV-2 genomes due to the fast inference and high sensitivities.

      Major comments:

      1. The authors take a data-driven approach to infer the Mutual Information as the epistatic interactions between the mutations at different sites of SARS-CoV-2 genomes. However, it would be better to specify why this metric is reliable to be used as the representation of the pairwise epistatic interactions, and any theoretical explanations to support this.

      We agree that readers should be better informed on why MI can be used to estimate epistatic interactions from genomic data. We have therefore expanded the introduction (lines 93-98), methods (lines 540-543) and discussion (lines 453-457) sections to provide a proper theoretical and practical foundation on the use of a MI-based method. Furthermore, we have expanded the results section to add one additional in-silico validation (lines 244-249, Supplementary Figure 5, and updated Supplementary Figure 8) and an in-vitro one (Figure 5, see also reply to comment 2 from reviewer #2), which we believe give strong support to the MI-based method.
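
      As a minimal sketch of the quantity being discussed, the snippet below computes a plain plug-in mutual information between two alignment columns. It is an illustration only: the function name and the toy columns are hypothetical, and it does not reproduce the spydrpick implementation or the sequence weighting described in the manuscript. A high value only indicates that two sites co-vary, which is why the validations mentioned above are needed before interpreting it as epistasis.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Plug-in mutual information (in nats) between two alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns (hypothetical data): sites 1 and 2 always change together,
# while site 3 varies independently of site 1.
site1 = "AAAATTTT"
site2 = "CCCCGGGG"
site3 = "CGCGCGCG"

print(mutual_information(site1, site2))  # fully coupled columns: log(2)
print(mutual_information(site1, site3))  # independent columns: 0
```

      In this toy case the coupled pair reaches the maximum allowed by the entropy of a balanced two-state site, while the independent pair scores zero.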

      2. The authors claimed that the DCA method requires more computational resources and more time to complete. However, with a proper filtering procedure, the computational time could be reduced heavily. An example is Physical Review E 106 (4), 044409, 2022, in which the DCA was used to investigate the real-time pair-wise interactions (month-to-month). There the DCA results were compared with the correlation analysis. It would be nice to have comparisons of the inferred interactions between MIs and other methods.

      We agree that our MI-based approach should be compared against DCA-based methods. The original manuscript had in fact one such comparison (for the 2023-03 dataset, Figure 3C), which indicated a strong correlation between the two methods. To make this result more robust we have computed the DCA values for the complete time-series dataset and measured the correlation with the MI values (Supplementary Figure 4).

      We observed a relatively high correlation in estimated values between the two methods, with the exception of three time points, i.e., 2020-11, 2023-02 and 2023-03. We can explain these lower correlations with the low overall sequence diversity observed in the early phase of the pandemic (2020-11) and with the different weighting scheme of our approach, which would significantly alter the dataset when compared to the one used by the DCA method, especially towards the later timepoints (see also the reply to reviewer #2, comment 3, section iv). When those three timepoints are excluded, the two methods show a high degree of correlation, implying that they are comparably suitable in detecting coevolutionary signals.

      We have also used the 2nd order coefficients derived from experimental data in Moulana et al., 2022 (10.1038/s41467-022-34506-z) to validate both approaches (see methods, lines 624-631).

      The panels that we have combined to create the new Supplementary Figure 5 indicate how both approaches (MI for panels A and C, and DCA for panels B and D) correctly recover the interactions with a 2nd order epistatic coefficient > 0.15, based on the odds-ratio metric. Our MI-based approach has, however, a higher recall across multiple time points, which is especially visible when comparing panels A and B. The DCA-based method did correctly identify known epistatic interactions, but did so only at sporadic time points, even though the distribution of the underlying variants did not change significantly month to month. We believe that the higher recall of the MI-based method has a higher value for genomic epidemiology, at least for SARS-CoV-2.

      3. In Figure 1C, the authors show that their spydrpick algorithm provides more pairwise MIs for longer distances, where the outliers are denser than those with short distances. How do we explain this phenomenon?

      We thank the reviewer for bringing this point up; we actually think that our data show the opposite, meaning that we observe a higher proportion of close interactions when normalizing by the number of possible interactions. If we take an arbitrary distance threshold of 1,000 bases to define "close" vs. "distant" interactions, we observe 194 and 280 interactions, respectively. It is true that distant interactions are more numerous in absolute terms, but the space of possible interactions is orders of magnitude larger for "distant" interactions, simply because there are more sites from which interactions can originate. As a crude estimate we can use the combinations between 1,000 sites (499,500 possible interactions) vs. those between 28,903 sites (the full SARS-CoV-2 genome length of 29,903 bp minus 1,000; 417,677,253 possible interactions). Based on these estimates, although we observed fewer "close" than "distant" interactions in absolute terms, "close" interactions are strongly over-represented relative to the number of possible pairs.
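
      The arithmetic behind this crude estimate can be checked with a few lines of Python. The observed counts and site numbers are those quoted in the reply; the variable names are ours, and the pair counts follow the same simplification used above (all pairs among 1,000 sites vs. all pairs among 28,903 sites).

```python
from math import comb

# Observed interaction counts quoted in the reply
close_observed, distant_observed = 194, 280

# Crude size of each search space: unordered pairs of sites
close_possible = comb(1_000, 2)     # pairs among 1,000 sites
distant_possible = comb(28_903, 2)  # pairs among 29,903 - 1,000 sites

close_rate = close_observed / close_possible
distant_rate = distant_observed / distant_possible
enrichment = close_rate / distant_rate

print(close_possible, distant_possible)
print(f"close: {close_rate:.2e}, distant: {distant_rate:.2e}")
print(f"enrichment of close over distant pairs: {enrichment:.0f}x")
```

      Under this estimate, close pairs are detected several hundred times more often per possible pair than distant ones, even though fewer of them are observed in absolute terms.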

      Minor comments:

      4. Fig. 1E could be explained in more detail. For example, the grey dots in Fig. 1E, which are marked as "other", dominate the plot. Why?

      We thank the reviewer for pointing out a section where more clarity was needed. We have added the following sentence to the figure legend: "The category "other" indicates positions which are not known to have an impact on affinity to ACE2, immune escape or otherwise flagged as MOI/MOC." This indicates that predicted interactions involving a site classified as "other" are either false positives or previously undiscovered interactions.

      5. On line 210, the authors mentioned that the weights of the old sequences are lower "at around six months (120 days)". It would be better to specify why six months is 120 days instead of 180 days.

      We have corrected this mistake and indicated 4 months. We thank the reviewer for spotting this error.

      Referees cross-commenting

      I agree with what Reviewer #2 presented in the Consults Comments. The authors should present the reasons why MIs can be interpreted as the epistatic interactions between sites, as both of us mentioned this point. I checked the other revision points raised by Reviewer #2. They would definitely be helpful for enhancing the quality of the manuscript.

      Reviewer #1 (Significance (Required)):

      The work in the current manuscript is interesting and presented nicely. However, the theoretical foundations for interpreting the MIs as epistatic interactions should be illustrated. Otherwise, the tools would be useful for SARS-CoV-2 and other potential pandemics caused by different viruses.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The manuscript proposes an approach to identify epistatic interactions in the SARS-CoV-2 genome using the large amount of genomic data which accumulated during the COVID pandemic. They argue that, due to a relatively low computational cost, this can be done online in any ongoing pandemic nowadays (i.e. in the situation where the viral spreading and evolution are closely monitored by massive sequencing). In principle, this is interesting, but in my opinion the manuscript has some serious problems and will require major rewriting:

      1) In contrast to the claims of the manuscript, detected correlation does not necessarily imply epistatic couplings:

      • Even in a totally neutral setting, mutations may occur together by chance, and expand due to genetic drift or when encountering a susceptible population. Equally, two independent mutations may spread in different geographic regions, without the double mutant ever arising. Both cases lead to non-zero mutual information.

      • In evolution, frequently driver and passenger mutations are observed, in particular in settings of relatively high mutation rate. The passenger will rise in frequency with the driver, without any epistatic coupling.

      • The very unequal sequencing across geographic areas will enhance certain variants and leave others undetected. Even if the authors avoid double counting of identical sequences, more small variation is detected when sequencing deeper. The Omicron variant illustrates an extreme case here: it combined a large number of mutations, never detected before, but epistasis is not the most likely explanation, but rather lack of monitoring of the evolutionary path from the ancestral variants to Omicron.

      • MI has been criticised because it overestimates the effect of indirect correlations, in particular in dense epistatic networks. The situation in the spike protein in Fig. 1B seems very dense.

      Currently the manuscript does not make any effort to disentangle any of these effects.

      Following these comments (and those of reviewer 1), we have made a number of changes to the manuscript in order to provide more context on how MI can be used to estimate epistatic interactions and on the inherent limitations of this approach. In particular, we have expanded the introduction (lines 93-98), methods (lines 540-543) and discussion (lines 453-457) sections in a way that we believe exposes the limitations of the approach. Despite these limitations, we still believe that a MI-based approach strikes a good balance between speed, ease of implementation, and sensitivity. To further demonstrate this point we have added two additional validations to our results: the first one (in-silico) uses estimated 2nd order epistatic coefficients derived from experimental data (Moulana et al., 2022, 10.1038/s41467-022-34506-z), and the second (in-vitro) our own experimental data on three predicted interactions. The results of the new in-silico validation have been described in the reply to comment #2 from reviewer 1; in short, they show how the MI-based method has comparable sensitivity and specificity to the DCA-based method and, most importantly, how it allows the recovery of known epistatic interactions across the time period in which they have appeared. The results of the in-vitro validation are discussed in the reply to the next comment from this reviewer, as they directly address the predictive power of our approach: in short, we show how we could also validate these predictions. We think that these new results clearly show how, despite its limitations, the MI-based approach is able to identify bona-fide epistatic interactions, with the advantage of being a simple method to implement and with the possibility of being run in real time. For a more detailed discussion of the merits of the MI-based approach over DCA, see the reply to comment #3 from this reviewer.

      2) What are the predictive capacities of the approach? Mutual information is bounded from above by the individual site entropies. So high MI can be detected only in highly mutated sites, i.e. in sites that are certainly already being monitored. In fact, the sites in Fig. 1B with many links reflect the overall profile of variant frequencies in single sites (i.e. a totally non-epistatic measure) available on Nextstrain, and extracted from the same data sources.

      The discussion of the results is very anecdotal and it is not clear to me in how far there is any real prediction in the paper, which might surprise and trigger observation or further analyses.

      There is an entire line of related research in estimating and exploiting epistatic couplings in HIV evolution (A Chakraborty, M. Kardar, J. Barton, M MacKay and others) - not cited in the manuscript but relevant for the question how to detect epistatic couplings and what they are good for.

      We thank the reviewer for pointing out relevant literature we had not covered in the original manuscript, and which can be used to indicate how epistatic interaction signals can be leveraged when studying viruses. We have added citations to these studies in the introduction (lines 76-78) to provide a better background for our own study. Regarding the broader concern of showing the predictive power of our approach, we had a similar concern after the manuscript was submitted, and we had already planned a "blind" in-vitro validation to put our approach to the test. In order to make this validation as "blind" as possible, we expanded the dataset to include sequences until August 2023. We then selected interactions within the spike RBD with confidence level O4 in at least the last 4 time points and with one position already flagged as either "affinity", "escape" or "other MOI/MOC".

      We then selected the top three interactions (446-460, 446-486 and 452-490) for our validation, as they have an outlier confidence O4 in at least the last 4 time points, and lower or no prediction before. We also added the known 498-501 interaction as a control (Figure 5, panel B).

      We then focused on selecting a set of non-synonymous substitutions to test for their potential epistatic interactions. We decided to select 6 substitutions affecting the 3 predicted interactions based on their frequency in the time points after the cutoff of the original manuscript, shown in Figure 5, panel C.

      Of those, L452R/F490S and G446S/F486V are anti-correlated in their frequency and virtually never observed together in our dataset, G446S/F486S is observed at low frequency (87 samples after 2023-05), and G446S/N460H is virtually never observed (5 samples). We chose the anti-correlated pairs to test the potential of the MI method to explain this "avoidance" phenomenon, and the low frequency pairs as a way to test an early warning system for mutation signatures that might rise in the future. We then planned to test the impact of the individual variants and of the double variants, both in the wild-type background and in the Q498R/N501Y background as a crude model for the Omicron variant.

      We then used a pseudovirus assay to test mutated RBDs across two phenotypes: infectivity (i.e. the ability to infect Vero B4 cells) and immune escape (i.e. antibody neutralization curves). We then tested for the presence of epistatic interactions for the double mutants in both backgrounds using a simple linear model (see Methods, lines 711-727). The results of these in-vitro assays are summarized below (Figure 5, panel E for infectivity, F for immune escape).
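
      To illustrate what testing for an interaction with a simple linear model can look like, here is a minimal sketch for a two-site design. It is not the model described in the Methods: the function name and the replicate values are hypothetical and synthetic, and no significance test is included. For a saturated model y = b0 + bA*A + bB*B + bAB*A*B fitted to the four genotypes, the least-squares interaction coefficient bAB equals the contrast of genotype means computed below.

```python
from statistics import mean


def interaction_term(wt, single_a, single_b, double_ab):
    """Interaction coefficient bAB of y = b0 + bA*A + bB*B + bAB*A*B.

    For a saturated two-factor model the least-squares estimate reduces to
    mean(AB) - mean(A) - mean(B) + mean(WT).
    """
    return mean(double_ab) - mean(single_a) - mean(single_b) + mean(wt)


# Hypothetical log-scale infectivity replicates (synthetic, not study data)
wt = [0.0, 0.1, -0.1]
mut_a = [-0.5, -0.4, -0.6]
mut_b = [-0.3, -0.2, -0.4]
double_additive = [-0.8, -0.7, -0.9]   # equals the sum of single effects
double_epistatic = [-2.0, -1.9, -2.1]  # much lower than the additive expectation

additive_case = interaction_term(wt, mut_a, mut_b, double_additive)
epistatic_case = interaction_term(wt, mut_a, mut_b, double_epistatic)
print(round(additive_case, 6))   # close to 0: no interaction
print(round(epistatic_case, 6))  # clearly negative: negative epistasis
```

      A coefficient near zero means the double mutant behaves as the sum of the single mutants on the chosen scale; a value far from zero, once tested against replicate noise, is what would be reported as an epistatic interaction.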

      Double mutants with a significant (p-value -10) interaction have been highlighted with an asterisk. We confirmed the epistatic interaction for the Q498R/N501Y pair, both for its effect on infectivity and immune escape. For both anti-correlated pairs we found a significant interaction in the infectivity assay, and for G446S/F486V also in the immune escape assay. In particular, we found that the G446S/F486V pair induced a large drop in infectivity in the Q498R/N501Y background, while the immune escape profile of the double mutant was fairly similar to that of the single G446S variant, thus compensating for the loss of escape shown by the F486V variant alone. We observed the opposite for the L452R/F490S pair in terms of infectivity, with the pair showing a large increase in infectivity in the Q498R/N501Y background, an effect we found to be significant. The double mutant had a slightly better immune escape profile than the single mutants, although the difference was not significant. From these observations we can hypothesize that the G446S/F486V pair is anti-correlated because of its strong defect in infectivity; we cannot apply the same reasoning to the L452R/F490S pair, whose absence from circulating variants could be ascribed to stochasticity in population dynamics or interactions with other variants. We observed a similar impact of the G446S/F486S and G446S/N460H pairs on infectivity as for G446S/F486V; based on these results we could estimate that variants carrying these pairs might have a fitness disadvantage. The inability of unsupervised methods (MI or DCA based) to predict the direction of the effect of course makes it difficult to decide which of the two pairs should be added to a "watchlist", but it would potentially reduce the number of interactions to be tested. We believe that the results of this admittedly small-scale in-vitro validation demonstrate the potential of the MI-based approach to flag emerging interactions worthy of further study. Recent advances in the scalability of molecular assays (e.g. 10.1101/2024.03.08.584176) could then be coupled with a real-time system such as the one we describe in our manuscript to filter out the more relevant interactions. We have added this forward-looking observation in the discussion as well (lines 465-474).

      3) The authors say that more involved methods like the Direct Coupling Analysis with Pseudolikelihood maximisation would be too slow for the analysis, but several papers show the contrary. The paper by Zeng et al. (Ref. [39]) does so very early in the pandemic in 2020, and another uncited paper of the same authors (Physical Review 2022) uses a nearly identical approach to study the time evolution of epistatic couplings (extractions from Gisaid at several times). As one of their results, they show that their approach is not only feasible, but delivers more stable results than simpler correlation measures like MI.

      We thank the reviewer for pointing out a relevant reference we had missed in the initial manuscript. At a general level Zeng et al. take a similar approach to what we have described, namely to divide the data according to the isolation date to look for temporal trends. We however see a few differences that we think are in favor of the approach we describe:

      1- Our manuscript covers the time period after the emergence of the Omicron variant, in which epistatic interactions are known and have been characterized and validated experimentally, a crucial requirement for validation. We have also conducted an in-vitro validation on a selected set of predicted interactions (see the reply to the previous comment), which indicates that the method is sound and predictive.

      2- We have prepared a cumulative time-series dataset, meaning that each month introduces new sequences on top of the ones already selected from the previous time points. To the best of our knowledge the Zeng et al. dataset has "insulated" sequences at each month. We believe our approach has the advantage of allowing for a higher recall, as it includes a representation of extinct lineages, which may increase diversity at key loci and thus boost the signal. As described in the original manuscript and in the reply to this reviewer's comments "iv" and "v", we have added a weighting scheme in order to reduce the influence of older sequences and increase the relevance of smaller lineages.

      3- While we have not tested the DCA implementation used by Zeng et al., and we cannot therefore directly comment on its scalability, we have encountered serious limitations when scaling up the popular plmc C implementation developed by the lab of Deborah Marks. In particular we were unable to successfully run it for datasets with more than ~300k sequences, encountering segmentation faults.

      Regarding the third point, while this meant that we could not test the DCA approach on the full dataset, we could still apply it to the time-series data, focusing exclusively on the spike (S) gene. As shown above in the reply to reviewer 1's comment #2, the two methods are highly correlated and both recover known interactions, although the DCA method has a lower recall. Taken together we believe that the MI-based approach we describe is robust enough to be considered when a tradeoff between speed, ease of implementation and sensitivity has to be struck, which we believe may be the case for a rapid response during a potential future pandemic. We have added more details to the part of the discussion in which the comparison with the DCA-based methods was made to point out how those are still feasible with very large collections of sequences (lines 444-448).
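      For readers unfamiliar with the measure, the mutual information between two alignment columns, with optional per-sequence weights, can be sketched as below. This is a minimal illustration and not the implementation used in the manuscript: the function name `weighted_mi`, the natural-log units, and the toy columns are all assumptions made for the example.

```python
import math
from collections import defaultdict

def weighted_mi(col_a, col_b, weights=None):
    """Mutual information (in nats) between two alignment columns.

    col_a, col_b: sequences of characters, one entry per aligned sequence.
    weights: optional per-sequence weights (hypothetical; defaults to 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(col_a)
    total = sum(weights)

    # Weighted marginal and joint frequencies.
    p_a = defaultdict(float)
    p_b = defaultdict(float)
    p_ab = defaultdict(float)
    for a, b, w in zip(col_a, col_b, weights):
        p_a[a] += w / total
        p_b[b] += w / total
        p_ab[(a, b)] += w / total

    # MI = sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) p(b))).
    mi = 0.0
    for (a, b), p in p_ab.items():
        mi += p * math.log(p / (p_a[a] * p_b[b]))
    return mi

# Toy columns: fully coupled versus independent positions.
coupled = weighted_mi("AAGG", "CCTT")
independent = weighted_mi("AAGG", "CTCT")
```

      Passing non-uniform `weights` changes the frequencies that enter the sum, which is how a weighting scheme can raise or lower the signal from a small group of sequences.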

      It would therefore be essential that the authors strongly revise their manuscript to show the reliability of the results, the predictive value of the predicted couplings, and the originality and robustness of the approach.

      We believe that our responses to both reviewers have addressed these concerns, and as a result we have provided a more nuanced view on the use of MI-based methods in the prediction of epistatic interactions in pandemic viruses. Our wording has been modified to make sure that readers interested in replicating our approach are aware of its strengths (speed, ease of implementation) and limitations.

      Furthermore, there are some minor issues in the formulations, which should be corrected

      i) "the virus has differentiated into a number of lineages, almost all of which have taken over the whole population..." This is wrong. SARS-CoV-2 has always been very heterogeneous, with diverse variants circulating (the authors use millions of non-redundant sequences), and only very few have become VOIs or VOCs at some point. This image of competition between multiple coexisting strains is much closer to clonal interference than what the authors describe (even if clonal interference does not rely on population structure, which has always been an important element in COVID).

      We thank the reviewer for pointing out this error in our observation. We have changed "almost all" to "some", which we agree is more accurate.

      ii) The authors say that pseudolikelihood methods would require "aggressive subsampling". This is not true, in machine learning massive training data are frequently used in the context of batch learning, i.e. in each learning epoch a "batch" is sampled from the full data. This leads to stochasticity in learning, but all data are eventually used.

      We have reformulated this sentence (lines 85-90) to indicate how batch learning could also be used to make certain methods scalable, with the caveat that they would be more complicated to implement.
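      The batch-learning idea raised by the reviewer can be sketched as follows: each epoch shuffles the sequence indices and walks through them in fixed-size batches, so every sequence is eventually used. This is an illustrative sketch only; the generator name `minibatches` and its parameters are hypothetical and are not taken from any DCA package.

```python
import random

def minibatches(n_items, batch_size, seed=0):
    """Yield lists of indices covering range(n_items) once, in random order.

    Hypothetical helper: one call corresponds to one learning epoch.
    """
    rng = random.Random(seed)
    order = list(range(n_items))
    rng.shuffle(order)  # the stochasticity comes from the shuffled order
    for start in range(0, n_items, batch_size):
        yield order[start:start + batch_size]

# One epoch over 10 toy sequences in batches of 4.
batches = list(minibatches(10, 4))
```

      A training loop would call `minibatches` once per epoch (with a different seed each time) and update the model parameters on each batch.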

      iii) The authors say that they also download a phylogenetic tree, but I do not see where it is used.

      As indicated in the methods section, we have used the phylogenetic tree for two purposes:

      1- To single out high quality sequences from the raw MSA (line 515)

      2- To compute the weight of each sequence in the final MSA, as described in lines 540-549

      iv) The authors use sequence weights as implemented in Ref. [31]. There a weighting at sequence similarity threshold of 90% is used. I would expect that there are no SARS-CoV-2 genomes having accumulated more than 10% of nucleotide mutations, i.e. the weighting procedure would be without any effect.

      We realized that the sequence weighting scheme we have used is not described in Pensar et al. (10.1093/nar/gkz656), but rather in the implementation of the spydrpick algorithm used by the panaroo software (Tonkin-Hill et al., 10.1186/s13059-020-02090-4). This weighting scheme is based on the more granular metric that is the patristic distance of each sequence from the root of the tree, divided at each branching point by the number of its terminal leaves. In practical terms this means that sequences belonging to smaller lineages (i.e. with fewer observed samples) will have a larger weight, regardless of a discrete sequence similarity threshold, as was done in the original implementation. We have updated the methods section to clearly indicate that the weighting scheme is that first shown in the panaroo software package (line 543).
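      One possible reading of the description above is that each leaf's weight is the sum, along the path from the root, of every branch length divided by the number of leaves below that branch. The sketch below implements that reading on a toy tree. It is an assumption-laden illustration: the tuple-based tree format and the function names are hypothetical, and the actual spydrpick/panaroo code may differ in its details.

```python
# A node is (name, branch_length, children); a leaf has an empty children list.

def count_leaves(node):
    """Number of terminal leaves at or below this node."""
    _, _, children = node
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)

def leaf_weights(root):
    """Root-to-leaf sum of branch_length / (leaves below that branch)."""
    weights = {}

    def walk(node, accumulated):
        name, length, children = node
        accumulated += length / count_leaves(node)
        if not children:
            weights[name] = accumulated
        for child in children:
            walk(child, accumulated)

    walk(root, 0.0)
    return weights

# Toy tree: a three-leaf clade (a, b, c) and a single-leaf lineage (d).
toy_tree = ("root", 0.0, [
    ("clade", 1.0, [("a", 1.0, []), ("b", 1.0, []), ("c", 1.0, [])]),
    ("d", 2.0, []),
])
weights = leaf_weights(toy_tree)
```

      In this toy tree the lone lineage `d` receives a larger weight than any member of the three-leaf clade, which matches the stated intent of up-weighting smaller lineages.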

      v) The authors estimate that they need 10,000-100,000 sequences to estimate MI, but find the epistatic coupling in spike residues 498-501 as soon as 6 double mutants are present, which is a frequency of about 1e-4. The corresponding entropies should be low and in consequence the MI, too.

      We thank the reviewer for raising this point, which prompted us to devise a way to better illustrate the sequence weighting scheme we have used. As a side note we also discovered that the number of Omicron sequences at the 2021-11 time point was actually 7, and not 6 as stated throughout the original manuscript, an error we have now fixed. As described in the methods section we have combined two weights in the time-series analysis: the first one, described in the response to the previous comment, is based on the "density" of the phylogenetic tree, which deflates the contribution of "denser" regions of the tree, and the second reduces the relevance of older sequences. The two weights are then combined multiplicatively. As a result the "real" (i.e. effective) number of sequences harboring a particular double mutation will differ from a simple count of their occurrences.

      As shown in Supplementary Figure 3, the combination of both weights (first column) leads to an increased effective number of sequences for "younger" samples and those that come from "sparser" regions of the overall phylogenetic tree. This is particularly evident for the middle row (2021-11); the light orange dot, which indicates sequences belonging to the first Omicron lineage to appear in the dataset (BA.1), has an actual N of 7, but an effective N of ~100 (exact value 86), thanks to its "novelty" both in the tree (middle panel) and in terms of time (right panel). We again thank the reviewer for raising this point, which led us to generate this visualization, which will hopefully clarify the rationale for the weighting strategy we have used for most readers.
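      The multiplicative combination of a tree-based weight and a time-based weight, and the resulting effective N, can be sketched as below. This is an illustration under stated assumptions: the exponential decay, the `half_life` parameter, and the function names are hypothetical, since the functional form of the time weight is not given in this response, and the numbers are toy values rather than values from the manuscript.

```python
def time_weight(age_months, half_life=6.0):
    """Hypothetical time weight: halves every `half_life` months of age."""
    return 0.5 ** (age_months / half_life)

def effective_n(tree_weights, ages_months, half_life=6.0):
    """Sum of per-sequence weights, each the product of tree and time weights."""
    return sum(
        w_tree * time_weight(age, half_life)
        for w_tree, age in zip(tree_weights, ages_months)
    )

# Two toy sequences with the same tree weight, one new and one 6 months old.
n_eff = effective_n([2.0, 2.0], [0.0, 6.0], half_life=6.0)
```

      With weights above 1 for new sequences from sparse regions of the tree, a handful of sequences can contribute an effective N well above their raw count, which is the effect described for the first BA.1 sequences.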

      vi) The authors say that the public health toll of COVID has been "balanced" by scientific discovery - I would urge the authors to avoid such formulations, which sound cynical.

      We agree with the reviewer that this comment might sound cynical and tone-deaf, and have reformulated to indicate that the impact of the pandemic has coincided with an accelerated pace of applied scientific discovery.

      Referees cross-commenting

      Both reports bring up very similar points (points 1 of both reports, point 2 of Reviewer #1 vs. my point 3) but add partially complementary questions (point 3 of Reviewer #1, my point 2), both related to the interpretation of the data. My report is more severe, but reading the ms I am convinced that the paper requires serious revision. So the reports seem coherent but differ in the strength of their recommendations. However, none of the comments of one reviewer is in contradiction to those of the other reviewer.

      Reviewer #2 (Significance (Required)):

      While the paper asks interesting questions and wants to make use of the quite unique data which have accumulated during the COVID pandemic, the above mentioned problems raise important questions about the manuscript. It would be essential that the authors strongly revise their manuscript to show the reliability of the results, the predictive value of the predicted couplings, and the originality and robustness of the approach.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a follow-up study to the authors' previous eLife report about the roles of an alpha-arrestin called protein thioredoxin interacting protein (Txnip) in cone photoreceptors and in the retinal pigment epithelium. The findings are important because they provide new information about the mechanism of glucose and lactate transport to cone photoreceptors and because they may become the basis for therapies for retinal degenerative diseases.

      Strengths:

      Overall, the study is carefully done and, although the analysis is fairly comprehensive with many different versions of the protein analyzed, it is clearly enough described to follow. Figure 4 greatly facilitated my ability to follow, understand and interpret the study. The authors have appropriately addressed a few concerns about statistical significance and the relationship between their findings and previous studies of the possible roles of Txnip on GLUT1 expression and localization on the surfaces of RPE cells.

      We are delighted that Reviewer #1 is satisfied with this revised version.

      Reviewer #2 (Public Review):

      The hard work of the authors is much appreciated. With overexpression of the α-arrestin Txnip in the RPE, in cones, and in both combined, the authors show a potential gene-agnostic treatment that can be applied to retinitis pigmentosa. Furthermore, since Txnip is related to multiple intracellular signaling pathways, this study is also of value for research into the mechanism of secondary cone dystrophy.

      There are a few areas in which the article may be improved through further analysis and application of the data, as well as some adjustments that should be made to clarify specific points in the article.

      Strengths

      • The follow-up study builds on innovative ground by exploring the impact of TxnipC247S and its combination with HSP90AB1 knockdown on cone survival, offering novel therapeutic pathways.

      • Testing of different Txnip deletion mutants provides a nuanced understanding of its functional domains, contributing valuable insights into the mechanism of action in RP treatment.

      • The findings regarding GLUT1 clearance and the differential effects of Txnip mutants on cone and RPE cells lay the groundwork for targeted gene therapy in RP.

      Weaknesses

      • The focus on specific mutants and overexpression systems might overlook broader implications of Txnip interactions and its variants in the wider context of retinal degeneration.

      Txnip is not expressed in WT or RP cones, as described in our previous study (Xue et al., 2021, eLife), so we could not perform loss of function assays. We thus chose overexpression, and assayed various alleles, based upon the literature, as we describe in our manuscript.

      • The study's reliance on cell count and GLUT1 expression as primary outcomes misses an opportunity to include functional assessments of vision or retinal health, which would strengthen the clinical relevance.

      In our previous study, we demonstrated that the optomotor response of Txnip-treated RP mice improved (Xue et al., 2021, eLife). Also, as described in our previous Txnip study, as well as an independent study (Xue et al., 2021, eLife; Xue et al., 2023, PNAS), ERG assays of Txnip-treated RP cones were no different than the controls. Other therapies that prolong RP cone survival and the optomotor response in our lab also failed to save the ERG, suggesting that there are other pathways that need to be addressed, e.g. the visual cycle. A combination therapy addressing multiple problems is one of our goals.

      • The paper could benefit from a deeper exploration of why certain treatments (like Best1-146 Txnip.C247S) do not lead to cone rescue and the potential for these approaches to exacerbate disease phenotypes through glucose shortages.

      This system is more complicated than we currently understand, and more work needs to be done.

      • Minor inconsistencies, such as the missing space in text references and the need for clarification on data representation (retinas vs. mice), should be addressed for clarity and accuracy.

      The missing spaces are added.

      We described the strategy of injecting the same mouse in each eye, one eye with control and one with the experimental vector. However, the following sentence has been added to the Materials and Methods to better assist the reader:

      “In almost all experiments, other than as noted, one eye of the mouse was treated with control (AAV8-RedO-H2BGFP, 2.5 × 10⁸ vg/eye), and the other eye was treated with the experimental vector plus AAV8-RedO-H2BGFP, 2.5 × 10⁸ vg/eye.”

      • The observation of promoter leakage and potential vector tropism issues raise questions about the specificity and efficiency of the gene delivery system, necessitating further discussion and validation.

      The following sentences have been added to the Results. We do not think this phenomenon affects the practice of the experiments or the interpretation of the results in this study.

      “To enable automated cone counting and trace the infection, we co-injected an AAV (AAV8-RedO-H2BGFP-WPRE-bGHpA) encoding an allele of GFP fused to histone 2B (H2BGFP), which localized to the nucleus. As the red opsin promoter was used to express this gene, H2BGFP was seen in cone nuclei, but not in the RPE, if AAV8-RedO-H2BGFP-WPRE-bGHpA was injected alone. However, when an AAV that expressed in the RPE, i.e. AAV8-Best1-Sv40intron-(Gene)-WPRE-bGHpA, was co-injected with AAV8-RedO-H2BGFP-WPRE-bGHpA, H2BGFP was expressed in the RPE, along with expression in cones (Figure 2A). We speculate that this is due to concatenation or recombination of the two genomes, such that the H2BGFP comes under the control of the RPE promoter. This may be due to the high copy number of AAV in the RPE, as it did not happen in the reverse combination, i.e. AAV with an RPE promoter driving GFP and a cone promoter driving another gene, perhaps due to the observation that the AAV genome copy number is ≈10 fold lower in cones than in the RPE (Wang et al., 2020).”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper provides a straightforward mechanism of how mycobacterial cAMP level is increased under stressful conditions and shows that the increase is important for the survival of the bacterium in animal hosts. The cAMP level is increased by decreasing the expression of an enzyme that degrades cAMP.

      We thank the reviewer for these extremely encouraging comments.

      Strengths:

      The paper shows that under different stresses the response regulator PhoP represses a phosphodiesterase (PDE) that degrades cAMP specifically. Identification of PhoP as a regulator of cAMP is significant progress in understanding Mtb pathogenesis, as increase in cAMP apparently increases bacterial survival upon infection. On the practical side, reduction of cAMP by increasing PDE can be a means to attenuate the growth of the bacilli. The results have wider implications since PhoP is implicated in controlling diverse mycobacterial stress responses and many bacterial pathogens modulate host cell cAMP level. The results here are straightforward, internally consistent, and of both theoretical and applied interests. The results also open considerable future work, especially how increases in cAMP level help to increase survival of the pathogen.

      Weaknesses:

      It is not clear whether PhoP-PDE Rv0805 is the only pathway to regulate cAMP level under stress.

      Reviewer 1 (Recommendations for the authors):

      (1) L.1: "maintenance of" or 'regulating'- I thought change in cAMP level upon stress is the whole point of the paper. Also, can replace "intracellular survival" with 'survival in host macrophages' if you want to be more specific.

      We agree with the reviewer, and therefore, we have now replaced “maintenance of” with “regulating cAMP level” in the title. However, we feel more comfortable with “intracellular survival” rather than the more specific ‘survival in host macrophages’, as we have also included animal experiments to demonstrate the ‘in vivo’ effect in mouse lung and spleen.

      (2) L.26: ---requires the bacterial virulence regulator –

      The suggested change has been made to the text.

      (3) L.30: Replace "phoP locus since the" with 'PhoP since this'. (The product, not the locus, is the regulator). The same comment for l.113.

      We agree with the reviewer. The suggested changes have been made to the text.

      (4) L.31: Change represtsor to repressor.

      We are sorry for the embarrassing spelling mistake. We have rectified the mistake in the revised version.

      (5) L.32: "hydrolytically degrades" or hydrolyses? (lytic and degrade sound like tautology). Same comment for l.117.

      We agree. The suggested change has been made to the text in both places of the revised manuscript.

      (6) L.35: I would also suggest changing "intra-mycobacterial" to 'intra bacterial' because you are talking about one bacterium here. The same change is recommended in l.29.

      Following reviewer’s recommendation, we have made the changes in the revised manuscript.

      (7) L.37: bacillus unless use of the plural form is the norm in the field.

      We agree. The suggested change has been made to the text.

      (8) L.43: Delete "intracellular" and change "intracellular" to host in l.44.

      The suggested changes have been made to the text.

      (9) L.66: --that a burst--

      We have corrected the mistake in the revised manuscript.

      (10) L.76: Receptor or receptor?

      We have corrected the mistake in the revised manuscript.

      (11) L.86: -- mechanisms of regulation of mycobacterial cAMP level. (homeostasis needs to be introduced first, and not used in the concluding statement for the first time).

      The suggested changes have been made to the text.

      (12) L.96: "essential" or 'a requirement'. (reduction is not the same as elimination)

      We understand the reviewer’s concern. However, several studies have independently established that phoPR remains an essential requirement for mycobacterial virulence.

      (13) L.97: Moreover, a mutant

      The suggested change has been made to the text.

      (14) L.113: --locus since PhoP has been –

      The suggested change has been made to the text.

      (15) L.119: mechanism or manner? (you are stating a fact, not a mechanism)

      We agree. We have now replaced ‘mechanism’ with ‘manner’ in the revised manuscript.

      (16) L.130: --lacking copies of both phoP and phoR (I am assuming you don't have two copies of each gene)

      We understand the reviewer’s concern. For better clarity, we have now clearly mentioned that the phoPR-KO mutant lacks both the single copies of phoP and phoR genes.

      (17) L.156: Indicate why GroEL2? - cells as another cytoplasmic protein, GroEL2 was also undetectable

      We have now mentioned it in the secretion experiments that mycobacterial cells did not undergo autolysis. To prove this point, we have used cytoplasmic GroEL2 as a marker protein. The absence of detectable GroEL2 in the culture filtrates (CFs) suggests absence of autolysis. To this end, we have modified the sentence in the revised manuscript (duplicated below):

      “Fig. 1C confirms absence of autolysis of mycobacterial cells as GroEL2, a cytoplasmic protein, was undetectable in the culture filtrates (CF).”

      (18) L.266: May delete "Together". Start with These data--, which would draw more attention to integrated view. In l.268-270, a reminder that intracellular pH is acidic in the normal course would enhance the physiological significance of the present results.

      We agree. We have made the suggested changes to the text. In view of the second comment of the reviewer, we have modified the text (duplicated below):

      “These data represent an integrated view of our results suggesting that PhoP-dependent repression of rv0805 regulates intra-mycobacterial cAMP level. In keeping with these results, activated PhoP under acidic pH conditions significantly represses rv0805, and intracellular mycobacteria most likely utilize a higher level of cAMP to effectively mitigate stress for survival under hostile environment including acidic pH of the phagosome.”

      (19) L.272: Delete "and intracellular survival" (?) (I am assuming the survival is due to stress tolerance; also the section talks about stress only). No period in l.273.

      Following reviewer’s recommendations, the suggested changes have been made to the text.

      (20) L.295: Start the sentence thus: It appears that at least one of ---. (This would put more emphasis on the inference)

      We agree. We have now incorporated the recommended changes in the revised version.

      (21) L.301: No parenthesis.

      The parenthesis has been removed in the revised manuscript.

      (22) L.306: Together already implies these. Either delete Together (which I would prefer) or say 'Together, the results suggest that strains expressing wild type and mutant----properties, and the results are

      We agree. We have now deleted ‘Together’ in the revised manuscript.

      (23) L.311: These results support our view that higher---- (to avoid repetition of l.266)

      We agree. We have now incorporated the suggested change in the revised manuscript.

      (24) L.316: Using or with?

      We think “with” goes well with the statement.

      (25) L.329: Rephrase thus: Effect of intra-bacterial cAMP level on in vivo--

      The recommended change has been made to the text.

      (26) L.333: I would use ~, if you want to indicate about.

      We agree. We have now used ‘~’ in the revised version. Changes were incorporated in lines 328, 330 and 333 of the revised manuscript.

      (27) L.350: Change "somewhat functionally" to phenotypically?

      We thank the reviewer for this suggestion. We have changed “somewhat functionally” to “phenotypically” in the revised manuscript.

      (28) L.361: Change "is connected to" to 'regulates'.

      The suggested change has been made to the text.

      (29) L.365: ACs (to be parallel with PDEs)

      We agree. The suggested change has been made to the text.

      (30) L.366: delete "very" (let the readers decide how recent from the reference date).

      The suggested change has been made to the text.

      (31) L.382: level remained unknown before the present study.

      The recommended change has been made to the text.

      (32) L.399: add at the end of the sentence 'under stress'. Also, represent, not represents.

      The recommended changes have been made to the text.

      (33) L.560 and 571: Section headings formatted differently from the rest. Similar problem in l.900.

      We have rectified the issue and all of the section headings are now formatted in the same style.

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript, the authors have presented new mechanistic details to show how intracellular cAMP levels are maintained linked to the phosphodiesterase enzyme which in turn is controlled by PhoP. Later, they showed the physiological relevance linked to altered cAMP concentrations.

      Strengths:

      Well thought out experiments. The authors planned the experiments carefully and diligently to uncover the molecular details.

      We thank the reviewer for these extremely encouraging comments.

      Weaknesses:

      Some fresh queries were made based on the authors' previous responses, and I hope to get satisfactory answers this time.

      We provide below a point-by-point response to the fresh queries.

      (2) Line 134: please describe the complementation strain features as it is mentioned for the first time (plasmid, copy number, promoter etc.) in the manuscript. Especially under NO stress what could be the authors' justification regarding the high cAMP concentration in the complementation strain?

      As recommended by the reviewer, the details of construction of the complemented strain have been incorporated in the 'Materials and Methods' section of the revised manuscript (duplicated below): "To complement phoPR expression, pSM607 containing a 3.6-kb DNA fragment of M. tuberculosis phoPR including 200-bp phoP promoter region, a hygromycin resistance cassette, attP site and the gene encoding phage L5 integrase, as detailed earlier (Walters et al., 2006) was used to transform phoPR mutant to integrate at the L5 attB site."

      To address the reviewer's other concern, we have now included the following sentence in the 'Results' section of the revised manuscript (duplicated below): "A higher cAMP level in the complemented strain under NO stress is possibly attributable to reproducibly higher phoP expression in the complemented mutant under specific stress condition (Khan et al., 2022)."

      Reference: Khan et al. (2022) Convergence of two global regulators to coordinate expression of essential virulence determinants of Mycobacterium tuberculosis. eLife 2022, 11:e80965.

      New query: The complemented gene (in pSM607 plasmid) becomes a single copy after chromosomal integration, so it should ideally behave like a WT strain. How could authors still justify the high cAMP concentration under NO stress?

      We agree with the reviewer. We are unable to provide a cogent justification regarding this result. We speculate that PhoP is strikingly activated under NO stress by a non-canonical mechanism and strongly represses rv0805 expression. As a result, there is a significantly higher cAMP concentration in case of the complemented mutant under NO stress.

      (13) Line 292: There is a difference between red and green bars. Authors should do statistical analysis and then comment on whether overexpression of WT and mutant pde are different or similar, to me they are different; also, explain why the WT-Rv0805 strain is different than the phoPR-KO strain in the context of cell wall metabolism.

      As recommended by the reviewer, we have now included statistical significance of the data in the revised version, and modified the text accordingly in the manuscript.

      New query: Authors are asked to put a statistical significance test between WT-Rv0805 and WT-Rv0805M.

      We have included it in the modified figure. Also, to explain it we incorporated new text in the legend to Fig. 4C of the revised manuscript (duplicated below):

      “Note that similar to phoPR-KO, WT-Rv0805 shows a comparably higher sensitivity to CHP relative to WT bacilli. However, WT-Rv0805M expressing a mutant Rv0805, shows a significantly lower sensitivity to CHP relative to WT-Rv0805, as measured by the corresponding CFU values.”

      (14) Line 299-303: Authors should explain how the colocalization % are calculated. Also, in the figure 4D merge panel please highlight the difference.

      As suggested by the reviewer, we have now explained the methodology used to calculate percent colocalization in greater details. Also, we have modified Figure 4D to highlight the difference between samples shown in merge panel. Please see our response to comment # 33 from the Reviewer 1.

      New query: In the figure legend it should be mentioned that the white arrow indicates non-co-localization which is visibly higher in WT and WT-Rv0805M.

      We thank the reviewer for this very important suggestion. We have now included the following text in the legend to Fig. 4D of the revised manuscript.

      “White arrowheads in the merge panels indicate non-colocalization, which remains higher in WT-H37Rv and WT-Rv0805M relative to phoPR-KO or WT-Rv0805.”

  5. docdrop.org
    1. The older you get, the worse it is

      I did not previously think about how age impacts the way you experience poverty, but I can see how this may be true. When we are little kids, we are still unaware of a lot of the different facets of identity that set us apart, and are more likely to be open minded to more things-- we are still very easily impressionable. As we become older though, and cliques form, there is a way clearer understanding of what may be deemed as cool or desirable for teenagers and what is embarrassing or something to be ashamed of.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editor’s summary:

      This paper by Castello-Serrano et al. addresses the role of lipid rafts in trafficking in the secretory pathway. By performing carefully controlled experiments with synthetic membrane proteins derived from the transmembrane region of LAT, the authors describe, model and quantify the importance of transmembrane domains in the kinetics of trafficking of a protein through the cell. Their data suggest affinity for ordered domains influences the kinetics of exit from the Golgi. Additional microscopy data suggest that lipid-driven partitioning might segregate Golgi membranes into domains. However, the relationship between the partitioning of the synthetic membrane proteins into ordered domains visualised ex vivo in GPMVs, and the domains in the TGN, remains at best correlative. Additional experiments that relate to the existence and nature of domains at the TGN are necessary to provide a direct connection between the phase partitioning capability of the transmembrane regions of membrane proteins and the sorting potential of this phenomenon.

      The authors have used the RUSH system to study the traffic of model secretory proteins containing single-pass transmembrane domains that confer defined affinities for liquid ordered (lo) phases in Giant Plasma Membrane derived Vesicles (GPMVs), out of the ER and Golgi. A native protein termed LAT partitioned into these lo-domains, unlike a synthetic model protein termed LAT-allL, which had a substituted transmembrane domain. The authors' experiments provide support for the idea that ER exit relies on motifs in the cytosolic tails, but that accelerated Golgi exit is correlated with lo domain partitioning.

      Additional experiments provided evidence for segregation of Golgi membranes into coexisting lipid-driven domains that potentially concentrate different proteins. Their inference is that lipid rafts play an important role in Golgi exit. While this is an attractive idea, the experiments described in this manuscript do not provide a convincing argument one way or the other. It does however revive the discussion about the relationship between the potential for phase partitioning and its influence on membrane traffic.

      We thank the editors and scientific reviewers for thorough evaluation of our manuscript and for positive feedback. While we agree that our experimental findings present a correlation between trafficking rates and raft affinity, in our view, the synthetic, minimal nature of the transmembrane protein constructs in question makes a strong argument for involvement of membrane domains in their trafficking. These constructs have no known sorting determinants and are unlikely to interact directly with trafficking proteins in cells, since they contain almost no extramembrane amino acids. Yet, the LATTMD traffics through Golgi similarly to the full-length LAT protein, but quite differently from mutants with lower raft phase affinity. We suggest that these observations can be best rationalized by involvement of raft domains in the trafficking fates and rates of these constructs, providing strong evidence (beyond a simple correlation) for the existence and relevance of such domains.

      We have substantially revised the manuscript to address all reviewer comments, including several new experiments and analyses. These revisions have substantially improved the manuscript without changing any of the core conclusions and we are pleased to have this version considered as the “version of record” in eLife.

      Below is our point-by-point response to all reviewer comments.

      ER exit:

      The experiments conducted to identify an ER exit motif in the C-terminal domain of LAT are straightforward and convincing. This is also consistent with available literature. The authors should comment on whether the conservation of the putative COPII association motif (detailed in Fig. 2A) is significantly higher than that of other parts of the C-terminal domain.

      Thank you for this suggestion; this information has now been included as Supp Fig 2B. While there are other well-conserved residues of the LAT C-terminus, many regions have relatively low conservation. In contrast, the essential residues of the COPII association motif (P148 and A150) are completely conserved in LAT across all species analyzed.

      One cause of concern is that addition of a short cytoplasmic domain from LAT is sufficient to drive ER exit, and in its absence the synthetic constructs are all very slow. However, the argument presented that the specific lo phase partitioning behaviour of the TMDs does not have a significant effect on exit from the ER is a little confusing. This is related to the choice of the allL-TMD as the 'non-lo domain' partitioning comparator. Previous data has shown that longer TMDs (23+) promote ER export (e.g. Munro 91, Munro 95, Sharpe 2005). The mechanism for this is not, to my knowledge, known. One could postulate that it has something to do with the very subject of this manuscript: lipid phase partitioning. If this is the case, then a TMD length of 22 might be a poor choice of comparison. A TMD 17 Ls long would be a more appropriate 'non-raft' cargo. It would be interesting to see a couple of experiments with a cargo like this.

      The basis for the claim that raft affinity has relatively minor influence on ER exit kinetics, especially in comparison to the effect of the putative COPII interaction motif, is in Fig 1G. We do observe some differences between constructs and they may be related to raft affinity; however, we considered these relatively minor compared to the nearly 4-fold increase in ER efflux induced by COPII motifs.

      We have modified the wording in the manuscript to avoid the impression that we have ruled out an effect of raft affinity on ER exit.

      We believe that our observations are broadly consistent with those of Munro and colleagues. In both their work and ours, long TMDs were able to exit the ER. In our experiments, this was true for several proteins with long TMDs, either as full-length or as TMD-only versions (see Fig 1G). We intentionally did not measure shorter synthetic TMDs because these would not have been comparable with the raft-preferring variants, which all require relatively long TMDs, as demonstrated in our previous work1,2. Thus, because our manuscript does not make any claims about the influence of TMD length on trafficking, we did not feel that experiments with shorter non-raft constructs would substantively influence our conclusions.

      However, to address reviewer interest, we did complete one set of experiments to test the effect of shortening the TMD on ER exit. We truncated the native LAT TMD by removing 6 residues from the C-terminal end of the TMD (LAT-TMDd6aa). This construct exited the ER similarly to all others we measured, revealing that for this set of constructs, short TMDs did not accumulate in the ER. ER exit of the truncated variant was slightly slower than the full-length LAT-TMD, but somewhat faster than the allL-TMD. These effects are consistent with our previous measurements, which showed that this shortened construct has slightly lower raft phase partitioning than the LAT-TMD but higher than allL2. While these are interesting observations, a more thorough exploration of the effect of TMD length would be required to make any strong conclusion, so we did not include these data in the final manuscript.

      Author response image 1.

      Golgi exit:

      For the LAT constructs, the kinetics of Golgi exit as shown in Fig. 3B are surprisingly slow. About half of the protein remains in the Golgi at 1 h after biotin addition. Most secretory cargo proteins would have almost completely exited the Golgi by that time, as illustrated by VSVG in Fig. S3. There is a concern that LAT may have some tendency to linger in the Golgi, presumably due to a factor independent of the transmembrane domain, and therefore cannot be viewed as a good model protein. For kinetic modeling in particular, the existence of such an additional factor would be far from ideal. A valuable control would be to examine the Golgi exit kinetics of at least one additional secretory cargo.

      We disagree that LAT is an unusual protein with respect to Golgi efflux kinetics. In our experiments, Golgi efflux of VSVG was similar to full-length LAT (t1/2 ~ 45 min), and both of these were similar to previously reported values3. Especially for the truncated (i.e. TMD) constructs, it is very unlikely that some factor independent of their TMDs affects Golgi exit, as they contain almost no amino acids outside the membrane-embedded TMD.

      Practically, it has proven somewhat challenging to produce functional RUSH-Golgi constructs. We attempted the experiment suggested by the reviewer by constructing SBP-tagged versions of several model cargo proteins, but all failed to trap in the Golgi. We speculate that the Golgin84 hook is much more sensitive to the location of the SBP on the cargo, being an integral membrane protein rather than the lumenal KDEL-streptavidin hook. This limitation can likely be overcome by engineering the cargo, but we did not feel that another control cargo protein was essential for the conclusions we presented, thus we did not pursue this direction further.

      Comments about the trafficking model

      (1) In Figure 1E, the export of LAT-TMD from the ER is fitted to a single-exponential fit that the authors say is "well described". This is unclear and there is perhaps something more complex going on. It appears that there is an initial lag phase and then similar kinetics after that - perhaps the authors can comment on this?

      This is a good observation. This effect is explainable by the mechanics of the measurement: in Figs 1 and 2, we measure not ‘fraction of protein in ER’ but ‘fraction of cells positive for ER fluorescence’. This is because the very slow ER exit of the TMD-only constructs presents a major challenge for live-cell imaging, so ER exit was quantified on a population level, by fixing cells at various time points after biotin addition and quantifying the fraction of cells with observable ER localization (rather than tracking a single cell over time).

      For fitting to the kinetic model (which attempts to describe ‘fraction in ER/Golgi’) we re-measured all constructs by live-cell imaging (see Supp Fig 5) to directly quantify relative construct abundance in the ER or Golgi. These data did not have the plateau in Fig 1E, suggesting that this is an artifact of counting “ER positive cells”, which would be expected to have a longer lag than “fraction of protein in ER”. Notably, however, t1/2 measured by both methods was similar, suggesting that the population measurement agrees well with single-cell live imaging.
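
      The difference between the two readouts can be illustrated with a toy simulation. The sketch below is not the analysis used in the manuscript: the rate constant, its cell-to-cell variability, and the detection threshold are hypothetical values chosen only to show why scoring “ER positive cells” would be expected to lag behind “fraction of protein in ER”.

```python
import math
import random

def simulate(times, n_cells=2000, k_mean=0.02, k_sd=0.3, threshold=0.3, seed=0):
    """Compare two readouts of ER exit for a population of cells.

    Each cell loses ER protein as exp(-k * t), with k drawn from a
    log-normal distribution around k_mean (per minute). A cell is scored
    "ER positive" while its remaining fraction exceeds `threshold`.
    All parameter values are illustrative assumptions, not fitted to data.
    """
    rng = random.Random(seed)
    rates = [k_mean * math.exp(rng.gauss(0.0, k_sd)) for _ in range(n_cells)]
    protein_fraction = []   # mean fraction of protein still in the ER
    positive_fraction = []  # fraction of cells scored as "ER positive"
    for t in times:
        remaining = [math.exp(-k * t) for k in rates]
        protein_fraction.append(sum(remaining) / n_cells)
        positive_fraction.append(sum(r > threshold for r in remaining) / n_cells)
    return protein_fraction, positive_fraction

times = [0, 10, 20, 40, 60, 90, 120]
protein, positive = simulate(times)
for t, p, c in zip(times, protein, positive):
    print(f"t={t:>3} min  protein in ER={p:.2f}  ER-positive cells={c:.2f}")
```

      In this toy setting the population readout stays near 1 at early times while the protein fraction is already declining, which is the kind of initial plateau described for Fig 1E.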

      We have included all these explanations and caveats in the manuscript. We have also changed the wording from “well described” to “reasonably approximated”.

      (2) The model for Golgi sorting is also complicated and controversial, and while the authors' intention to not overinterpret their data in this regard must be respected, this data is in support of the two-phase Golgi export model (Patterson et al PMID:18555781).

      The reviewers are correct: our observations and model are consistent with Patterson et al, and it was a major oversight that a reference to this foundational work was not included. We have now added a discussion regarding the “two phase model” of Patterson and Lippincott-Schwartz.

      Furthermore, contrary to the statement in lines 200-202, the kinetics of VSVG exit from the Golgi (Fig. S3) are roughly linear and so are NOT consistent with the previous report by Hirschberg et al.

      Regarding kinetics of VSVG, our intention was to claim that the timescale of VSVG efflux from the Golgi was similar to that previously reported by Hirschberg et al, i.e. t1/2 roughly between 30-60 minutes. We have clarified this in the text. Minor differences in the details between our observations and those of Hirschberg et al are likely attributable to temperature, as those measurements were done at 32°C for the tsVSVG mutant.

      Moreover, the kinetics of LAT export from the Golgi (Fig. 3B) appear quite different, more closely approximating exponential decay of the signal. These points should be described accurately and discussed.

      Regarding linear versus exponential fits, we agree that the reality of Golgi sorting and efflux is far more complicated than accounted for by either the phenomenological curve fitting in Figs 1-3 or the modeling in Fig 4. In addition to the possibility of lateral domains within Golgi stacks, there is transport between stacks, retrograde traffic, etc. The fits in Figs 1-3 are not intended to model specifics of transport, but rather to be phenomenological descriptors that allowed us to describe efflux kinetics with one parameter (i.e. t1/2). In contrast, the more refined kinetic modeling presented in Figure 4 is designed to test a mechanistic hypothesis (i.e. coexisting membrane domains in Golgi) and describes well the key features of the trafficking data.
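
      To illustrate what a one-parameter phenomenological descriptor means in practice, the following sketch fits a single exponential decay to a time course and reports t1/2 = ln(2)/k. The data points are synthetic, and the log-linear least-squares fit is an assumption made for simplicity; the fitting procedure used in the manuscript may differ.

```python
import math

def fit_half_time(times, fractions):
    """Fit fraction(t) = exp(-k * t) by least squares on -log(fraction).

    The line is forced through the origin (fraction = 1 at t = 0), so k is
    the only free parameter. Returns (k, t_half).
    """
    pairs = [(t, -math.log(f)) for t, f in zip(times, fractions) if f > 0]
    num = sum(t * y for t, y in pairs)
    den = sum(t * t for t, _ in pairs)
    k = num / den
    return k, math.log(2) / k

# Synthetic example: a cargo with a 45 min half-time, sampled without noise.
true_k = math.log(2) / 45.0
times = [5, 15, 30, 45, 60, 90]
fractions = [math.exp(-true_k * t) for t in times]

k, t_half = fit_half_time(times, fractions)
print(f"k = {k:.4f} per min, t1/2 = {t_half:.1f} min")
```

      A descriptor of this kind summarizes the timescale of efflux without committing to any particular transport mechanism, which is the role the fits in Figs 1-3 are described as playing.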

      Relationship between membrane traffic and domain partitioning:

      (1) Phase segregation in the GPMV is dictated by thermodynamics given its composition and the measurement temperature (low temperatures, 4 °C). However, at physiological temperatures (32-37 °C), at which membrane trafficking is taking place, these GPMVs are not phase separated. Hence it is difficult to argue that a sorting mechanism based solely on the partitioning of the synthetic LAT-TMD constructs into lo domains detected at low temperatures in GPMVs provides a basis (or its lack) for the differential kinetics of traffic out of the Golgi (or ER). The mechanism in a living cell to form any lipid based sorting platforms naturally requires further elaboration, and by definition cannot resemble the lo domains generated in GPMVs at low temperatures.

      We thank the reviewers for bringing up this important point. GPMVs are a useful tool because they allow direct, quantitative measurements of protein partitioning between coexisting ordered and disordered phases in complex, cell-derived membranes. However, we entirely agree that GPMVs do not fully represent the native organization of the living cell plasma membrane and we have previously discussed some of the relevant differences4,5. Despite these caveats, many studies have supported the cellular relevance of phase separation in GPMVs and the partitioning of proteins to raft domains therein6-9. Most notably, elegant experiments from several independent labs have shown that fluorescent lipid analogs that partition to Lo domains in GPMVs also show distinct diffusive behaviors in live cells6,7, strongly suggesting the presence of nanoscopic Lo domains in live cells. Similarly, our recent collaborative work with the lab of Sarah Veatch showed excellent agreement between raft preference in GPMVs and protein organization in living immune cells imaged by super-resolution microscopy10. Further, several labs6,7, including ours11, have reported nice correlations between raft partitioning in GPMVs and detergent resistance, which is a classical (though controversial) assay for raft association.

      Based on these points, we feel that GPMVs are a useful tool for quantifying protein preference for ordered (raft) membrane domains and that this preference is a useful proxy for the raft-associated behavior of these probes in living cells. We propose that this approach allows us to overcome a major reason for the historical controversy surrounding the raft field: nonquantitative and unreliable methodologies that prevented consistent definition of which proteins are supposed to be present in lipid rafts and why. Our work directly addresses this limitation by relating quantitative raft affinity measurements in a biological membrane with a relevant and measurable cellular outcome, specifically inter-organelle trafficking rates.

      Addressing the point about phase transition temperatures in GPMVs: this is the temperature at which macroscopic domains are observed. Based on physical models of phase separation, it has been proposed that macroscopic phase separation at lower temperatures is consistent with sub-microscopic, nanoscale domains at higher temperatures8,12. These smaller domains can potentially be stabilized / functionalized by protein-protein interactions in cells13 that may not be present in GPMVs (e.g. because of lack of ATP).

      (2) The lipid compositions of each of these membranes - PM, ER and Golgi are drastically different. Each is likely to phase separate at different phase transition temperatures (if at all). The transition temperature is probably even lower for Golgi and the ER membranes compared to the PM. Hence, if the reported compositions of these compartments are to be taken at face value, the propensity to form phase separated domains at a physiological temperature will be very low. Are ordered domains even formed at the Golgi at physiological temperatures?

      It is a good point that the membrane compositions and the resulting physical properties (including any potential phase behavior) will be very different in the PM, ER, and Golgi. Whether ordered domains are present in any of these membranes in living cells remains difficult to directly visualize, especially for non-PM membranes which are not easily accessible by probes, are nanoscopic, and have complex morphologies. However, the fact that raft-preferring probes / proteins share some trafficking characteristics, while very similar non-raft mutants behave differently argues that raft affinity plays a role in subcellular traffic.

      (3) The hypothesis of 'lipid rafts' is a very specific idea, related to functional segregation, and the underlying basis for domain formation has been also hotly debated. In this article the authors conflate thermodynamic phase separation mechanisms with the potential formation of functional sorting domains, further adding to the confusion in the literature. To conclude that this segregation is indeed based on lipid environments of varying degrees of lipid order, it would probably be best to look at the heterogeneity of the various membranes directly using probes designed to measure lipid packing, and then look for colocalization of domains of different cargo with these domains.

      This is a very good suggestion, and a direction we are currently following. Unfortunately, due to the dynamic nature and small size of putative lateral membrane domains, combined with the interior of a cell being filled with lipophilic environments that overlay each other, directly imaging domains in organellar membranes with lipid packing probes remains extremely difficult with current technology (or at least with that available to us). We argue that the TMD probes used in this manuscript are a reasonable alternative, as they are fluorescent probes with validated selectivity for membrane compartments with different physical properties.

      Ultimately, the features of membrane domains suggested by a variety of techniques – i.e. nanometric, dynamic, relatively similar in composition to the surrounding membrane, potentially diverse/heterogeneous – make them inherently difficult to microscopically visualize. This is one reason why we believe studies like ours, which use a natural model system to directly quantify raft-associated behaviors and relate them to cellular effects (in our case, protein sorting), are a useful direction for this field.

      We believe we have been careful in our manuscript to avoid confusing language surrounding lipid rafts, phase separation, etc. Our experiments clearly show that mammalian membranes have the capacity to phase separate, that some proteins preferentially interact with more ordered domains, and that this preference is related to the subcellular trafficking fates and rates of these proteins. We have edited the manuscript to emphasize these claims and avoid the historical controversies and confusions.

      (4) In the super-resolution experiments (by SIM, where the enhancement of resolution is around two fold or less compared to optical), the authors are able to discern a segregation of the two types of Golgi-resident cargo that have different preferences for the lo-domains in GPMVs. It should be noted that TMD-allL and the LAT-allL end up in the late endosome after exit of the Golgi. Previous work from the Bonifacino laboratory (PMID: 28978644) has shown that proteins (such as M6PR) destined to go to the late endosome bud from a different part of the Golgi in vesicular carriers, while those that are destined for the cell surface first (including TfR) bud with tubular vesicular carriers. Thus at the resolution depicted in Fig 5, the segregation seen by the authors could be due to an alternative explanation, that these molecules are present in different areas of the Golgi for reasons different from phase partitioning. The relatively high colocalization of TfR with the GPI probe in Fig 5E is consistent with this explanation. TfR and GPI prefer different domains in the GPMV assays yet they show a high degree of colocalization and also traffic to the cell surface.

      This is a good point. Even at microscopic resolutions beyond the optical diffraction limit, we cannot make any strong claims that the segregation we observe is due to lateral lipid domains and not several reasonable alternatives, including separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids. We have explicitly included this point in the Discussion: “Our SIM imaging suggests segregation of raft from nonraft cargo in the Golgi shortly (5 min) after RUSH release (Fig 5B), but at this level of resolution, we can only report reduced colocalization, not intra-Golgi protein distributions. Moreover, segregation within a Golgi cisterna would be very difficult to distinguish from cargo moving between cisternae at different rates or exiting via Golgi-proximal vesicles.”

      We have also added a similar caveat in the Results section of the manuscript: “These observations support the hypothesis that proteins can segregate in Golgi based on their affinity for distinct membrane domains; however, it is important to emphasize that this segregation does not necessarily imply lateral lipid-driven domains within a Golgi cisterna. Reasonable alternative possibilities include separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids.”

      Finally, while probes with allL TMD do eventually end up in late endosomes (consistent with the Bonifacino lab’s findings which we include), they do so while initially transiting the PM2,11.

      Minor concerns:

      (1) Generally, the quantitation is high quality from difficult experimental data. Although a lot appears to be manual, it appears appropriately performed and interpreted. There are some claims that are made based on this quantitation, however, where there are no statistics performed. For example, figure 1B. Any quantitation with an accompanying conclusion should be subject to a statistical test. I think this is particularly important for the quality of the model fits.

      We appreciate the thoughtful feedback. The quantifications and fits were not trivial, but we believe they are important. We have added statistical significance to Figure 1B and others where it was missing.

      (2) Modulation of lipid levels in Fig 4E shows a significant change for the trafficking rate for the LAT-TMD construct and a not so significant change for the allL-TMD construct. However, these data are not convincing and appear to depend on a singular data point that appears to lower the mean value. In general, the experiment with the MZA inhibitor (Fig. 4D-F) is hard to interpret because cells will likely be sick after inhibition of sphingolipid and cholesterol synthesis. Moreover, the difference in effects for LAT-TMD and allL-TMD is marginal.

      We disagree with this interpretation. Fig 4E shows the average of three experiments and demonstrates clearly that the inhibitors change the Golgi efflux rate of LAT-TMD but not allL-TMD. This is summarized in the t1/2 quantifications of Fig 4F, which show a statistically significant change for LAT-TMD but not allL-TMD. This is not an effect of a singular data point, but rather the trend across the dataset.

      Further, the inhibitor conditions were tuned carefully to avoid cells becoming “sick”: at higher concentrations, cells did adopt unusual morphologies and began to detach from the plates. We pursued only lower concentrations, which cells survived for at least 48 hrs and without major morphological changes.

      (3) Line 173: 146-AAPSA-152 should read either 146-AAPSA-150 or 146-AAPSAPA-152, depending on what the authors intended.

      Thanks for the careful reading, we intended the former and it has been fixed.

      (4) What is the actual statistical significance in Fig. 3C and Fig. 3E? There is a single asterisk in each panel of the figure but two asterisks in the legend.

      Apologies, a single asterisk representing p<0.05 was intended. It has been fixed.

      (5) The code used to calculate the model is not accessible. It is standard practice to host well-annotated code on Github or similar, and it would be good to have this publicly available.

      We have deposited the code on a public repository (doi: 10.5281/zenodo.10478607) and added a note to the Methods.

      (1) Lorent, J. H. et al. Structural determinants and functional consequences of protein affinity for membrane rafts. Nature communications 8, 1219 (2017). PMC5663905

      (2) Diaz-Rohrer, B. B., Levental, K. R., Simons, K. & Levental, I. Membrane raft association is a determinant of plasma membrane localization. Proc Natl Acad Sci U S A 111, 8500-8505 (2014). PMC4060687

      (3) Hirschberg, K. et al. Kinetic analysis of secretory protein traffic and characterization of golgi to plasma membrane transport intermediates in living cells. J Cell Biol 143, 1485-1503 (1998). PMC2132993

      (4) Levental, K. R. & Levental, I. Giant plasma membrane vesicles: models for understanding membrane organization. Current topics in membranes 75, 25-57 (2015)

      (5) Sezgin, E. et al. Elucidating membrane structure and protein behavior using giant plasma membrane vesicles. Nat Protoc 7, 1042-1051 (2012)

      (6) Komura, N. et al. Raft-based interactions of gangliosides with a GPI-anchored receptor. Nat Chem Biol 12, 402-410 (2016)

      (7) Kinoshita, M. et al. Raft-based sphingomyelin interactions revealed by new fluorescent sphingomyelin analogs. J Cell Biol 216, 1183-1204 (2017). PMC5379944

      (8) Stone, M. B., Shelby, S. A., Nunez, M. F., Wisser, K. & Veatch, S. L. Protein sorting by lipid phase-like domains supports emergent signaling function in B lymphocyte plasma membranes. eLife 6 (2017). PMC5373823

      (9) Machta, B. B. et al. Conditions that Stabilize Membrane Domains Also Antagonize n-Alcohol Anesthesia. Biophys J 111, 537-545 (2016)

      (10) Shelby, S. A., Castello-Serrano, I., Wisser, I., Levental, I. & S., V. Membrane phase separation drives protein organization at BCR clusters. Nat Chem Biol in press (2023)

      (11) Diaz-Rohrer, B. et al. Rab3 mediates a pathway for endocytic sorting and plasma membrane recycling of ordered microdomains. Proc Natl Acad Sci U S A 120, e2207461120 (2023)

      (12) Veatch, S. L. et al. Critical fluctuations in plasma membrane vesicles. ACS Chem Biol 3, 287-293 (2008)

      (13) Wang, H. Y. et al. Coupling of protein condensates to ordered lipid domains determines functional membrane organization. Science advances 9, eadf6205 (2023). PMC10132753

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendation for the authors):

      (1) On a few occasions, I found that the authors would introduce a concept, but provide evidence much later on. For example, in line 57, they introduced the idea that feedback timing modulates engagement of the hippocampus and striatum, but they provided the details much later on around line 99. There are a few instances like these, and the authors may want to go through the manuscript critically to bridge such gaps to improve the flow of reading.

      First, we thank the reviewer for acknowledging the contribution of our study and the methodological choices. We acknowledge the concern raised about the flow of information in the introduction. We have critically reviewed the manuscript, especially its writing style and overall structure, to ensure a smoother transition between the introduction of concepts and the provision of supporting evidence. In the case of the concept of feedback timing and memory systems, lines 46-58 first introduce the concept together with evidence from adults, and we then pick up the concept again around line 103 to relate it to children and their brain development and so motivate our research question. To further improve readability, we have included an outline of what to expect in the introduction. Specifically, we added a sentence in lines 66-68 that provides an overview of the different paragraphs: “We will introduce the key parameters in reinforcement learning and then we review the existing literature on developmental trajectories in reinforcement learning as well as on hippocampus and striatum, our two brain regions of interest.”

      This should prepare the reader better when to expect more evidence regarding the concepts introduced. We included similar “road-marker” outline sentences in other occasions the reviewer commented on, to enhance consistency and readability.

      (2) I am curious as to how they think the 5-second delay condition maps onto real-life examples, for example in a classroom setting feedback after 5 seconds could easily be framed as immediate feedback.

      The authors may want to highlight a few illustrative examples.

      Thank you for asking about the practical implications of a 5-second delay condition, which may be very relevant to the reader. We have modified the introduction example in lines 39-41 towards the role of feedback timing in the classroom to point out its practical relevance early on: “For example, children must learn to raise their hand before speaking during class. The teacher may reinforce this behavior immediately or with a delay, which raises the question whether feedback timing modulates their learning”.

      We have also expanded a respective discussion point in lines 720-728 to pick up the classroom example and to illustrate how we think timescale differences may apply: “In scenarios such as in the classroom, a teacher may comment on a child’s behavior immediately after the action or some moments later, on par with our experimental manipulation of 1 second versus 5 seconds. Within such a short range of delay in teachers’ feedback, children’s learning ability during the first years of schooling may function equally well and depend on the striatal-dependent memory system. However, we anticipate that the reliance on the hippocampus will become even more pronounced when feedback is further delayed for a longer time. Children’s capacity for learning over longer timescales relies on the hippocampal-dependent memory system, which is still under development. This knowledge could help to better structure learning according to their development.”

      (3) In the methods section, there are a few instances of task description discrepancies which make things a little bit confusing, for example, line 173 reward versus punishment, or reward versus null elsewhere e.g. line 229. In the same section, line 175, there are a few instances of typos.

      We appreciate your attention to detail in pointing out discrepancies in task descriptions and typos in the method section. We have revised the section, corrected typos, and now phrased the learning outcomes consistently as “reward” and “punishment”.

      (4) I wasn't very clear as to why the authors did not compute choice switch probability directly from raw data but implemented this as a model that makes use of a weight parameter. The former would be much easier and more straightforward for data plotting, especially for uninformed readers, i.e., people who do not have backgrounds in computational modelling.

      Thank you for asking for clarification on the calculation of switching behavior. Indeed, in the behavioral results, switching behavior was directly calculated from the raw data. We now stressed this in the methods in lines 230-235, also by naming win-stay and lose-shift as “proportions” instead of as “probabilities”: “As a first step, we calculated learning outcomes directly from the raw data, which were learning accuracy, win-stay and lose-shift behavior as well as reaction time.

      Learning accuracy was defined as the proportion to choose the more rewarding option, while win-stay and lose-shift refer to the proportion of staying with the previously chosen option after a reward and switching to the alternative choice after receiving a punishment, respectively.”
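
      The raw-data measures defined above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' analysis code: the function names are hypothetical, outcomes are coded as 1 for reward and 0 for punishment, and in a real dataset the calculation would presumably be run separately per participant, cue, and feedback condition.

```python
def switching_proportions(choices, outcomes):
    """Return (win_stay, lose_shift) proportions from raw trial data.

    choices  -- list of chosen options, one per trial (e.g. 0 or 1)
    outcomes -- list of outcomes, 1 = reward, 0 = punishment (assumed coding)
    A proportion is None when its denominator is zero.
    """
    wins = stays_after_win = losses = shifts_after_loss = 0
    for prev_choice, prev_outcome, choice in zip(choices, outcomes, choices[1:]):
        if prev_outcome == 1:
            wins += 1
            stays_after_win += choice == prev_choice
        else:
            losses += 1
            shifts_after_loss += choice != prev_choice
    win_stay = stays_after_win / wins if wins else None
    lose_shift = shifts_after_loss / losses if losses else None
    return win_stay, lose_shift

def learning_accuracy(choices, better_option):
    """Proportion of trials on which the more rewarding option was chosen."""
    return sum(c == better_option for c in choices) / len(choices)

choices = [0, 0, 1, 1, 0, 0, 0]
outcomes = [1, 0, 1, 1, 0, 0, 1]
win_stay, lose_shift = switching_proportions(choices, outcomes)
accuracy = learning_accuracy(choices, better_option=0)
print(win_stay, lose_shift, accuracy)
```

      Because these are simple counts over consecutive trial pairs, they are proportions in the descriptive sense and carry no assumptions about an underlying choice model.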

      In contrast to the raw data switching behavior, the computational heuristic strategy model indeed uses a weight for a relative tendency of switching behavior. We have also stressed the advantage of the computational measure and its difference to the raw data switching behavior in lines 248-252 and believe that the reader can now clearly distinguish between the raw data and the computational results: “Note that these model-based outcomes are not identical to the win-stay and lose-shift behavior that were calculated from the raw data. The use of such a model-based measure offers the advantage of discerning the underlying hidden cognitive process with greater nuance, in contrast to classical approaches that directly use raw behavioral data.”

      (5) I agree with the authors' assertion that both inverse temperature and outcome sensitivity parameters may lead to non-identifiability issues, but I was not 100% convinced about their modelling approach exclusively assessing a different family of models (inv temperature versus outcome sensitivity). Here, I would like to make one mid-way recommendation. They may want to redefine the inverse temperature term in terms of reaction time, i.e., B = exp(s + g(RT - mean(RT))), where s and g are free parameters (see Webb, 2019), and keep the outcome sensitivity parameter in the model with bounds [0,2] so that the interpretation could be % increase or decrease in actual outcome. Personally, in tasks with binary outcomes, i.e. [0,1: null vs reward], I do not think outcome sensitivity parameters higher than 2 are interpretable as these assign an inflated coefficient to outcomes.

      We appreciate the mid-way recommendation regarding the modeling approach for inverse temperature and outcome sensitivity parameters. We have carefully revised our analysis approach by considering alternative modeling choices. Regarding the suggestion to redefine the inverse temperature in terms of reaction time by B = exp(s + g(RT - mean(RT))), we unfortunately were not able to identify the reference Webb (2019), nor did we find references to the suggested modeling approach. Any further information that the reviewer could provide will be greatly appreciated. Regardless, we agree that including reaction times through the implementation of drift-diffusion modeling may be beneficial. However, changing the inverse temperature model in such a way would necessitate major changes in our modeling approach, which unfortunately would result in non-convergence issues in our MCMC pipeline using Rstan. Hence, this approach goes beyond the scope of the manuscript. Nonetheless, we have decided to mention the use of a drift-diffusion model, along with other methodological considerations, as a future recommendation for disentangling outcome sensitivity from inverse temperature in lines 711-712: “Future studies might shed new light by examining neural activations at both task phases, by additionally modeling reaction times using a drift-diffusion approach, or by choosing a task design that allows independent manipulations of these phases and associated model parameters, e.g., by using different reward magnitudes during reinforcement learning, or by studying outcome sensitivity without decision-making.”
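
      For readers unfamiliar with the reviewer's suggestion, the parameterization as written in the comment could be sketched as below. This only illustrates the formula for a trial-wise inverse temperature that depends on reaction time; the function names and all numerical values are hypothetical, and it is not part of the Rstan pipeline described in this response.

```python
import math

def rt_inverse_temperature(rt, mean_rt, s, g):
    """Trial-wise inverse temperature B = exp(s + g * (RT - mean(RT)))."""
    return math.exp(s + g * (rt - mean_rt))

def softmax_choice_prob(q_chosen, q_other, beta):
    """Probability of picking the option with value q_chosen (two options)."""
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))

rts = [0.6, 0.9, 1.5]   # reaction times in seconds (made-up values)
mean_rt = sum(rts) / len(rts)
s, g = 1.0, -0.8        # hypothetical free parameters
betas = [rt_inverse_temperature(rt, mean_rt, s, g) for rt in rts]
for rt, beta in zip(rts, betas):
    p = softmax_choice_prob(0.7, 0.3, beta)
    print(f"RT={rt:.1f}s  beta={beta:.2f}  P(choose higher-valued option)={p:.2f}")
```

      The exponential keeps the inverse temperature positive, and with a negative g slower responses map onto a lower inverse temperature, i.e. more random choices.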

      Regarding the upper bound of outcome sensitivity, we agree that traditionally, limiting the parameter values at 2 is the choice for the parameter to be best interpretable. During model fitting, we had experienced non-convergence issues and ceiling effects in the outcome sensitivity parameter when fixing the inverse temperature at 1. The non-convergence issue was not resolved when we fixed the inverse temperature at 15.47, which was the group mean of the winning inverse temperature family. Model convergence was only achieved after increasing the outcome sensitivity upper bound to 20, with inverse temperature again fixed at 1. Since this model also performed well during parameter and model recovery, we argue that the parameter is nevertheless meaningful, despite the more extreme trial-to-trial value fluctuations under higher outcome sensitivity. We described our choice for this model in the methods section in lines 282-288: “Even though outcome sensitivity is usually restricted to an upper bound of 2 to not inflate outcomes at value update, this configuration led to ceiling effects in outcome sensitivity and non-converging model results. Further, this issue was not resolved when we fixed the inverse temperature at the group mean of 15.47 of the winning inverse temperature family model. It may be that in children, individual differences in outcome sensitivity are more pronounced, leading to more extreme values. Therefore, we decided to extend the upper bound to 20, parallel to the inverse temperature, and all our models converged with Rhat < 1.1.”.
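The identifiability problem discussed in this exchange can be illustrated with a small sketch. It assumes a Rescorla-Wagner update of the form V <- V + alpha * (rho * r - V) with values initialized at zero and a softmax choice rule; the exact model equations of the manuscript are not reproduced in this response, so this form, the function name, and the example trial sequence are assumptions made only for illustration.

```python
import numpy as np

def choice_probabilities(choices, rewards, alpha, beta, rho, n_options=2):
    """Return P(chosen option) on each trial for a Rescorla-Wagner learner.

    Values start at 0 and are updated as V <- V + alpha * (rho * r - V),
    where rho is the outcome sensitivity and beta the inverse temperature.
    """
    values = np.zeros(n_options)
    probs = np.empty(len(choices))
    for t, (c, r) in enumerate(zip(choices, rewards)):
        z = beta * (values - values.max())
        p = np.exp(z) / np.exp(z).sum()
        probs[t] = p[c]
        values[c] += alpha * (rho * r - values[c])
    return probs

# Hypothetical choice and binary outcome sequence.
choices = [0, 1, 0, 0, 1, 0, 1, 1]
rewards = [1, 0, 1, 0, 0, 1, 1, 0]

p_beta = choice_probabilities(choices, rewards, alpha=0.3, beta=5.0, rho=1.0)
p_rho = choice_probabilities(choices, rewards, alpha=0.3, beta=1.0, rho=5.0)
same_predictions = np.allclose(p_beta, p_rho)
```

Under these assumptions the learned values scale linearly with `rho`, so only the product of `beta` and `rho` affects the choice probabilities. This is why one of the two parameters has to be fixed when the other is estimated, and why an outcome sensitivity bound of 20 with inverse temperature fixed at 1 spans the same range as an inverse temperature bound of 20.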

      (6) I think the authors reporting optimal parameters for the model is very important (line 464), but the learning rate they report under stable contingencies is much higher than LRs reported by for example Behrens et al 2007, LRs around 0.08 for the optimal learning behaviour. The authors may want to discuss why their task design calls for higher learning rates.

      Thank you for appreciating our optimal parameter analysis, and for the recommendation to discuss why optimal learning rates in our task design may call for higher learning rates compared to those reported in some other studies. As largely articulated in Zhang et al. (2020; primer piece by one of our co-authors), the optimal parameter combination is determined by several factors, such as the reward schedule (e.g., 75:25 vs 80:20), the task design (e.g., no reversal, one reversal, vs multiple reversals), and the number of trials (e.g., 80 vs 100 vs 120). Notably, in these task-related regards, our task is different from that of Behrens et al. (2007), which hinders a quantitative comparison of the optimal parameters in the two tasks. We have now included more details in our discussion in lines 643-656: “However, the differences in learning rate across studies have to be interpreted with caution. The differences in the task and the analysis approach may limit their comparability. Task properties such as the trial number per condition differed across studies. Our study included 32 trials per cue in each condition, while in adult studies, the trials per condition ranged from 28 to 100. Optimal learning rates in a stable learning environment were at around 0.25 for 10 to 30 trials; another study reported a lower optimal learning rate of around 0.08 for 120 trials. This may partly explain why, in our case of 32 trials per condition and cue, the optimal learning rate was relatively high at 0.29, while in other studies, optimal learning rates may be lower. Regarding differences in the analysis approach, the hierarchical Bayesian estimation approach used in our study produces more reliable results in comparison to maximum likelihood estimation, which had been used in some of the previous adult studies and may have led to biased results towards extreme values.
Taken together, our study underscores the importance of using longitudinal data to examine developmental change as well as the importance of simulation-based optimal parameters to interpret the direction of developmental change.”
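How a simulation-based optimal learning rate depends on trial number and reward schedule can be sketched as follows. This is a generic illustration, not the simulation used in the manuscript: the two-option bandit, the 75:25 reward schedule, the fixed inverse temperature of 5, the grid of learning rates, and all names are assumptions chosen for the example.

```python
import numpy as np

def simulate_accuracy(alpha, beta, n_trials=32, p_reward=(0.75, 0.25),
                      n_agents=2000, seed=0):
    """Mean proportion of choices of the better option (index 0) across agents."""
    rng = np.random.default_rng(seed)
    p_reward = np.asarray(p_reward)
    values = np.zeros((n_agents, 2))
    agents = np.arange(n_agents)
    correct = 0.0
    for _ in range(n_trials):
        # For two options the softmax reduces to a logistic of the value difference.
        p_choose_0 = 1.0 / (1.0 + np.exp(-beta * (values[:, 0] - values[:, 1])))
        choice = (rng.random(n_agents) >= p_choose_0).astype(int)
        reward = (rng.random(n_agents) < p_reward[choice]).astype(float)
        values[agents, choice] += alpha * (reward - values[agents, choice])
        correct += np.mean(choice == 0)
    return correct / n_trials

# Grid search over learning rates for a short, stable task.
alphas = np.round(np.arange(0.05, 1.0, 0.05), 2)
accuracy = np.array([simulate_accuracy(a, beta=5.0) for a in alphas])
best_alpha = alphas[np.argmax(accuracy)]
```

Rerunning the grid with a different `n_trials` or `p_reward` shows how the location of the maximum can shift with task properties, which is the reason optimal values from different tasks are hard to compare directly.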

      (7) The authors may want to report degrees of freedom in t-tests so that it would be possible to infer the final sample size for a specific analysis, for example, line 546.

      We appreciate the recommendation to include degrees of freedom, which are now added in all t-test results, for example in line 579: “Episodic memory, as measured by individual corrected object recognition memory (hits - false alarms) of confident (“sure”) ratings, showed a trend toward better memory for items shown in the delayed feedback condition (𝛽 = .009, SE = .005, t(df = 137) = 1.80, p = .074, see Figure 5A).”

      (8) I'm not sure why reductions in lose shift behaviour are framed as an improvement between 2 assessment points, e.g. line 578. It all depends on the strength of the contingency so a discussion around this point should be expanded.

      We acknowledge that a reduction in lose-shift behavior reflects improvement only under certain conditions, where uncertainty is low and the learning contingencies are stable, which is the case in our task. We have added Supplementary Material 4 to illustrate the optimality of win-stay and lose-shift proportions from model simulation and to confirm that children’s longitudinal development was indeed towards more optimal switching behavior. In the manuscript, we refer to these results in lines 488-490: “We further found that the average longitudinal change in win-stay and lose-shift proportion also developed towards more optimal value-based learning (Supplementary Material 4).”
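As a concrete reference for the switching measures discussed here, win-stay and lose-shift proportions can be computed from a trial sequence roughly as in the sketch below. It assumes binary outcomes and a single sequence of consecutive choices for one cue; the function name and the example data are hypothetical, and the manuscript's exact scoring may differ.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (win-stay, lose-shift) proportions for one sequence of trials.

    Win-stay: share of rewarded trials followed by the same choice.
    Lose-shift: share of unrewarded trials followed by a different choice.
    """
    win_total = win_stay = lose_total = lose_shift = 0
    for prev_c, prev_r, next_c in zip(choices[:-1], rewards[:-1], choices[1:]):
        if prev_r > 0:
            win_total += 1
            win_stay += next_c == prev_c
        else:
            lose_total += 1
            lose_shift += next_c != prev_c
    ws = win_stay / win_total if win_total else float("nan")
    ls = lose_shift / lose_total if lose_total else float("nan")
    return ws, ls

# Hypothetical sequence: three wins (two followed by a stay), two losses (one followed by a shift).
choices = [0, 0, 1, 1, 0, 0]
rewards = [1, 0, 1, 1, 0, 1]
ws, ls = win_stay_lose_shift(choices, rewards)
```

In a stable, probabilistic environment an occasional loss on the better option is uninformative, so a lower lose-shift proportion can indicate more value-based choice rather than less responsiveness to feedback.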

      (9) If I'm not mistaken, the authors reframe a trend-level association as weak evidence. I do not think this is an accurate framing considering the association is strictly non-significant, therefore should be omitted line 585.

      We thank the reviewer for the point regarding the interpretation of a trend-level association as weak evidence. We changed our interpretation, corrected in lines 581-585: “The inclusion of poor learners in the complete dataset may have weakened this effect because their hippocampal function was worse and was not involved in learning (nor encoding), regardless of feedback timing. To summarize, there was inconclusive support for enhanced episodic memory during delayed compared to immediate feedback, calling for future study to test the postulation of a selective association between hippocampal volume and delayed feedback learning.” as well as lines 622-623: “Contrary to our expectations, episodic memory performance was not enhanced under delayed feedback compared to immediate feedback.”

      Reviewer #2 (Public Review):

      We thank the reviewer for acknowledging the strength of our study and pointing out its weaknesses.

      Weaknesses:

      There were a few things that I thought would be helpful to clarify. First, what exactly are the anatomical regions included in the striatum here?

      We appreciate the clarification question regarding the anatomical regions included in the striatum. The striatum included ventral and dorsal regions, i.e., accumbens, caudate and putamen. We have now specified the anatomical regions that were included in the striatum in lines 211-212: “We extracted the bilateral brain volumes for our regions of interest, which were striatum and hippocampus. The striatum regions included nucleus accumbens, caudate and putamen.”

      Second, it was mentioned that for the reduced dataset, object recognition memory focused on "sure" ratings. This seems like the appropriate way to do it, but it was not clear whether this was also the case for the full analyses in the main text.

      Thank you for pointing out that in the full dataset analysis, the use of “sure” ratings for object recognition memory was previously not mentioned. Only “sure” ratings were included, consistently across all analyses. This detail is now described under methods in lines 332-333: “Only confident (“sure”) ratings were included in the analysis, which were 98.1 % of all given responses.”

      Third, the children's fitted parameters were far from optimal; is it known whether adults would be closer to optimal on the task?

      Thank you for your question on whether adult learning rates in the task have been reported to be more optimal than those of the children in our study. This indeed seems to be the case, and we added this point in our discussion in lines 639-643: “Adult studies that examined feedback timing during reinforcement learning reported average learning rates ranging from 0.12 to 0.34, which are much closer to the simulated optimal learning rate of 0.29 than children’s average learning rates of 0.02 and 0.05 at wave 1 and 2 in our study. Therefore, it is likely that individuals approach adult-like optimal learning rates later during adolescence.”

      The main thing I would find helpful is to better integrate the differences between the main results reported and the many additional results reported in the supplement, for example from the reduced dataset when excluding non-learners. I found it a bit challenging to keep track of all the differences with all the analyses and parameters. It might be helpful to report some results in tables side-by-side in the two different samples. And if relevant, discuss the differences or their implication in the Discussion. For example, if the patterns change when excluding the poor learners, in particular for the associations between delayed feedback and hippocampal volume, and those participants were also those less well fit by the value-based model, is that something to be concerned about and does that affect any interpretations? What was not clear to me is whether excluding the poor learners at one extreme simply weakens the general pattern, or whether there is a more qualitative difference between learners and non-learners. The discussion points to the relevance of deficits in hippocampal-dependent learning for psychopathology and understanding such a distinction may be relevant.

      We appreciate the feedback that it might seem challenging to keep track of differences between the analyses of the full and the reduced dataset. We have now gathered all the analyses for the reduced dataset in Supplementary Material 6, with side-by-side tables for comparison to the full dataset results. Whenever there were differences between the results, they were pointed out in the results section, see lines 557-560: “In the results of the reduced dataset, the hippocampal association to the delayed learning score was no longer significant, suggesting a weakened pattern when excluding poor learners (Supplementary Material 6). It is likely that the exclusion reduced the group variance for hippocampal volume and delayed learning score in the model.” and lines 579-581: “Note that in the reduced dataset, delayed feedback predicted enhanced item memory significantly (Supplementary Material 6).”

      The found differences were further included in our discussion in lines 737-740 in the context of deficits in hippocampal-dependent learning and psychopathology: “Interestingly, poor learners showed relatively less value-based learning in favor of stronger simple heuristic strategies, and excluding them modulated the hippocampal-dependent associations to learning and memory in our results. More studies are needed to further clarify the relationship between hippocampus and psychopathology during cognitive and brain development.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) There appears to be a flaw in the exploration of cortical inputs. The authors never show that HFS of cortical inputs has no effect in the absence of thalamic stimulation. It appears that there is a citation showing this, but I think it would be important to show this in this study as well.

      We understand that the reviewer would like us to induce an HFS protocol on cortical input and then test if there is any change in synaptic strength in thalamic input. We have done this experiment which shows that without a footshock, high-frequency stimulation (HFS) of the cortical inputs did not induce synaptic potentiation on the thalamic pathway (Extended Data Fig. 4d).

      (2) It is somewhat confusing that the authors refer to the cortical input as driving heterosynaptic LTP, but this is not shown until Figure 4J, that after non-associative conditioning (unpaired shock and tone) HFS of the cortex can drive freezing and heterosynaptic LTP of thalamic inputs.

      We agree with the reviewer that it is in Figure 4j and Figure 5b,c that we show electrophysiological evidence for cortical input driving heterosynaptic LTP. Initially, we used behavioral evidence as a proxy for heteroLTP (Figure 3c) only to remain consistent with our terminology.

      …, the authors are 'surprised' by this outcome, which appears to be what they predict.

      We removed the phrase “To our surprise”.

      (3) 'Cortex' as a stimulation site is vague. The authors have coordinates they used, it is unclear why they are not using standard anatomical nomenclature.

      We replaced “cortex” with “auditory/associative cortex”.

      (4) The authors' repeated use of homoLTP and heteroLTP to define the input that is being stimulated makes it challenging to understand the experimental detail. While I appreciate this is part of the goal, more descriptive words such as 'thalamic' and 'cortical' would make this much easier to understand.

      We agree with the reviewer that a phrase such as “an LTP protocol on thalamic and cortical inputs” would be more descriptive. We chose the words “homoLTP” and “heteroLTP” only to clarify (for the readers) the physiological relevance of these protocols. We thought by using “thalamic” and “cortical” readers may miss this point. However, when for the first time we introduce the words “homoLTP” and “heteroLTP”, we describe which stimulated pathway each refers to.

      Reviewer #2 (Public Review):

      (1) …The experimental schemes in Figs. 1 and 3 (and Fig. 4e and extended data 4a,b) show that one group of animals was subjected to retrieval in the test context at 24 h, then received HFS, which was then followed by a second retrieval session. With this design, it remains unclear what the HFS impacts when it is delivered between these two 24 h memory retrieval sessions.

      We understand that the reviewer has raised the concern that the increase in freezing we observed after the HFS protocol (e.g., Fig. 1b, the bar labeled as Wth+24hHFSth) could be caused or modulated by the recall prior to the HFS (Fig. 1a, top branch). To address this concern, in a new group of mice, 24 hours after weak conditioning, we induced the HFS protocol, followed by testing (that is, no testing prior to the HFS protocol). We observed that homoLTP was as effective in mice that were tested prior to the induction protocol as in those that were not (Fig. 1b, Extended Data Fig. 1d,e).

      It would be nice to see these data parsed out in a clean experimental design for all experiments (in Figs 1, 3, and 4), that means 4 groups with different treatments that are all tested only once at 24 h, and the appropriate statistical tests (ANOVA). This would also avoid repeating data in different panels for different pairwise comparisons (Fig 1, Fig 3, Fig 4, and extended Fig 4).

      While we understand the benefit of the reviewer’s suggestion, the current presentation of the data was chosen to match the flow of the text and the delivery of the information throughout the manuscript. We think it is unlikely that the retrieval test prior to the HFS impacts its effectiveness, as confirmed by the homosynaptic HFS data (Extended Data Fig. 1d,e). It is beyond the scope of the current manuscript to investigate the mechanisms and manipulations related to reconsolidation and retrieval effects.

      (2) … It would be critical to know if LFPs change over 24 h in animals in which memory is not altered by HFS, and to see correlations between memory performance and LFP changes, as two animals displayed low freezing levels. … They would suggest that thalamo-LA potentiation occurs directly after learning+HFS (which could be tested) and is maintained over 24 h.

      We have performed the experiment where we recorded the evoked LFP 2hrs and 24hrs following the weak conditioning protocol. We observed that a weak conditioning protocol that was not followed by an optical LTP protocol on the cortical inputs failed to produce synaptic potentiation of the thalamic inputs (tested 2hrs and 24hrs after the LTP protocol; Extended Data Fig. 5d,e).

      (3) The statistical analyses need to be clarified. All statements should be supported with statistical testing (e.g. extended data 5c, pg 7 stats are missing). The specific tests should be clearly stated throughout. For ANOVAs, the post-hoc tests and their outcomes should be stated. In some cases, 2-way ANOVAs were performed, but it seems there is only one independent variable, calling for one-way ANOVA.

      All the statistical analyses have been revised and the post-hoc tests performed after the ANOVAs are mentioned in the relevant figure legends.

      Reviewer #2 (Recommendations For The Authors):

      The wording "transient" and "persistent" used here in the context of memory seems a bit misleading, as only one timepoint was assessed for memory recall (24 h), at which the memory strength (freezing levels) seem to change.

      As the reviewer mentioned, we have tested memory recall only at one time point. For this reason, throughout the text we used “transient” exclusively to refer to the experience (receiving footshock) and not to the memory. We replaced “persistence” with “stabilization” where it refers to a memory (“the induction of plasticity influences the stabilization of the memory”).

      For the procedures in which the CS and US were not paired, the term "unpairing" is used (which is probably the more adequate one), but the term "non-associative conditioning" appears in the text, which seems a bit misleading, as this term may have another connotation. There is also literature that an unpairing of CS and US could lead to the formation of a safety memory to the CS, that may be disrupted by HFS stimulation.

      We replaced "non-associative" with “unpaired”.

      Validation of viral injection sites for all experiments: Only representative examples are shown, it would be nice to see all viral expression sites.

      For this manuscript, we have used 155 mice. For this reason, including the injection sites for all the animals in the manuscript is not feasible. Except for the mice that have been excluded (please see the exclusion criteria added in the methods), the expression pattern we observed was consistent across animals, and the images shown are therefore representative.

      Extended Data 1b: Please explain what N, U, W, and S behavioral groups mean. To what groups mentioned in the text (pg 2,3) do these correspond?

      The requested clarifications are implemented in the figure legend.

      Please elaborate on the following aspects of your methods and approaches:

      • Please explain if the protocol for HFS to manipulate behavior was the same as the one used for the LTP experiments (Fig 1d, Fig 4j) and was identical for homo/hetero inputs from thal and ctx?

      We used the same HFS protocol for all the HFS inductions. We included this information in the methods section.

      • Please state when the HFS was given in respect to the conditioning (what means immediately before and after?) and in which context it was given. Were animals subjected to HFS exposed to the context longer (either before or after the conditioning while receiving HFS) than the other groups? When the HFS was given in another context (for the 24 h group)- how was this controlled for?

      Requested information has been added to the methods section. The control and intervention groups were treated in the same way.

      • When were the footshocks given in the anesthetized recordings (Fig. 4j) and how was the temporal relationship to the HFS? Was the timing the same as for the HFS in the behavioral experiments?

      Requested information has been added to the methods section.

      • Please add information on how the LFP was stimulated and how the LFP- EPSP slope was determined in in vivo recordings, likewise for the whole cell recordings of EPSPs in Fig. 5d-f.

      Requested information has been added to the methods section.

      Here, the y-Axis in Fig. 5e should be corrected to EPSP slope rather than fEPSP slope if these are whole-cell recordings.

      This has been corrected.

      • Please include information if the viral injections and opto-manipulations were done bilateral or unilateral and if so in which hemisphere. Likewise, indicate where the LFP recordings were done.

      Requested information has been added to the methods section.

      • Were there any exclusion criteria for animals (e.g. insufficient viral targeting or placement of fibers and electrodes), other than the testing of the optical CS for adverse effects?

      Requested information has been added to the methods section.

      Statistics: In addition to clarifying analytical statistics, please clarify n-numbers for slice recordings (number of animals, number of slices, and number of cells if applicable).

      Requested information has been added to the methods section.

      It would be nice to scrutinize the results in extended data 4b. The freezing levels with U+24h HFS show a strong trend towards an increase; the effect size may be similar to immediate HFS (Fig 4f and extended data 4a) if n was increased.

      We agree with the reviewer. To address this point, we added “HomoLTP protocol when delivered 24hrs later, produced an increase in freezing; however, the value was not statistically significant.” To show this point, we used the same scale for freezing in Extended Data Fig. 4a and b.

      In the final experiment (Fig. 5a-c), Fig. 5b seems to show results from only one animal, but behavioral results are from 4 animals (Fig 5c). It would be helpful to see the quantification of potentiation in each animal.

      The results (now with error bar) include all mice.

      Please spell out the abbreviation "STC".

      Now, it is spelled out.

      Page 8 last sentence of the discussion does not seem to fit there.

      The sentence has been removed.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors did not determine how WTh affects Th-LA synapses, as field EPSPs were recorded only after HFS. WTh was required for the effects of HFS, as HFS alone did not produce CR in naïve and/or unpaired controls. As such the effects of the WTh protocol on synaptic strength must be investigated.

      We have performed the experiment where we recorded the evoked LFP 2hrs and 24hrs following the weak conditioning protocol. We observed that a weak conditioning protocol that was not followed by an optical LTP protocol on the cortical inputs failed to produce synaptic potentiation of the thalamic inputs (tested 2hrs and 24hrs after the LTP protocol; Extended Data Fig. 5d,e).

      (2) The authors provide some evidence that their dual opsin approach is feasible, particularly the use of sustained yellow light to block the effects of blue light on ChrimsonR. However, this validation was done using single pulses making it difficult to assess the effect of this protocol on Th input when HFS was used. Without strong evidence that the optogenetic methods used here are fault-proof, the main conclusions of this study are compromised. Why did the authors not use a protocol in which fibers were placed directly in the Ctx and Th while using soma-restricted opsins to avoid cross-contamination?

      We understand that the reviewer raises the possibility that our dual-opsin approach, although effective with single pulses, may fail in higher frequency stimulation protocols (10Hz and 85Hz). To address this concern, in a new group of mice we applied our approach to 10Hz and 85Hz stimulation protocols. We show that our approach is effective in single-pulse as well as in 10Hz and 85Hz stimulation protocols (Fig. 2d-h).

    1. Author response:

      The following is the authors’ response to the current reviews.

      We sincerely appreciate the reviewer’s dedication to evaluating our manuscript and raising essential considerations regarding the classification of the migration behavior we described. While the reviewer suggests that this behavior aligns with the concept of itinerancy, we contend that it represents a distinct phenomenon, albeit with similarities, as both involve the non-breeding movements of birds. We acknowledge that our manuscript did not adequately address this distinction and have considered the reviewer’s feedback. In our response, we clarify the difference between the described phenomenon and itinerancy. Our revised manuscript will include a new section in the Discussion to address this issue comprehensively.

      In the first part of the review, the reviewer emphasizes that the pattern we are describing is consistent with itinerancy. Regardless of the terminology used, we want to highlight the existence of two different types of migratory behavior, both of which involve movement in non-breeding areas.

      The first type, called itinerancy, was first described by Moreau in 1972 in “The Palaearctic-African Bird Migration Systems.” As noted by the reviewer, this behavior involves an alternation of stopovers and movements between different short-term non-breeding residency areas. They usually occur in response to food scarcity in one part of the non-breeding range, causing birds to move to another part of the same range. These movements typically cover distances of 10 to 100 kilometers but are neither continuous nor directional. Moreau (1972) defined itinerancy as prolonged stopovers, normally lasting several months, primarily in tropical regions. He noted observations of certain species disappearing from his study areas in sub-Saharan Africa in December and others appearing, suggesting they may have multiple home ranges during the non-breeding season. Subsequent research, as mentioned by the reviewer, has confirmed itinerancy in many species, particularly among Palaearctic-African migrants in sub-Saharan Africa. In particular, the Montagu’s Harrier has been extensively studied in this regard. The reviewer rightly points out that our study does not include recent findings on this species. In our revised version, we will include references to recent studies, such as those by Trierweiler et al. (2013, Journal of Animal Ecology, 82:107-120) and Schlaich et al. (2023, Ardea, 111:321-342), which show that Montagu’s Harrier has an average of 3-4 home ranges separated by approximately 200 kilometers. These studies suggest that the species spends approximately 1.5 months at each site, with the most extended period typically observed at the last site before migrating to the breeding grounds.

      In the second type, birds undertake a post-breeding migration, arrive in their non-breeding range, and then gradually move in a particular direction throughout the season. This continuous directional movement covers considerable distances and continues throughout the non-breeding period. In our study, this movement covered about 1000 km, comparable to the total migration distance of Rough-legged Buzzards of about 1500 km. As observed in our research, these movements are influenced by external factors such as snow cover. In such cases, the progression of snow cover in a south-westerly direction during winter can prevent birds from finding food, forcing them to continue migrating in the same direction. In essence, this movement represents a prolonged phase of the migration process but at a slower pace. Similar behavior has been documented in buzzards, as reported by Strandberg et al. (2009, Ibis 151:200-206). Although several transmitters in their study stopped working in mid-winter, the authors observed a phenomenon they termed ‘prolonged autumn migration.’

      In the second part of the review, the reviewer questions the need to distinguish between the two behaviors we have discussed. However, we believe these behaviors differ in their structure (with the first being intermittent and often non-directional, whereas the second is continuous and directional) and in their causes (with the first being driven by seasonal food resource cycles and the second by advancing snow cover). We therefore argue that it is worth distinguishing between them. To differentiate these forms of non-breeding movement, we propose to use ‘itinerancy’ for the first type, as described initially by Moreau in 1972, and introduce a separate term for the second behavior. Although ‘slow directional itinerancy’ could be considered, we find it too cumbersome.

      Moreover, ‘itinerancy’ in the literature refers not only to non-breeding movements but also to the use of different nesting sites, e.g., Lislevand et al. (2020, Journal of Avian Biology: e02595), reinforcing its association with movements between multiple sites within habitats. We, therefore, propose that the second behavior be given a distinct name. We acknowledge the reviewer’s point that we did not adequately address this distinction in the Discussion and plan to include a separate section in our paper’s revised version. In the third part of his review, the reviewer suggests an alternative title. Another reviewer, Dr Theunis Piersma, suggested the current title during the first round of reviewing, and we have chosen his version.

      In the fourth part of the review, the reviewer questions whether it is appropriate to discuss the conservation aspect of this study. This type of non-breeding movement raises concerns about accurately determining non-breeding ranges and population dynamics for species that exhibit this behavior. We believe that accurate determination of range and population dynamics is critical to conservation efforts. While this may be less important for species breeding in Europe and migrating to Africa, for which monitoring breeding territories is more feasible, it’s essential for Arctic and sub-Arctic breeding species. Large-scale surveys in these regions have historically been challenging and have become even more so with the end of Arctic cooperation following Russia’s war with Ukraine (Koivurova, Shibata, 2023). For North America and Europe, non-breeding abundance is typically estimated once per season in mid-winter. In North America, these are the so-called Christmas counts (which take place once at the end of December), and in Europe, they are the IWC counts mentioned by the reviewer (as follows from their official website - “The IWC requires a single count at each site, which should be repeated each year. The exact dates vary slightly from region to region, but take place in January or February”). Because of such a single count in mid-winter, non-breeding habitats occupied in autumn and spring will be listed as ‘uncommon’ at best, while south-western habitats where birds are only present in mid-winter will be listed as ‘common.’ However, the situation will be reversed if we consider the time birds spend in these habitats.

      The reviewer also highlights the introduction’s unconventional structure and information redundancy at the beginning. We have chosen this structure and provided basic explanations to improve readability for a wider audience, given eLife’s readership. At the same time, we will certainly take the reviewers’ feedback into account in the revised version. We plan to include the references to modern itinerancy research mentioned above and to add a section on itinerancy to the Discussion.

      We appreciate the reviewer’s input and sincerely thank them for their time and effort in reviewing our paper. While we may not fully agree on the classification of the behavior we describe, we value the opportunity to engage in discussion and believe that presenting arguments and counterarguments to the reader is beneficial to scientific progress.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I much enjoyed reading this manuscript, that is, once I understood what it is about. Titles like "Conserving bird populations in the Anthropocene: the significance of non-breeding movements" are a claim to so-called relevance, they have NOTHING to do with the content of the paper, so once I understood that this paper was about the "Quick quick slow: the foxtrot migration of rough-legged buzzards is a response to habitat and snow" (an alternative title), it was becoming very interesting. So the start of the abstract as well as the introduction is very tedious, as clearly much trouble is taken here to establish reputability. In my eyes this is unnecessary: eLife should be interested in publishing such a wonderful description of such a wonderful migrant in a study that comes to grips with limiting factors on a continental scale!

      We sincerely appreciate your time and effort in reviewing our manuscript. Thank you for your appreciation of our study.

      We agree that the focus of the article should be changed from conservation to migration patterns. We have rewritten the Introduction and Discussion as suggested. We have added the application of this pattern including conservation at the end of the Discussion by completely changing Figure 5. We have also changed the title to the suggested one.

      Not sure that the first paragraph statements that seek to downplay what we know about wintering vs breeding areas are valid (although I see what purpose they serve). Migratory shorebirds have extensively been studied in the nonbreeding areas, for example, including movement aspects (see, as just one example, Verhoeven, M.A., Loonstra, A.H.J., McBride, A.D., Both, C., Senner, N.R. & Piersma, T. (2020) Migration route, stopping sites, and non-breeding destinations of adult Black-tailed Godwits breeding in southwest Fryslân, The Netherlands. Journal of Ornithology 162, 61-76) and there are very impressive studies on the winter biology of migrants across large scales (for example in Zwarts' Living on the Edge book on the Sahel wetlands). Think also about geese and swans and about seabirds!

      We have rewritten the first paragraph and it now talks about patterns of migratory behavior. We have also rewritten the second paragraph, now it is devoted to studies of movements in the non-breeding period. We explain how our pattern differs from those already studied and give references to the papers you mentioned.

      Directional movements in nonbreeding areas as a function of food (in this case locusts) have really beautifully been described by Almut Schlaich et al in JAnimEcol for Montagu's harriers.

      We have added Montagu's harrier example in the second paragraph of the Introduction and the Discussion. We have added a reference to Schlaich and to Garcia and Arroyo, who suggested that Montagu's harriers have long directional migrations during the non-breeding period.

      Once the paper starts talking buzzards, and the analyses of the wonderful data, all is fine. It is a very competent analysis with a description of a cool pattern.

      Thank you for your appreciation of our study. We hope the revised version is better and clearer.

      However, I would say that it is all a question of spatial scale. The buzzards here respond to changes in food availability, but there is not an animal that doesn't. The question is how far they have to move for an adequate response: in some birds movements of 100s of meters may be enough, and then anything to the scale of rough-legged buzzards.

      In the new version of the manuscript, we emphasize that this is a large distance (about 1000 km), comparable to the distance of the fall and spring migrations (about 1400 km) in lines 70-72 of the Introduction and 379-383 of the Discussion.

      And actually, several of the shorebirds I know best also do a foxtrot, such as red knots and bar-tailed godwits moulting in the Wadden Sea, then spending a few months in the UK estuaries, before returning to the Wadden Sea before the long migrations to Arctic breeding grounds. The publication of the rough-legged buzzard story may help researchers to summarize patterns such as this too. My problem with this paper is the framing. A story on the how and why of these continental movements in response to snow and other habitat features would be a grand contribution. Drop Anthropocene, and rethink whether foxtrot should be introduced as a hypothesis or a summary of cool descriptions. I prefer the latter, and recommend eLife to go with that too, rather than encourage "disconnected frames that seek 'respectability'". Good luck, Theunis Piersma

      We thank the reviewer again for his valuable comments and suggestions. We have changed the framing to the suggested one and removed the Anthropocene from the article.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the time and effort you have taken to review our manuscript. We have carefully considered all of your comments, including both public and author comments, and provided detailed responses to each of them below. In addition, we would like to address the most important public comments.

      We agree with the suggestion to shift the focus of the article from conservation to migration patterns. Accordingly, we have rewritten both the Introduction and Discussion sections to focus on migration behavior rather than conservation.

      However, we respectfully disagree with the suggestion that the migration patterns we describe are synonymous with itinerancy. We acknowledge that our original presentation may have been unclear and may have hindered full understanding. In the revised version, we provide a detailed analysis of migratory behavior in the Introduction that describes how our pattern differs from itinerancy. We also revisit this distinction in the Discussion section. We have also carefully revised Figure 1 to improve clarity and avoid potential misunderstandings.

      Regarding the applicability of the described migration pattern, we acknowledge that the Rough-legged Buzzard is not listed as an endangered species. However, we believe that our findings have practical implications. We have moved our discussion of this issue to the end of the Discussion section and have completely revised Figure 5. While the overall population of Rough-legged Buzzards is not declining, certain regions within its range are experiencing declines. We show that this decline does not warrant listing the species as endangered. Instead, it may represent a redistribution within the non-breeding range - a shift in range dynamics. We use the example of the Rough-legged Buzzard to illustrate this concept and emphasize the importance of considering such dynamics when assessing the conservation status of species in the future.

      We also acknowledge that the hypothesis of this form of behavior has been proposed previously for Montagu's Harrier, and we have included this information in the revised manuscript. In addition, we agree that the focus on the Anthropocene is unnecessary in this context and have therefore removed it.

      We believe that these revisions significantly improve the clarity and robustness of the manuscript, and we are grateful for your insightful comments and suggestions.

      As a general comment, please note that including line numbers (as it is the standard in any manuscript submission) would facilitate reviewers providing more detailed comments on the text.

      We apologize for this oversight and have added line numbers to our revised manuscript.

      Dataset: unclear what is the frequency of GPS transmissions. Furthermore, information on relative tag mass for the tracked individuals should be reported.

      We have included this information in our manuscript (L 157-163). We also refer to the study in which this dataset was first used and described in detail (L 164).

      Data pre-processing: more details are needed here. What data have been removed if the bird died? The entire track of the individual? Only the data classified in the last section of the track? The section also reports on an 'iterative procedure' for annotating tracks, which is only vaguely described. A piecewise regression is mentioned, but no details are provided, not even on what is the dependent variable (I assume it should be latitude?).

      Regarding the deaths: we only removed the data recorded when the bird was already dead. We have corrected the text to make this clear (L 170).

      Regarding the iterative procedure: we have added a detailed description on lines 175-188.

      Data analysis: several potential issues here:

      (1) Unclear why sex was not included in all mixed models. I think it should be included.

      Our dataset contains 35 females and eight males. This ratio does not allow us to include sex in all models and adequately assess the influence of this factor. At the same time, because adult females disperse farther than males in some raptor species, we conducted a separate analysis of the dependence of migration distance on sex (Table S8) and found no evidence for this in our species. We have written a separate paragraph about this. This paragraph can be found on lines 356-360 of the new manuscript.

      (2) Unclear what is the rationale of describing habitat use during migration; is it only to show that it is a largely unsuitable habitat for the species? But is a formal analysis required then? Wouldn't it be enough to simply describe this?

      Habitat use and snow cover determine the two main phases (quick and slow) of the pattern we describe. We believe that habitat analysis is appropriate in this case and that a simple description would be uninformative and would not support our conclusions.

      (3) Analysis of snow cover: such a 'what if' analysis is fine but it seems to be a rather indirect assessment of the effect of snow cover on movement patterns. Can a more direct test be envisaged relating e.g. daily movement patterns to concomitant snow cover? This should be rather straightforward. The effectiveness of this method rests on among-year differences in snow cover and timing of snowfall. A further possibility would be to demonstrate habitat selection within the entire non-breeding home range of an individual in relation to snow cover. Such an analysis would imply associating presence-absence of snow to every location within the non-breeding range and testing whether the proportion of locations with snow is lower than the proportion of snow of random locations within the entire non-breeding home range (95% KDE) for every individual (e.g. by setting a 1/10 ratio presence to random locations).

      The proposed analysis will provide an opportunity to assess whether the Rough-legged Buzzard selects areas with the lowest snow cover, but will not provide an opportunity to follow the dynamics and will therefore give a misleading overall picture. This is especially true in the spring months. In March-April, Rough-legged Buzzards move northeast and are in an area that is not the most open to snow. At this time, areas to the southwest are more open to snow (this can be seen in Figure 4b). If we perform the proposed analysis, the control points for this period would be both to the north (where there is more snow) and to the south (where there is less snow) from the real locations, and the result would be that there is no difference in snow cover.

      A step-selection analysis could be used, as we did in our previous work (Curk et al 2020 Sci Rep) with the same Rough-legged Buzzard (but during migration, not winter). But this would only give us a qualitative idea, not a quantitative one - that Rough-legged Buzzards move from snow (in the fall) and follow snowmelt progression (in the spring).

      At the same time, our analysis gives a complete picture of snow cover dynamics in different parts of the non-breeding range. This allows us to see that if Rough-legged Buzzards remained at their fall migration endpoint without moving southwest, they would encounter 14.4 percentage points more snow cover (99.5% vs. 85.1%). Although this difference may seem small, it holds significance for rodent-hunting birds, distinguishing between complete and patchy snow cover. Simultaneously, if Rough-legged Buzzards immediately flew to the southwest and stayed there throughout winter, they would experience 25.7 percentage points less snow cover (57.3% vs. 31.6%). Despite a greater difference than in the first case, it doesn't compel them to adopt this strategy, as it represents the difference between various degrees of landscape openness from snow cover.
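
      As a quick check of the arithmetic in the paragraph above, the following is a minimal sketch. It assumes the four quoted values are mean snow-cover percentages, so the quoted differences are absolute differences in percentage points; the function and variable names are illustrative and do not come from the manuscript.

```python
def point_difference(a: float, b: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(a - b, 1)


# Scenario 1 (values quoted in the response): staying at the fall migration
# endpoint (99.5%) versus the snow cover actually experienced (85.1%).
stay_vs_observed = point_difference(99.5, 85.1)

# Scenario 2 (values quoted in the response): snow cover actually experienced
# (57.3%) versus flying southwest immediately and staying there (31.6%).
observed_vs_southwest = point_difference(57.3, 31.6)

print(stay_vs_observed, observed_vs_southwest)
```

      Under this reading, both quoted differences follow directly from the paired values and are not relative (proportional) changes.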

      We write about this in the new manuscript on lines 385-394.

      Results: it is unclear whether the reported dispersion measures are SDs or SEs. Please provide details.

      For the date and coordinates of the start and end of the different phases of migration, we specified the mean, SD, and sample size. We wrote this in line 277. For the values of the parameters of the different phases of the migration (duration, distance, speed, and direction), we used the mean, the standard error of the mean, and the confidence interval (obtained using the ‘emmeans’ package). We have indicated this in lines 302-303 and the caption of Table 1 (L 315) and Figure 2 (L 293-294). For the values of habitat and snow cover experienced by the Rough-legged Buzzards, we used the mean and the error of the mean. We reported this on lines 322 and 337 and in Figures 3 (L 332-333) and 4 (L 355-356).

      Discussion: in general, it should be reshaped taking into account the comments. It is overlong, speculative and quite naive in several passages. Entire sections can be safely removed (I think it can be reduced by half without any loss of information). I provide some examples of the issues I have spotted below. For instance, the entire paragraph starting with 'Understanding....' is not clear to me. What do you mean by 'prohibited management' options? Without examples, this seems a rather general text, based on unclear premises when related to the specifics of this study. Some statements are vague, unclear, and derive from unsubstantiated claims. E.g. "Despite their scarcity in these habitats, forests appear to hold significant importance for Rough-legged buzzards for nocturnal safety". I could not find any day-night analysis showing that they actually roost in forests during nighttime. Being a tundra species, it may well be possible that rough-legged buzzards perceive forests as very dangerous habitats and that they prefer instead to roost in open habitats. Analysing habitat use during day and night during the non-breeding period may be of help to clarify this. Furthermore, considering the fast migration periods, what is the flight speed during day and night above forests? Do these birds also migrate at night or do they roost during the night? Perhaps a figure visualizing day and night track segments could be of help (or an analysis of day vs. night flight speed) (there are several R packages to annotate tracks in relation to day and night). This is an example of another problematic statement: "The progression of snow cover in the wintering range of Rough-legged buzzards plays a significant role in their winter migration pattern." The manuscript does not contain any clear demonstration of this, as I wrote in my previous comments. Without such evidence, you must considerably tone down such assertions. But since providing a direct link is certainly possible, I think that additional analyses would clearly strengthen your take-home message.

      The paragraph starting with "The quantification of environmental changes that could prove fatal to bird species presents yet another challenge for conservation efforts in an era of rapid global change." is quite odd. Take the following statement "For instance, the presence of small patches of woodland in the winter range might appear crucial to the survival of the Rough-legged buzzard. Elimination of these seemingly minor elements of vegetation cover through management actions could have dire consequences for the species.". It is based on the assumption that minor vegetation elements play a key role in the ecology of the species, without any evidence supporting this. Does it have any sense? I could safely say exactly the opposite and I would believe it might even be more substantiated.

      We agree with these comments.

      We have completely rewritten this section. As suggested, we have shortened it by removing statements that were not supported by the research. We have completely removed the statements about "prohibited management". We have also removed the statement that "forests appear to be of significant importance to Rough-legged buzzards for nocturnal safety" and everything associated with that statement, e.g. the statement about "small elements of vegetation cover", etc. We do believe that this statement is true in substance, but we also agree that it is not supported by the results and requires separate analysis. At the same time, we believe that this is a topic for a separate study and would be redundant here. Therefore, we leave it for a separate publication.

      Conclusion paragraph: I believe this severely overstates the conservation importance of this study. That the results have "crucial implications for conservation efforts in the Anthropocene, where rapidly changing environmental factors can severely impact bird migration" seems completely untenable to me. What is the evidence for such crucial implications? For instance, these results may suggest that climate change, because global warming is predicted to reduce snow cover in the non-breeding areas, might well be beneficial for populations of this species, by reducing non-breeding energy expenditure and improving non-breeding survival. I think statements like these are simply not necessary, and that the study should be more focused on the actual results and evidence provided.

      We have completely rewritten this section. We removed the reference to the Anthropocene and focused on migratory behavior and migration patterns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for these balanced, nuanced evaluations of our work concerning the observed epistatic trends and our interpretations of their mechanistic origins. Overall, we think the reviewers have done an excellent job at recognizing the novel aspects of our findings while also discussing the caveats associated with our interpretations of the biophysical effects of these mutations. We believe it is important to consider both of these aspects of our work in order to appreciate these advances and what sorts of pertinent questions remain.

      Notably, both reviewers are concerned that our lack of experimental approaches to compare the conformational properties of GnRHR variants weakens our claims. We would first humbly suggest that this constitutes a more general caveat that applies to nearly all investigations of the cellular misfolding of α-helical membrane proteins. Whether or not any current in vitro folding measurements report on conformational transitions that are relevant to cellular protein misfolding reactions remains an active area of debate (discussed further below). Nevertheless, while we concede that our structural and/or computational evaluations of various mutagenic effects remain speculative, prevailing knowledge on the mechanisms of membrane protein folding suggests that our mutations of interest (V276T and W107A) are highly unlikely to promote misfolding in precisely the same way. Thus, regardless of whether or not we were able to experimentally compare the relevant folding energetics of GnRHR variants, we are confident that the distinct epistatic interactions formed by these mutations reflect variations in the misfolding mechanism and that they are distinct from the interactions that are observed in the context of stable proteins. In the following, we provide detailed considerations concerning these caveats in relation to the reviewers’ specific comments.

      Reviewer #1 (Public Review):

      The paper carries out an impressive and exhaustive non-sense mutagenesis using deep mutational scanning (DMS) of the gonadotropin-releasing hormone receptor for the WT protein and two single point mutations that (i) influence TM insertion (V276T) and (ii) influence protein stability (W107A), and then measures the effect of these mutants on correct plasma membrane expression (PME).

      Overall, most mutations decreased mGnRHR PME levels in all three backgrounds, indicating poor mutational tolerance under these conditions. The W107A variant wasn't really recoverable with low levels of plasma membrane localisation. For the V276T variant, most additional mutations were more deleterious than WT based on correct trafficking, indicating a synergistic effect. As one might expect, there was a higher degree of positive correlation between V276T/W107A mutants and other mutants located in TM regions, confirming that improper trafficking was a likely consequence of membrane protein co-translational folding. Nevertheless, context is important, as positive synergistic mutants in the V276T background could be negative in the W107A background and vice versa. Taken together, this important study highlights the complexity of membrane protein folding in dissecting the mechanism-dependent impact of disease-causing mutations related to improper trafficking.

      Strengths

      This is a novel and exhaustive approach to dissecting how receptor mutations under different mutational backgrounds related to co-translational folding, could influence membrane protein trafficking.

      Weaknesses

      The premise for the study requires an in-depth understanding of how the single-point mutations analysed affect membrane protein folding, but the single-point mutants used seem to lack proper validation.

      Given our limited understanding of the structural properties of misfolded membrane proteins, it is unclear whether the relevant conformational effects of these mutations can be unambiguously validated using current biochemical and/or biophysical folding assays. X-ray crystallography, cryo-EM, and NMR spectroscopy measurements have demonstrated that many purified GPCRs retain native-like structural ensembles within certain detergent micelles, bicelles, and/or nanodiscs. However, helical membrane protein folding measurements typically require titration with denaturing detergents to promote the formation of a denatured state ensemble (DSE), which will invariably retain considerable secondary structure. Given that the solvation provided by mixed micelles is clearly distinct from that of native membranes, it remains unclear whether these DSEs represent a reasonable proxy for the misfolded conformations recognized by cellular quality control (QC, see https://doi.org/10.1021/acs.chemrev.8b00532). Thus, the use and interpretation of these systems for such purposes remains contentious in the membrane protein folding community. In addition to this theoretical issue, we are unaware of any instances in which GPCRs have been found to undergo reversible denaturation in vitro, a practical requirement for equilibrium folding measurements (https://doi.org/10.1146/annurev-biophys-051013-022926). We note that, while the resistance of GPCRs to aggregation, proteolysis, and/or mechanical unfolding has also been probed in micelles, it is again unclear whether the associated thermal, kinetic, and/or mechanical stability should necessarily correspond to their resistance to cotranslational and/or posttranslational misfolding. Thus, even if we had attempted to validate the computational folding predictions employed herein, we suspect that any resulting correlations with cellular expression may have justifiably been viewed by many as circumstantial. Simply put, we know very little about the non-native conformations that are generally involved in the cellular misfolding of α-helical membrane proteins, much less how to measure their relative abundance. From a philosophical standpoint, we prefer to let cells tell us what sorts of broken protein variants are degraded by their QC systems, then do our best to surmise what this tells us about the relevant properties of cellular DSEs.

      Despite this fundamental caveat, we believe that the chosen mutations and our interpretation of their relevant conformational effects are reasonably well-informed by current modeling tools and by prevailing knowledge on the physicochemical drivers of membrane protein folding and misfolding. Specifically, the mechanistic constraints of translocon-mediated membrane integration provide an understanding of the types of mutations that are likely to disrupt cotranslational folding. Though we are still learning about the protein complexes that mediate membrane translocation (https://doi.org/10.1038/s41586-022-05336-2), it is known that this underlying process is fundamentally driven by the membrane depth-dependent amino acid transfer free energies (https://doi.org/10.1146/annurev.biophys.37.032807.125904). This energetic consideration suggests that introducing polar side chains near the center of a nascent TMD should almost invariably reduce the efficiency of topogenesis. To confirm this in the context of TMD6 specifically, we utilized a well-established biochemical reporter system to confirm that V276T attenuates its translocon-mediated membrane integration (Fig. S1), at least in the context of a chimeric protein. We also constructed a glycosylation-based topology reporter for full-length GnRHR, but ultimately found its in vitro expression to be insufficient to detect changes in the nascent topological ensemble.

      In contrast to V276T, the W107A mutation is predicted to preserve the native topological energetics of GnRHR due to its position within a soluble loop region. W107A is also unlike V276T in that it clearly disrupts tertiary interactions that stabilize the native structure. This mutation should preclude the formation of a structurally conserved hydrogen bonding network that has been observed in the context of at least 25 native GPCR structures (https://doi.org/10.7554/eLife.5489). However, without a relevant folding assay, the extent to which this network stabilizes the native GnRHR fold in cellular membranes remains unclear. Overall, we admit that these limitations have prevented us from measuring how much V276T alters the efficiency of GnRHR topogenesis, how much W107A destabilizes the native fold, or vice versa. Nevertheless, given these design principles and the fact that both reduce the plasma membrane expression of GnRHR, as expected, we are highly confident that the structural defects generated by these mutations do, in fact, promote misfolding in their own ways. We also concede that the degree to which these mutagenic perturbations are indeed selective for specific folding processes is somewhat uncertain. However, it seems exceedingly unlikely that these mutations should disrupt topogenesis and/or the folding of the native topomer to the exact same extent. From our perspective, this is the most important consideration with respect to the validity of the conclusions we have made in this manuscript.

      Furthermore, plasma membrane expression has been used as a proxy for incorrect membrane protein folding, but this may not necessarily be the case, as even correctly folded membrane proteins may not be trafficked correctly, at least, under heterologous expression conditions. In addition, mutations can affect trafficking and potential post-translational modifications, like glycosylation.

      While the reviewer is correct that the sorting of folded proteins within the secretory pathway is generally inefficient, it is also true that the maturation of nascent proteins within the ER generally bottlenecks the plasma membrane expression of most α-helical membrane proteins. Our group and several others have demonstrated that the efficiency of ER export generally appears to scale with the propensity of membrane proteins to achieve their correct topology and/or to achieve their native fold (see https://doi.org/10.1021/jacs.5b03743 and https://doi.org/10.1021/jacs.8b08243). Notably, these investigations all involved proteins that contain native glycosylation and various other post-translational modification sites. While we cannot rule out that certain specific combinations of mutations may alter expression through their perturbation of post-translational GnRHR modifications, we feel confident that the general trends we have observed across hundreds of variants predominantly reflect changes in folding and cellular QC. This interpretation is supported by the relationship between observed trends in variant expression and Rosetta-based stability calculations, which we identified using unbiased unsupervised machine learning approaches (compare Figs. 6B & 6D).

      Reviewer #2 (Public Review):

      Summary:

      In this paper, Chamness and colleagues make a pioneering effort to map epistatic interactions among mutations in a membrane protein. They introduce thousands of mutations to the mouse GnRH Receptor (GnRHR), either under wild-type background or two mutant backgrounds, representing mutations that destabilize GnRHR by distinct mechanisms. The first mutant background is W107A, destabilizing the tertiary fold, and the second, V276T, perturbing the efficiency of cotranslational insertion of TM6 to the membrane, which is essential for proper folding. They then measure the surface expression of these three mutant libraries, using it as a proxy for protein stability, since misfolded proteins do not typically make it to the plasma membrane. The resulting dataset is then used to shed light on how diverse mutations interact epistatically with the two genetic background mutations. Their main conclusion is that epistatic interactions vary depending on the degree of destabilization and the mechanism through which they perturb the protein. The mutation V276T forms primarily negative (aggravating) epistatic interactions with many mutations, as is common to destabilizing mutations in soluble proteins. Surprisingly, W107A forms many positive (alleviating) epistatic interactions with other mutations. They further show that the locations of secondary mutations correlate with the types of epistatic interactions they form with the above two mutants.

      Strengths:

      Such a high throughput study for epistasis in membrane proteins is pioneering, and the results are indeed illuminating. Examples of interesting findings are that: (1) No single mutation can dramatically rescue the destabilization introduced by W107A. (2) Epistasis with a secondary mutation is strongly influenced by the degree of destabilization introduced by the primary mutation. (3) Misfolding caused by mis-insertion tends to be aggravated by further mutations. The discussion of how protein folding energetics affects epistasis (Fig. 7) makes a lot of sense and lays out an interesting biophysical framework for the findings.

      Weaknesses:

      The major weakness comes from the potential limitations in the measurements of surface expression of severely misfolded mutants. This point is discussed quite fairly in the paper, in statements like "the W107A variant already exhibits marginal surface immunostaining" and many others. It seems that only about 5% of the W107A makes it to the plasma membrane compared to wild-type (Figures 2 and 3). This might be a low starting point from which to accurately measure the effects of secondary mutations.

      The reviewer raises an excellent point that we considered at length during the analysis of these data and the preparation of the manuscript. Though we remain confident in the integrity of these measurements and the corresponding analyses, we now realize this aspect of the data required further discussion and documentation which we have provided in the revised version of the manuscript as is described in the following.

      Still, the authors claim that measurements of W107A double mutants "still contain cellular subpopulations with surface immunostaining intensities that are well above or below that of the W107A single mutant, which suggests that this fluorescence signal is sensitive enough to detect subtle differences in the PME of these variants". I was not entirely convinced that this was true.

      We made this statement based on the simple observation that the distribution of surface immunostaining intensities across the population of recombinant cells expressing the library of W107A double mutants was consistently broader than that of recombinant cells expressing W107A GnRHR alone (see Author response image 1 for reference). Given that the recombinant cellular library represents a mix of cells expressing ~1600 individual variants that are each present at low abundance, the pronounced tails within this distribution presumably represent the composite staining of many small cellular subpopulations expressing collections of variants whose expression deviates from that of W107A enough to be visible on a log intensity plot.

      Author response image 1.

      Firstly, I think it would be important to test how much noise these measurements have and how much surface immunostaining the W107A mutant displays above the background of cells that do not express the protein at all.

      For reference, the average surface immunostaining intensity of HEK293T cells transiently expressing W107A GnRHR was 2.2-fold higher than that of the IRES-eGFP negative, untransfected cells within the same sample; by comparison, the WT immunostaining intensity was 9.5-fold over background. Similarly, recombinant HEK293T cells expressing the W107A double mutant library had an average surface immunostaining intensity that was 2.6-fold over background across the two DMS trials. Thus, while the surface immunostaining of this variant is certainly diminished, we were still able to reliably detect W107A at the plasma membrane even under distinct expression regimes. We have included these and other signal-to-noise metrics for each experiment in the Results section of the revised manuscript.

      Beyond considerations related to intensity, we also previously noticed the relative intensity values for W107A double mutants exhibited considerable precision across our two biological replicates. If signal were too poor to detect changes in variant expression, we would have expected a plot of the intensity values across these two replicates to form a diffuse scatter. Instead, we found DMS intensity values for individual variants to be highly correlated from one replicate to the next (Pearson’s R² = 0.95, see Author response image 2 for reference). This observation empirically demonstrates that this assay consistently differentiated variants that exhibit slightly enhanced immunostaining from those that have even lower immunostaining than W107A GnRHR. We have included these discussion points in the Results section as well as scatter plots for replicate variant intensities within all three genetic backgrounds in Figure S3 of the revised manuscript.

      Author response image 2.
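
      The replicate comparison described above can be sketched in a few lines. This is a minimal illustration and not the authors' analysis pipeline: the function name `pearson_r` and the intensity values are hypothetical. It computes Pearson's correlation coefficient between two replicate measurements of the same variants and then squares it.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-variant intensities from two biological replicates.
replicate_1 = [120.0, 95.0, 210.0, 60.0, 150.0]
replicate_2 = [115.0, 101.0, 198.0, 66.0, 158.0]

r = pearson_r(replicate_1, replicate_2)
r_squared = r ** 2  # the quantity reported as R² in the response
```

      If the signal were dominated by noise, `r_squared` for real replicate data would be expected to fall toward zero; values near one indicate that variant rankings are reproduced between replicates.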

      But more importantly, it is not clear if under this regimen surface expression still reports on stability/protein fitness. It is unknown if the W107A retains any function or folding at all. For example, it is possible that the low amount of surface protein represents misfolded receptors that escaped the ER quality control.

      While we believe that such questions are outside the scope of this work, we certainly agree that it is entirely possible that some of these variants bypass QC without achieving their native fold. This topic is quite interesting to us but is quite challenging to assess in the context of GPCRs, which have complex fitness landscapes that involve their propensity to distinguish between different ligands, engage specific components associated with divergent downstream signaling pathways, and navigate between endocytic recycling/ degradation pathways following activation. In light of the inherent complexity of GPCR function, we humbly suggest our choice of a relatively simple property of an otherwise complex protein may be viewed as a virtue rather than a shortcoming. Protein fitness is typically cast as the product of abundance and activity. Rather than measuring an oversimplified, composite fitness metric, we focused on one variable (plasma membrane expression) and its dominant effector (folding). We believe restraining the scope in this manner was key for the elucidation of clear mechanistic insights.

      The differential clustering of epistatic mutations (Fig. 6) provides some interesting insights as to the rules that dictate epistasis, but these too are dominated by the magnitude of destabilization caused by one of the mutations. In this case, the secondary mutations that had the most interesting epistasis were exceedingly destabilizing. With this in mind, it is hard to interpret the results that emerge regarding the epistatic interactions of W107A. Furthermore, the most significant positive epistasis is observed when W107A is combined with additional mutations that almost completely abolish surface expression. It is likely that either mutation destabilizes the protein beyond repair. Therefore, what we can learn from the fact that such mutations have positive epistasis is not clear to me. Based on this, I am not sure that another mutation that disrupts the tertiary folding more mildly would not yield different results. With that said, I believe that the results regarding the epistasis of V276T with other mutations are strong and very interesting on their own.

      We agree with the reviewer. In light of our results, we believe it is virtually certain that the secondary mutations characterized herein would form distinct epistatic interactions with mutations that are only mildly destabilizing. Indeed, this insight reflects one of the key takeaway messages from this work: stability-mediated epistasis is difficult to generalize because it should depend on the extent to which each mutation changes the stability (ΔΔG) as well as the initial stability of the WT/ reference sequence (ΔG, see Figure 7). Frankly, we are not so sure we would have pieced this together as clearly had we not had the fortune (or misfortune?) of including such a destructive mutation as W107A as a point of reference.
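
      The dependence on initial stability can be illustrated with a toy calculation. The sketch below is not the framework shown in Figure 7; it assumes a simple two-state equilibrium in which expression is proportional to the folded fraction, it assumes the ΔΔG values of two mutations add, and it scores epistasis with a product model. All function names and numerical values are hypothetical.

```python
from math import exp, log

RT = 0.616  # kcal/mol at roughly 37 °C (R = 1.987e-3 kcal/mol/K, T = 310 K)


def fraction_folded(dg_unfold):
    """Two-state folded fraction for an unfolding free energy in kcal/mol."""
    return 1.0 / (1.0 + exp(-dg_unfold / RT))


def model_epistasis(dg_ref, ddg_a, ddg_b):
    """Product-model epistasis when two ΔΔG values add on a reference ΔG."""
    f_ref = fraction_folded(dg_ref)
    # Expression of each variant relative to the reference background.
    w_a = fraction_folded(dg_ref + ddg_a) / f_ref
    w_b = fraction_folded(dg_ref + ddg_b) / f_ref
    w_ab = fraction_folded(dg_ref + ddg_a + ddg_b) / f_ref
    return log(w_ab / (w_a * w_b))


# The same pair of destabilizing mutations on two hypothetical backgrounds.
eps_stable = model_epistasis(dg_ref=3.0, ddg_a=-2.0, ddg_b=-2.0)
eps_unstable = model_epistasis(dg_ref=-2.0, ddg_a=-2.0, ddg_b=-2.0)
```

      In this toy model the identical pair of mutations gives a strongly negative score on the stable background and a score close to zero on the unstable one, so the apparent interaction changes with the reference ΔG even though the energetic effects are strictly additive. The model only yields negative or near-zero scores, so it is not meant to reproduce the positive epistasis reported for W107A.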

      Additionally, the study draws general conclusions from the characterization of only two mutations, W107A and V276T. At this point, it is hard to know if other mutations that perturb insertion or tertiary folding would behave similarly. This should be emphasized in the text.

      We agree. Our findings suggest different mutations may not behave similarly, which we believe is a key finding of this work. We have emphasized this point in the Discussion section of the revised manuscript as follows:

      “These findings suggest the folding-mediated epistasis is likely to vary among different classes of destabilizing mutations in a manner that should also depend on folding efficiency and/ or the mechanism(s) of misfolding in the cell.”

      Some statistical aspects of the study could be improved:

      (1) It would be nice to see the level of reproducibility of the biological replicates in a plot, such as scatter or similar, with correlation values that give a sense of the noise level of the measurements. This should be done before filtering out the inconsistent data.

      We thank the reviewer for this suggestion and will include scatters for each genetic background like the one shown above in Figure S3 of the revised version of the manuscript.

      (2) The statements "Variants bearing mutations within the C-terminal region (ICL3-TMD6-ECL3-TMD7) fare consistently worse in the V276T background relative to WT (Fig. 4 B & E)." and "In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form TMD4, ICL2, and ECL2 (Fig. 4 C & F)." are both hard to judge. Inspecting Figures 4B and C does not immediately show these trends, and importantly, a solid statistical test is missing here. In Figures 4E and F the locations of the different loops and TMs are not indicated on the structure, making these statements hard to judge.

      We apologize for this oversight and thank the reviewer for pointing this out. We utilized paired Wilcoxon-Signed Rank Tests to evaluate the statistical significance of these observations and modified the description of these findings in the revised version of the results section as follows:

      “Variants bearing mutations within the C-terminal regions including ICL3, TMD6, and TMD7 fare consistently worse in the V276T background relative to WT (paired Wilcoxon-Signed Rank Test p-values of 0.0001, 0.02, and 0.005, respectively) (Fig. 4 B & E). Given that V276T perturbs the cotranslational membrane integration of TMD6 (Fig. S1, Table S1), this directional bias potentially suggests that the apparent interactions between these mutations manifest during the late stages of cotranslational folding. In contrast, mutations that are better tolerated in the context of W107A mGnRHR are located throughout the structure but are particularly abundant among residues in the middle of the primary structure that form ICL2, TMD4, and ECL2 (paired Wilcoxon-Signed Rank Test p-values of 0.0005, 0.0001, and 0.004, respectively) (Fig. 4 C & F).”

      (3) The following statement lacks a statistical test: "Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD)." Is this enrichment significant? Further in the same paragraph, the claim that "In contrast to the sparse epistasis that is generally observed between mutations within soluble proteins, these findings suggest a relatively large proportion of random mutations form epistatic interactions in the context of unstable mGnRHR variants". Needs to be backed by relevant data and statistics, or at least a reference.

      We thank the reviewer for this reasonable suggestion. In the revised manuscript, we included the results of a Fisher’s Exact Test that confirms the statistical significance of this observation and modified the Results section to reflect this as follows:

      “Notably, these 98 variants are enriched with TMD variants (65% TMD) relative to the overall set of 251 variants (45% TMD, Fisher’s Exact Test p = 0.0019). These findings suggest random mutations form epistatic interactions in the context of unstable mGnRHR variants in a manner that depends on the specific folding defect (V276T vs. W107A) and topological context.”
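
      For readers unfamiliar with this kind of enrichment test, the sketch below shows how a two-sided Fisher's exact test on a 2x2 table can be computed from the hypergeometric distribution. It is an illustration under stated assumptions: the counts are back-calculated from the rounded percentages quoted above (about 64 of 98 and about 113 of 251), and the two sets are treated as separate groups, since the response does not specify how the contingency table was constructed. The result is therefore not expected to reproduce the reported p-value exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Assumed counts (TMD, non-TMD), reconstructed from rounded percentages.
subset_tmd, subset_other = 64, 34       # ~65% of 98 variants
overall_tmd, overall_other = 113, 138   # ~45% of 251 variants

p_value = fisher_exact_two_sided(subset_tmd, subset_other,
                                 overall_tmd, overall_other)
```

      A small p-value here would indicate that the TMD fraction differs between the two groups by more than expected from sampling alone.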

      Reviewer #1 (Recommendations for the Authors):

      As far as this reviewer is aware, the effect of the V276T variant on MP insertion has not been measured directly; its position corresponds to T277 in TMD6 of human GnRHR that has been measured for TM insertion, but given the clear lack of conservation (threonine vs valine) the mutation in TM6 could potentially have a different impact on the mouse homologue. Please clarify what the predicted delta TM for insertion is between human and mouse GnRHR. Moreover, I would argue that single TM insertion by tethering to Lep is insufficient to understand MP insertion/folding, as neighbouring TM helices could help to drive TM6 insertion. Have ER microsome experiments for mouse GnRHR also been carried out in the context of neighbouring helices?

      We included measurements (and predictions) of the impact of the V276T substitution on the translocon-mediated membrane integration of the mouse TMD6 in the context of a chimeric Lep protein (see Fig. S1 & Table S1). Our results reveal that this substitution decreases the efficiency of TMD6 membrane integration by ~10%. Though imperfect, this prevailing biochemical assay remains popular for a variety of theoretical and technical reasons. Importantly, extensive experimental testing of this system has shown that these measurements report apparent equilibrium constants that are well-described by two-state equilibrium partitioning models (see DOIs 10.1038/nature03216 and 10.1038/nature06387). This observation provides a reasonable rationale to interpret these measurements using energetic models as we have in this work (see Table S1). From a technical perspective, the Lep system is also advantageous because this protein is generally well expressed in the context of in vitro translation systems containing native membranes, which generally ensures a consistent signal to noise and dynamic range for membrane integration measurements. Nevertheless, the reviewers are correct that membrane integration efficiencies are likely distinct in the context of the native mGnRHR protein. For these reasons, we attempted to develop a glycosylation-based topology reporter prior to the posting and submission of this manuscript. However, all GnRHR reporters we tested were poorly expressed in vitro and the resulting 35S-labeled proteins only generated faint smears on our phosphorimaging screens that could not be interpreted. For these reasons, we chose to rely on the Lep measurements for these investigations.

      The lack of a more relevant topological reporter is one of many challenges we faced in our investigations of this unstable, poorly behaved protein. We share the reviewer’s frustrations concerning the speculative aspects of this work. Nevertheless, there is increasing appreciation for the fact that our perspectives on protein biophysics have been skewed by our continuing choice to focus on the relatively small set of model proteins that are compatible with our favored methodologies (doi: 10.1016/j.tibs.2013.05.001). We humbly suggest this work represents an example of how we can gain a deeper understanding of the limits of biochemical systems when we instead choose to study the unsavory bits of cellular proteomes. But this choice requires a willingness to make some reasonable assumptions and to lean on energetic/ structural modeling from time to time. Despite this limitation, we believe there is still tremendous value in this compromise.

      What is the experimental evidence the W107A variant affects the protein structure? Has its melting temperature with and without inverse agonist binding for WT vs the W107A variant been measured, for example? Even heat-FSEC of detergent-solubilised membranes would be informative to know how unstable the W107A variant is. If is very unstable in detergent, then it could be that recovery mutants are going to be unlikely as you are already starting with a poor construct showing poor folding/localisation.

      We again understand the rationale for this concern, but do not believe that thermal melting measurements are likely to report the same sorts of conformational transitions involved in cellular misfolding. Heating up a protein to the point at which membranes (or micelles) are disrupted and the proteins begin to form insoluble aggregates is a distinct physical process from those that occur during co- and post-translational folding within intact ER membranes at physiological temperatures (discussed further in the Response to the Reviews). Indeed, as the reviewer points out below, there seems to be little evidence that secretion is linked to thermal stability or various other metrics that others have attempted to optimize for the sake of purification and/ or structural characterization. Thus, we believe it would be just as speculative to suggest thermal aggregation represents a relevant metric for the propensity of membrane proteins to fold in the cell. The physical interpretation of membrane protein misfolding reactions remains contentious in our field due to the key fact that the denatured states of helical membrane proteins remain highly structured in a manner that is hard to generalize beyond the fact that the denatured states retain α-helical secondary structure (doi: 10.1146/annurev-biophys-051013-022926). This is in stark contrast to soluble proteins, where random coil reference states have proven to be generally useful for energetic interpretations of protein stability. For reference, our lab is currently working to leverage epistatic measurements like this to map the prevailing physiological denatured states of an integral membrane protein. Our current findings suggest that non-native electrostatic interactions form in the context of misfolded states. We hope that more information on the structural aspects of these states will help us to develop and interpret meaningful folding measurements within the membrane.

      For reference, even in cases when quantitative folding measurements can be achieved, their relevance remains actively debated. For example, the corresponding author of this work previously worked on the stability and misfolding of another human α-helical membrane protein (PMP22). Like GnRHR, PMP22 is prone to misfolding in the secretory pathway and is associated with dozens of pathogenic mutations that cause protein misfolding. To understand how the thermodynamic stability of this protein is linked to secretion, the corresponding author purified PMP22, reconstituted it into n-Dodecyl-phosphocholine (DPC) micelles, and measured its resistance to denaturation by an anionic denaturing detergent (Lauryl Sarcosine, LS). The results were initially perplexing because equilibrium unfolding curves manifested as an exponential decay (rather than a sigmoid) and relaxation kinetics appeared to be dominated by the rate constant for unfolding (doi: 10.1021/bi301635f). Unfortunately, these data could not be fit with existing folding models due to the lack of a folded protein baseline and the absence of a folding arm in the chevron plot. We eventually found that a full sigmoidal unfolding transition and refolding kinetics could be measured upon addition of 15% (v/v) glycerol. Our measurements revealed that the free energy of unfolding in DPC micelles was 0 kcal/ mol (without glycerol). This shocking lack of WT stability made it impossible to directly measure the effects of destabilizing mutations that enhance misfolding: you can’t measure the unfolding of a protein that is already unfolded. We ultimately had to instead infer the energetic effects of such mutations from the thermodynamic coupling between cofactor binding and folding (doi: 10.1021/jacs.5b03743). Finally, after demonstrating the resulting ΔΔGs correlated with both cellular trafficking and disease phenotype, we still faced justified scrutiny about the relevance of these measurements because they were carried out in micelles. For these reasons, we do not feel that additional biophysical measurements will add much to this work until more is understood about the nature of misfolding reactions in the membrane and how to effectively recapitulate them in vitro. We also note that PMP22 is secreted with 20% efficiency in mammalian cell lines, which is 20-fold more efficient than human GnRHR under similar conditions (doi: 10.1016/j.celrep.2021.110046). Thus, we suspect equilibrium unfolding measurements are likely out of reach using previously described measurements.

      Our greatest evidence suggesting W107A destabilizes the protein has to do with the fact that it deletes a highly conserved structural contact and that this structural modification kills its secretion. The fact that this mutation clearly reduces the escape of GnRHR from ER quality control is a classic indicator of misfolding that represents the cell’s way of telling us that the mutation compromises the folding of the nascent protein in some way or another. Precisely how this mutation remodels the nascent conformational ensemble of nascent GnRHR and how this relates to the free energy difference between the native and non-native portions of its conformational ensemble under cellular conditions is a much more challenging question that lies beyond the scope of this investigation (and likely beyond the scope of what’s currently possible). Indeed, there is an entire field dedicated to understanding such. Nevertheless, the difference in the epistatic interactions formed by W107A and V276T is at the very least consistent with our speculative interpretation that these two mutations vary in their misfolding mechanism and/ or in the extent to which they destabilize the protein. For these reasons, we feel the main conclusions of this manuscript are well-justified.

      Please clarify if the protein is glycosylated or not and, if it is, how would this requirement affect the conclusions of your analysis?

      As we noted in the Response to the Reviewers, which also constitutes a published portion of the final manuscript, this protein is indeed glycosylated. We were well aware of this aspect of the protein since the inception of this project and do not think this changes our interpretation at all. Most membrane proteins are glycosylated, and several groups have demonstrated in various ways that the secretion efficiency of glycoproteins is proportional to certain stability metrics for secreted soluble proteins and membrane proteins alike. Generally, mutations that enhance misfolding do not change the propensity of the nascent chain to undergo N-linked glycosylation, which occurs during translation before protein synthesis and/ or folding is complete. Misfolded proteins typically carry lower weight glycans, which reflects their failure to advance from the ER to the Golgi, where N-linked glycans are modified and O-linked glycans are added. From our perspective, glycosyl modifications just ensure that nascent proteins are engaged by calnexin and other lectin chaperones involved in QC. They do not decouple folding from secretion efficiency. In the case of PMP22 (described above), we found that removal of its glycosylation site allows the nascent protein to bypass the lectin chaperones in a manner that enhances its plasma membrane expression eight-fold (doi: 10.1016/j.jbc.2021.100719). Similar to WT, the expression of several misfolded PMP22 variants also significantly increases upon removal of the glycosylation site. Nevertheless, their expression is still significantly lower than the un-glycosylated WT protein, and the expression patterns of the mutants relative to WT were quite similar across this panel of un-glycosylated proteins. Thus, while glycosylation certainly impacts secretion, it does not change its dependence on folding efficiency within the ER. There are many layers of partially redundant QC within the ER, and it seems that folding imposes a key bottleneck to secretion regardless of which QC proteins are involved. For these reasons, we do not think glycosylation (or other PTMs) should factor into our interpretation of these results.

      One caveat with the study is that there is a poor understanding of the factors that decide if the protein should be trafficked to the PM or not. Even secretory proteins not going through the calnexin/calreticulin cycle (as they have no N-linked glycans) might still get stuck in the ER, despite the fact they are functional. Could this be a technical issue of heterologous expression overloading the Sec system?

      While we agree that there is much to be learned about this topic, we disagree with the notion that our understanding of folding and secretion is insufficient to generally interpret the molecular basis of the observed trends. In collaboration with various other groups, the corresponding author of this paper has shown for several other proteins that the stability of the native topology and the native tertiary structure can constrain secretion efficiency (see dois: 10.1021/jacs.8b08243, 10.1021/jacs.5b03743, and 10.1016/j.jbc.2021.100423). Moreover, the Balch and Kelly groups demonstrated many years ago that relatively simple models for the coupling between folding and chaperone binding can recapitulate the observed effects of mutations on the secretion efficiency of various proteins (doi: 10.1016/j.cell.2007.10.025). Given a wide body of prevailing knowledge in this area, we believe it is entirely reasonable to assume that the conformational effects of these mutations have a dominant effect on plasma membrane expression.

      Whether or not some of the proteins retained in the ER are folded and/ or functional is an interesting question, but is outside the scope of this work. Various lines of evidence concerning approaches to rescue misfolded membrane proteins suggest many of these variants are likely to retain residual function once they escape the ER, which may suggest there are pockets of foldable/ folded proteins within the ER. But it seems generally clear that the efficiency of folding in the ER bottlenecks secretion regardless of whether or not the ER contains some fraction of folded/ functional protein. We note that it is certainly possible, if not likely, that secretion efficiency is higher at lower expression levels (doi: 10.1074/jbc.AC120.014940). However, the mutational scanning platform used in this work was designed such that all variants are expressed from an identical promoter at the same location within the genome. Thus, for the purposes of these investigations, we believe it is entirely fair to draw “apples-to-apples” comparisons of their relative effects on plasma membrane expression.

      Please see Frances Arnold's paper on this point and their mutagenesis library of the channelrhodopsin (https://www.pnas.org/doi/10.1073/pnas.1700269114), which further found that 20% of mutations improved WT trafficking. Some general comparisons to this paper might be informative.

      We agree that it may be interesting to compare the results from this paper to those in our own. Indeed, we find that 20% of the point mutations characterized herein also enhance the expression of WT mGnRHR, as mentioned in the Results section. However, we think it might be a bit premature to suggest this is a more general trend in light of the fact that the channelrhodopsins engineered in those studies were not of eukaryotic origin and have likely resulted from distinct evolutionary constraints. We ultimately decided against adding more on this to our already lengthy discussion in order to maintain focus on the mechanisms of epistasis.

      Chris Tate and others have shown that there is a high frequency of finding stabilising point mutations in GPCRs and this is the premise of the StaR technology used to thermostabilise GPCRs in the presence of different ligands, i.e. agonists vs inverse agonists. As far as I am aware, there is a poor correlation between expression levels and thermostability (measured by ligand binding to detergent-solubilised membranes). As such, it is possible that some of the mutants might be more stable than WT even though they have lower levels of PME.

      We believe the disconnect between thermostability and expression precisely speaks to our main point about the suitability of current membrane protein folding assays for the questions we address herein. The degradative activity of ER quality control has not necessarily selected for proteins that are resistant to thermal degradation and/ or are suitable for macromolecular crystallography. For this reason, it is often not so difficult to engineer proteins with enhanced thermal stability. We do not believe this disconnect signals that quality control is insensitive to protein folding and stability, but rather that it is more likely to recognize conformational defects that are distinct from those involved in thermal degradation and/ or aggregation. Indeed, recent work from the Fluman group, which builds on a wider body of previous observations, has shown that the exposure of polar groups within the membrane is a key factor that recruits degradation machinery (doi: 10.1101/2023.12.12.571171). It is hard to imagine that these sorts of conformational defects are the same as those involved in thermal aggregation.

      Reviewer #2 (Recommendations For The Authors):

      (1) I believe that by focusing more on the epistasis with V276T, and less on W107A, the paper could be strengthened significantly.

      We appreciate this sentiment. But we believe the comparison of these two mutants really drives home the point that destabilizing mutations are not equivalent with respect to the epistatic interactions they form.

      (2) In the abstract - please define the term epistasis in a simple way, to make it accessible to a general audience. For example - negative epistasis means that... this should be explicitly explained.

      We thank the reviewer for this suggestion. To meet eLife formatting, we had to cut down the abstract significantly. We simplified this as best we could in the following statement:

      “Though protein stability is known to shape evolution, it is unclear how cotranslational folding constraints modulate the synergistic, epistatic interactions between mutations.”

      We also define positive and negative epistasis in the results section as follows:

      “Positive Ɛ values denote double mutants that have greater PME than would be expected based on the effects of single mutants. Negative Ɛ values denote double mutants that have lower PME than would be expected based on the effects of single mutants. Pairs of mutations with Ɛ values near zero have additive effects on PME.”

      (3) The title is quite complex and might deter readers from outside the protein evolution field. Consider simplifying it.

      We thank the reviewer for this suggestion. We have simplified the title to the following:

      “Divergent Folding-Mediated Epistasis Among Unstable Membrane Protein Variants”

      (4) The paper could benefit from a simple figure explaining the different stages of membrane protein folding (stages 1+2) to make it more accessible to readers from outside the membrane protein field.

      This is a great suggestion. We incorporated a new schematic in the revised manuscript that outlines the nature of these processes (see Fig. 1A in the revised manuscript).

      (5) For the FACS-Seq experiment - it was not clear to me if and when all cells are pooled together. For example - are the 3 libraries mixed together already at the point of transfection, or are the transfected cells pooled together at any point before sorting? This could have some implications on batch effects and should, therefore, be explicitly mentioned in the main text.

      We thank the reviewer for this suggestion. We modified the description of the DNA library assembly to emphasize that the mutations were generated in the context of three mixed plasmid pools, which were then transfected into the cells and sorted independently:

      “We then generated a mixed array of mutagenic oligonucleotides that collectively encode this series of substitutions (Table S3) and used nicking mutagenesis to introduce these mutations into the V276T, W107A, and WT mGnRHR cDNAs (Medina-Cucurella et al., 2019), which produced three mixed plasmid pools.”

      (6) The following description in the text is quite confusing. It would be better to simplify it considerably or remove it: "scores (Ɛ) were then determined by taking the log of the double mutant fitness value divided by the difference between the single mutant fitness values (see Methods)."

      We thank the reviewer for this valuable feedback and have simplified the text as follows:

      “To compare epistatic trends in these libraries, we calculated epistasis scores (Ɛ) for the interactions that these 251 mutations form with V276T and W107A by comparing their relative effects on PME of the WT, V276T, and W107A variants using a previously described epistasis model (product model, see Methods) (Olson et al. 2014).”
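
      As a rough illustration of the product model mentioned above, the sketch below assumes that each variant's PME is expressed as a value relative to WT, and that Ɛ is the log ratio of the observed double-mutant value to the product of the two single-mutant values. The function name, the use of the natural log, and the example numbers are hypothetical; the Methods give the exact formulation.

```python
import math

def epistasis_score(w_double, w_single_a, w_single_b):
    """Epistasis under a product model (illustrative sketch only).

    All inputs are assumed to be PME values normalized to WT (WT = 1.0).
    """
    expected = w_single_a * w_single_b  # expectation if effects combine independently
    return math.log(w_double / expected)

# Hypothetical values: two single mutants that each halve PME.
additive = epistasis_score(0.25, 0.5, 0.5)  # double mutant matches expectation
positive = epistasis_score(0.40, 0.5, 0.5)  # double mutant better than expected
negative = epistasis_score(0.10, 0.5, 0.5)  # double mutant worse than expected
```

      Under these assumptions, a score near zero corresponds to the "additive" case described in the text, while positive and negative scores mark double mutants whose PME is above or below the expectation.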

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings. Moreover, it enables the visualization of actual cell locations, allowing for the examination of spatial properties (e.g., Figure 4G).

      We thank the reviewer for pointing out the technical novelty of this work.

      Weaknesses:

      There is a notable deviation from several observations obtained through conventional electrophysiological recordings. Particularly, as mentioned below in detail, the considerable differences in baseline firing rates and no observations of ripple-triggered firing patterns raise some concerns about potential artifacts from imaging and analysis, such as cell toxicity, abnormal excitability, and false detection of spikes. While these findings are intriguing if the validity of these methods is properly proven, accepting the current results as new insights is challenging.

      We appreciate the reviewer’s insightful comments regarding the intriguing aspect of our findings. Indeed, the emergence of a novel form of CA1 population synchrony presents exciting implications for hippocampal memory research and beyond.

      While we acknowledge the deviations from conventional electrophysiological recordings, we respectfully contend that these differences do not necessarily imply methodological flaws. All experiments and analyses were conducted with meticulous adherence to established standards in the field.

      Regarding the observed variations in average firing rates, it is important to note the well-documented heterogeneity in CA1 pyramidal neuron firing rates, spanning from 0.01 to 10 Hz, with a skewed distribution toward lower frequencies (Mizuseki et al., 2013). Our exclusion criteria for neurons with low estimated firing rates may have inadvertently biased the selection towards more active neurons. Moreover, prior research has indicated that average firing rates tend to increase during exposure to novel environments (Karlsson et al., 2008), and among deep-layer CA1 pyramidal neurons (Mizuseki et al., 2011). Given our recording setup in a highly novel environment and the predominance of deep CA1 pyramidal neurons in our sample, the observed higher average firing rates could be influenced by these factors. Considering these points, our mean firing rates (3.2 Hz) are reasonable estimates compared to previously reported values obtained from electrophysiological recordings (2.1 Hz in McHugh et al., 1996 and 2.4-2.6 Hz in Buzsaki et al., 2003).

      Regarding concerns about potential cell toxicity, previous studies have shown that Voltron expression and illumination do not significantly alter membrane resistance, membrane capacitance, resting membrane potentials, spike amplitudes, and spike width (see Abdelfattah 2019, Science, Supplementary Figure 11 and 12). In our recordings, imaged neurons exhibit preserved membrane and dendritic morphology during and after experiments (Author response image 1), supporting the absence of significant toxicity.

      Author response image 1.

      Voltron-expressing neurons exhibit preserved membrane and dendritic morphology. (A) Images of two-photon z-stack maximum intensity projection showing Voltron-expressing neurons taken after voltage image experiments in vivo. (B) Post-hoc histological images of neurons being voltage-imaged.

      Regarding spike detection, we use validated algorithms (Abdelfattah et al., 2019 and 2023) to ensure robust and reliable detection of spikes. Spiking activity was first separated from slower subthreshold potentials using high-pass filtering. This way, a slow fluorescence increase will not be detected as a spike, even if its amplitude is large. We benchmarked the detection algorithm in computer simulations. The sensitivity and specificity of the algorithm exceed 98% at the signal-to-noise ratio of our recordings. While we acknowledge that a small number of spikes, particularly those occurring later in a burst, might be missed due to their smaller amplitudes (as illustrated in Figures 1 and 2 of the manuscript), we anticipate that any missed spikes would lead to a decrease rather than an increase in synchrony between neurons. Overall, we are confident that spike detection is performed in a rigorous and robust manner.
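
      The principle that high-pass filtering keeps slow fluorescence changes from being detected as spikes can be sketched as follows. This is not the validated algorithm cited above: the moving-average filter, the fixed threshold, the function names, and the toy trace are all assumptions made for illustration.

```python
def high_pass(trace, window):
    """Remove slow components by subtracting a centered moving average."""
    half = window // 2
    filtered = []
    for i in range(len(trace)):
        lo = max(0, i - half)
        hi = min(len(trace), i + half + 1)
        baseline = sum(trace[lo:hi]) / (hi - lo)
        filtered.append(trace[i] - baseline)
    return filtered

def detect_spikes(trace, window, threshold):
    """Return indices of local maxima of the filtered trace above threshold."""
    filtered = high_pass(trace, window)
    spikes = []
    for i in range(1, len(filtered) - 1):
        if (filtered[i] > threshold
                and filtered[i] >= filtered[i - 1]
                and filtered[i] > filtered[i + 1]):
            spikes.append(i)
    return spikes

# Toy trace: a slow ramp that grows to a large amplitude, plus two brief spikes.
trace = [0.1 * i for i in range(100)]
trace[30] += 3.0
trace[70] += 3.0
spike_indices = detect_spikes(trace, window=11, threshold=2.0)
```

      In this toy trace the slow ramp eventually exceeds the threshold in the raw signal, but only the two brief deflections survive filtering and are reported.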

      To further strengthen these points, we will include the following in the revision:

      (1) Histological images of recorded neurons during and after experiments.

      (2) Further details regarding the validation of spike detection algorithms.

      (3) Analysis of publicly available electrophysiological datasets.

      (4) Discussion regarding the reasons behind the novelty of some of our findings compared to previous observations.

      In conclusion, we assert that our experimental and analysis approach upholds rigorous standards. We remain committed to reconciling our findings with previous observations and welcome further scrutiny and engagement from the scientific community to explore the intriguing implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead, occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      We thank the reviewer for a thorough and thoughtful review of our paper.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for pointing out the technical strength and the novelty of our observations.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with less than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      We understand the reviewer’s concerns regarding the size of the dataset. Despite this limitation, it is important to note that synchronous ensembles beyond what could be expected from chance (jittering) were detected in all examined data. In the revision, we plan to add more data, including data from subsequent visits, to further strengthen our findings.

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during the exploration of novel environments. Those data sets often have 10x as many simultaneously recorded neurons as the present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior, which again does not agree with many previous electrophysiological recordings.

      We understand the reviewer’s concern. We will examine publicly available electrophysiology datasets to gain further insights into any similarities and differences to our findings. Based on these results, we will discuss why these events have not been previously observed/reported.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available it would strengthen the proposed link to novelty if they were included.

      We thank the reviewer for this constructive suggestion. We will acquire more datasets from subsequent visits to gain further insights into these synchronous events.

      (3) In the discussion the authors begin by speculating that the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      We thank the reviewer for this constructive suggestion. We did demonstrate a frequency shift to a lower frequency in the synchrony-associated theta during immobility than during locomotion (see Fig. 4B, the red vs. blue curves). We will enlarge this panel and specifically refer to it in the corresponding discussion paragraph.

      (4) The authors mention in the discussion that they image deep-layer PCs in CA1, however, this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer-specific gene to support this.

      We thank the reviewer for this constructive suggestion. We do have images of brain slices post-recordings (Author response image 2). Imaged neurons are clearly located in the deep CA1 pyramidal layer. We will add these images and quantification in the revised manuscript.

      Author response image 2.

      Imaged neurons are located in the deep pyramidal layer of the dorsal hippocampal CA1 region.

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected in the other side of the brain, and the investigation is flawed due to multiple problems with the point process analyses. The synchrony terminology refers to dozens of milliseconds as opposed to the millisecond timescale referred to in prior work, and the interpretations do not take into account theta phase locking as a simple alternative explanation.

      We genuinely appreciate the reviewer’s feedback and acknowledge the concerns raised. However, we believe these concerns can be effectively addressed without undermining the validity of our conclusions. With this in mind, we respectfully disagree with the assessment that our experiments and investigation are flawed. Please allow us to address these concerns and offer additional context to support the validity of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples.

      There are two main methodological problems with the work:

      (1) Experimentally, the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both signals exhibit profound differences as a function of location: theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. And ripples are often a local phenomenon - independent ripples occur within a fraction of a millimeter within the same hemisphere, let alone different hemispheres. Ripples are very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident.

      We appreciate the reviewer’s consideration regarding the collection of LFP from the contralateral hemisphere. While we acknowledge the limitation of this design, we believe that our findings still offer valuable insights into the dynamics of synchronous ensembles. Despite potential variations in theta phases with recording locations and depth, we find that the occurrence and amplitudes of theta oscillations are generally coordinated across hemispheres (Buzsaki et al., Neurosci., 2003). Therefore, the presence of prominent contralateral LFP theta around the times of synchronous ensembles in our study (see Figure 4A of the manuscript) strongly supports our conclusion regarding their association with theta oscillations, despite the collection of LFP from the opposite hemisphere.

      In addition, in our manuscript, we specifically mentioned that the “preferred phases” varied from session to session, likely due to the variability of recording locations (see Line 254-256). Therefore, we think that the reviewer’s concern regarding theta phase variability has already been addressed in the present manuscript.

      Regarding ripple oscillations, while we recognize that they can sometimes occur locally, the majority of ripples occur synchronously in both hemispheres (up to 70%, see Szabo et al., Neuron, 2022; Buzsaki et al., Neurosci., 2003). Therefore, using contralateral LFP to infer ripple occurrence on the ipsilateral side has been a common practice in the field, employed by many studies published in respectable journals (Szabo et al., Neuron, 2022; Terada et al., Nature, 2021; Dudok et al., Neuron, 2021; Geiller et al., Neuron, 2020). Furthermore, we observed that 446 synchronous ensembles during immobility do not co-occur with contralateral ripples, and the remaining 313 ensembles during locomotion are not associated with ripples, as ripples rarely occur during locomotion. Therefore, our conclusion that synchronous ensembles are not associated with ripple oscillations is supported by data.

      (2) The analysis of the point process data (spike trains) is entirely flawed. There are many technical issues: complex spikes ("bursts") are not accounted for; differences in spike counts between the various conditions ("locomotion" and "immobility") are not accounted for; the pooling of multiple CCGs assumes independence, whereas even conditional independence cannot be assumed; etc.

      We acknowledge the reviewer’s concern regarding spike train analysis. Indeed, complex bursts or different behavioral conditions can lead to differences in spike counts that could potentially affect the detection of synchronous ensembles. However, our jittering procedure (see Line 121-132) is designed to control for the variation of spike counts. Importantly, while the jittered spike trains also contain the same spike count variations, we found 7.8-fold more synchronous events in our data compared to jitter controls (see Figure 1G of the manuscript), indicating that these factors cannot account for the observed synchrony.
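
      The logic of a jitter control can be sketched as follows: surrogate spike trains keep every cell's spike count but lose precise spike timing, so any excess of synchronous events in the real data cannot be attributed to spike counts alone. The bin width, jitter window, ensemble criterion, and toy data below are hypothetical and are not the parameters used in the manuscript.

```python
import random

def count_synchronous_events(trains, bin_width, min_cells):
    """Count time bins in which at least `min_cells` cells fire."""
    cells_per_bin = {}
    for train in trains:
        # A cell contributes at most once per bin, however many spikes it fires.
        for b in {int(t // bin_width) for t in train}:
            cells_per_bin[b] = cells_per_bin.get(b, 0) + 1
    return sum(1 for n in cells_per_bin.values() if n >= min_cells)

def jitter(train, max_shift, rng):
    """Shift every spike by a random amount in [-max_shift, max_shift]."""
    return sorted(t + rng.uniform(-max_shift, max_shift) for t in train)

rng = random.Random(0)
# Toy data: 5 cells that all fire at the same 10 time points (in ms).
trains = [[100.0 * k for k in range(1, 11)] for _ in range(5)]
observed = count_synchronous_events(trains, bin_width=25.0, min_cells=5)
surrogates = [
    count_synchronous_events([jitter(tr, 75.0, rng) for tr in trains], 25.0, 5)
    for _ in range(100)
]
```

      Comparing `observed` with the distribution of `surrogates` gives the fold-excess of synchrony over the jitter control in this toy setting.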

      To explicitly demonstrate that complex bursts cannot account for the observed synchrony, we have performed additional analysis to remove all latter spikes in bursts and only count the single and the first spikes of bursts. Importantly, we found that this procedure did not change the rate and size of synchronous ensembles, nor did it significantly alter the grand-average CCG (see Author response image 3). The results of this analysis explicitly rule out a significant effect of complex spikes on the analysis of synchronous ensembles.

      Author response image 3.

      Population synchrony remains after the removal of spikes in bursts. (A) The grand-average cross correlogram (CCG) was calculated using spike trains without latter spikes in bursts. The gray line represents the mean grand average CCG between reference cells and randomly selected cells from different sessions. (B) Pairwise comparison of the event rates of population synchrony between spike trains containing all spikes and spike trains without latter spikes in bursts. Bar heights indicate group means (n=10 segments, p=0.036, Wilcoxon signed-rank test). (C) Histogram of the ensemble sizes as percentages of cells participating in the synchronous ensembles.
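
      The burst-removal step described above can be sketched as follows, assuming that a burst is a run of spikes separated by inter-spike intervals no longer than a fixed cutoff. The cutoff value and function name are hypothetical.

```python
def first_spikes_only(spike_times, max_isi):
    """Keep isolated spikes and the first spike of each burst.

    A spike is treated as a later spike in a burst if it follows the
    previous spike by `max_isi` or less (an illustrative criterion).
    """
    kept = []
    previous = None
    for t in sorted(spike_times):
        if previous is None or t - previous > max_isi:
            kept.append(t)
        previous = t
    return kept

# Spike times in ms: one burst of three, one isolated spike, one burst of two.
pruned = first_spikes_only([5, 9, 13, 100, 250, 255], max_isi=10)
```

      With these toy values, `pruned` contains 5, 100, and 250. Synchrony metrics can then be recomputed on the pruned trains and compared with the originals.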

      Beyond those methodological issues, there are two main interpretational problems: (1) the "synchronous ensembles" may be completely consistent with phase locking to the intracellular theta (as even shown by the authors themselves in some of the supplementary figures).

      We agree with the reviewer that the synchronous ensembles are indeed consistent with theta phase locking. However, it is important to note that theta phase locking alone does not necessarily imply population synchrony. In fact, theta phase locking has been shown to “reduce” population synchrony in a previous study (Mizuseki et al., 2014, Phil. Trans. R. Soc. B.). Thus, the presence of theta phase locking cannot be taken as a simple alternative explanation of the synchronous ensembles.

      To directly assess the contribution of theta phase locking to synchronous ensembles, we have performed a new analysis to randomize the specific theta cycles in which neurons spike, while keeping the spike phases constant. This manipulation disrupts spike co-occurrence while preserving theta phase locking, allowing us to test whether theta phase locking alone can explain the population synchrony, or whether spike co-occurrence in specific cycles is required. The grand-average CCG shows a much smaller peak compared to the original peak (Author response image 4A). Moreover, synchronous event rates show a 4.5-fold decrease in the randomized data compared to the original event rates (Author response image 4B). Thus, the new analysis reveals theta phase locking alone cannot account for the population synchrony.

      Author response image 4.

      Drastic reduction of population synchrony by randomizing spikes to other theta cycles while preserving the phases. (A) The grand-average cross correlogram (CCG) was calculated using original spike trains (black) and randomized spike trains where theta phases of the spikes are kept the same but spike timings were randomly moved to other theta cycles (red). (B) Pairwise comparison of the event rates of population synchrony between the original spike trains and randomized spike trains (n=10 segments, p=0.002, Wilcoxon signed-rank test). Bar heights indicate group means. ** p<0.01
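
      The cycle-shuffling control can be sketched as follows: each spike keeps its theta phase but is reassigned to a randomly chosen theta cycle. The sketch assumes that cycle boundaries have already been extracted from the LFP. The function names, the uniform choice of target cycle, and the toy values are hypothetical.

```python
import bisect
import math
import random

def spike_phase(t, cycle_bounds):
    """Return (cycle index, phase in radians) of a spike at time t."""
    i = bisect.bisect_right(cycle_bounds, t) - 1
    start, end = cycle_bounds[i], cycle_bounds[i + 1]
    return i, 2 * math.pi * (t - start) / (end - start)

def shuffle_theta_cycles(spike_times, cycle_bounds, rng):
    """Move each spike to a random theta cycle while keeping its phase."""
    n_cycles = len(cycle_bounds) - 1
    shuffled = []
    for t in spike_times:
        _, phase = spike_phase(t, cycle_bounds)
        j = rng.randrange(n_cycles)
        start, end = cycle_bounds[j], cycle_bounds[j + 1]
        shuffled.append(start + phase / (2 * math.pi) * (end - start))
    return sorted(shuffled)

rng = random.Random(1)
cycle_bounds = [125.0 * k for k in range(9)]  # eight 125 ms cycles (8 Hz)
spikes = [30.0, 155.0, 290.0, 540.0]          # spike times in ms
shuffled = shuffle_theta_cycles(spikes, cycle_bounds, rng)
```

      Because phases are preserved, any synchrony that remains after shuffling reflects phase locking alone, while the drop relative to the original data reflects co-firing within specific cycles.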

      (2) The definition of "synchrony" in the present work is very loose and refers to timescales of 20-30 ms. In previous literature that relates to synchrony of point processes, the timescales discussed are 1-2 ms, and longer timescales are referred to as the "baseline" which is actually removed (using smoothing, jittering, etc.).

      Regarding the timescale of synchronous ensembles, we acknowledge that it varies considerably across studies and cell types. However, it is important to note that a timescale of dozens, or even hundreds of milliseconds is common for synchrony terminology in CA1 pyramidal neurons (see Csicsvari et al., Neuron, 2000; Harris et al., Science, 2003; Malvache et al., Science, 2016; Yagi et al., Cell Reports, 2023). In fact, a timescale of 20-30 ms is considered particularly important for information transmission and storage in CA1, as it matches the membrane time constant of pyramidal neurons, the period of hippocampal gamma oscillations, and the time window for synaptic plasticity. Therefore, we believe that this timescale is relevant and in line with established practices in the field.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Connelly and colleagues provide convincing genetic evidence that importation from mainland Tanzania is a major source of Plasmodium falciparum lineages currently circulating in Zanzibar. This study also reveals ongoing local malaria transmission and occasional near-clonal outbreaks in Zanzibar. Overall, this research highlights the role of human movements in maintaining residual malaria transmission in an area targeted for intensive control interventions over the past decades and provides valuable information for epidemiologists and public health professionals.

      Reviewer #1 (Public Review):

      The Zanzibar archipelago is close to achieving malaria elimination, but despite the implementation of effective control measures, there is still low-level seasonal malaria transmission. This could be due to the frequent importation of malaria from mainland Tanzania and Kenya, reservoirs of asymptomatic infections, and competent vectors. To investigate population structure and gene flow of P. falciparum in Zanzibar and mainland Tanzania, the authors used 178 samples from mainland Tanzania and 213 from Zanzibar that were previously sequenced using molecular inversion probes (MIPs) panels targeting single nucleotide polymorphisms (SNPs). They performed Principal Component Analysis (PCA) and identity by descent (IBD) analysis to assess genetic relatedness between isolates. Parasites from coastal mainland Tanzania contribute to the genetic diversity in the parasite population in Zanzibar. Despite this, there is a pattern of isolation by distance and microstructure within the archipelago, and evidence of local sharing of highly related strains sustaining malaria transmission in Zanzibar that are important targets for interventions such as mass drug administration and vector control, in addition to measures against imported malaria.

      Strengths:

      This study presents important samples to understand population structure and gene flow between mainland Tanzania and Zanzibar, especially from the rural Bagamoyo District, where malaria transmission persists and there is a major port of entry to Zanzibar. In addition, this study includes a larger set of SNPs, providing more robustness for analyses such as PCA and IBD. Therefore, the conclusions of this paper are well supported by data.

      Weaknesses:

      Some points need to be clarified:

      (1) SNPs in linkage disequilibrium (LD) can introduce bias in PCA and IBD analysis. Were SNPs in LD filtered out prior to these analyses?

      Thank you for this point. We did not filter SNPs in LD prior to this analysis. In the PCA analysis in Figure 1, we did restrict to a single isolate among those that were clonal (high IBD values) to prevent bias in the PCA. In general, in the absence of selective forces, disequilibrium extends only over small distances (<5-10 kb). This is much less than the average spacing of the markers in the panel. With minimal LD between markers, the conclusions drawn on relative levels and connections at high IBD are unlikely to be confounded by any effects of disequilibrium.

      (2) Many IBD algorithms do not handle polyclonal infections well, despite an increasing number of algorithms that are able to handle polyclonal infections and multiallelic SNPs. How were polyclonal samples handled for IBD analysis?

      Thank you for this point. We added lines 157-161 to clarify. This section now reads:

      “To investigate genetic relatedness of parasites across regions, identity by descent (IBD) estimates were assessed using the within sample major alleles (coercing samples to monoclonal by calling the dominant allele at each locus) and estimated utilizing a maximum likelihood approach using the inbreeding_mle function from the MIPanalyzer package (Verity et al., 2020). This approach has previously been validated as a conservative estimate of IBD (Verity et al., 2020).”

      Please see the supplement in (Verity et al., 2020) for an extensive simulation study that validates this approach.
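
      The idea of coercing a sample to monoclonal by calling the dominant allele at each locus can be sketched as follows. This is an illustration, not the MIPanalyzer implementation: the function name, the input layout, and the handling of exact ties at 0.5 (called as the reference allele here) are assumptions.

```python
def call_major_alleles(wsaf):
    """Reduce within-sample allele frequencies to one call per locus.

    `wsaf` holds the within-sample frequency of the alternate allele at
    each locus, or None where the locus was not genotyped.
    Returns 1 (alternate), 0 (reference), or None (missing) per locus.
    """
    calls = []
    for freq in wsaf:
        if freq is None:
            calls.append(None)
        else:
            calls.append(1 if freq > 0.5 else 0)
    return calls

# Hypothetical polyclonal sample with mixed calls at loci 2 and 3.
calls = call_major_alleles([0.0, 0.2, 0.9, None, 1.0])
```

      After this step, every sample is represented by a single haplotype-like vector, which is the form of input the quoted passage describes for the IBD estimate.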

      Reviewer #1 (Recommendations For The Authors):

      (3) I think Supplementary Figures 8 and 9 are more visually informative than Figure 2.

      Thank you for your response. We performed the analysis in Figure 2 to show how IBD varies between different regions and is higher within a region than between.

      Reviewer #2 (Public Review):

      This manuscript describes P. falciparum population structure in Zanzibar and mainland Tanzania. 282 samples were typed using molecular inversion probes. The manuscript is overall well-written and shows a clear population structure. It follows a similar manuscript published earlier this year, which typed a similar number of samples collected mostly in the same sites around the same time. The current manuscript extends this work by including a large number of samples from coastal Tanzania, and by including clinical samples, allowing for a comparison with asymptomatic samples.

      The two studies made overall very similar findings, including strong small-scale population structure, related infections on Zanzibar and the mainland, near-clonal expansion on Pemba, and frequency of markers of drug resistance. Despite these similarities, the previous study is mentioned a single time in the discussion (in contrast, the previous research from the authors of the current study is more thoroughly discussed). The authors missed an opportunity here to highlight the similar findings of the two studies.

      Thank you for your insights. We appreciated the level of detail of your review, and it strengthened our work. We have added sentences on lines 292-295, which now read:

      “A recent study investigating population structure in Zanzibar also found local population microstructure in Pemba (Holzschuh et al., 2023). Further, both studies found near-clonal parasites within the same district, Micheweni, and found population microstructure over Zanzibar.”

      Strengths:

      The overall results show a clear pattern of population structure. The finding of highly related infections detected in close proximity shows local transmission and can possibly be leveraged for targeted control.

      Weaknesses:

      A number of points need clarification:

      (1) It is overall quite challenging to keep track of the number of samples analyzed. I believe the number of samples used to study population structure was 282 (line 141), thus this number should be included in the abstract rather than 391. It is unclear where the number 232 on line 205 comes from; I failed to deduce this number from supplementary table 1.

      Thank you for this point. We have included 282 instead of 391 in the abstract. We added a statement in the results at lines 203-205 to clarify this point, which now reads:

      “PCA analysis of 232 coastal Tanzanian and Zanzibari isolates, after pruning 51 samples with an IBD of greater than 0.9 to one representative sample, demonstrates little population differentiation (Figure 1A).”
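
      One way to prune highly related samples to a single representative is sketched below. It assumes that samples connected by any chain of pairwise IBD values above the threshold form one group (single linkage) and that the first sample listed in each group is kept; neither choice is stated in the source, and the sample names and IBD values are hypothetical.

```python
def prune_clonal(samples, pairwise_ibd, threshold=0.9):
    """Keep one representative per group of highly related samples."""
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the group root, compressing the path.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for (a, b), ibd in pairwise_ibd.items():
        if ibd > threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    kept, seen = [], set()
    for s in samples:
        root = find(s)
        if root not in seen:
            seen.add(root)
            kept.append(s)
    return kept

samples = ["A", "B", "C", "D"]
pairwise_ibd = {("A", "B"): 0.95, ("B", "C"): 0.92, ("A", "C"): 0.50, ("C", "D"): 0.10}
kept = prune_clonal(samples, pairwise_ibd)
```

      In this toy example A, B, and C collapse into one group even though the A-C estimate is below the threshold, which is the main consequence of the single-linkage assumption.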

      (2) Also, Table 1 and Supplementary Table 1 should be swapped. It is more important for the reader to know the number of samples included in the analysis (as given in Supplementary Table 1) than the number collected. Possibly, the two tables could be combined in a clever way.

      Thank you for this advice. Rather than switch to another table altogether, we appended two columns to the original table to better portray the information (see Table 1).

      Methods

      (3) The authors took the somewhat unusual decision to apply K-means clustering to GPS coordinates to determine how to combine their data into a cluster. There is an obvious cluster on Pemba islands and three clusters on Unguja. Based on the map, I assume that one of these three clusters is mostly urban, while the other two are more rural. It would be helpful to have a bit more information about that in the methods. See also comments on maps in Figures 1 and 2 below.

      Cluster 3 is a mix of rural and urban areas, while clusters 2, 4, and 5 are mostly rural. This analysis was performed to see how IBD changes in relation to local context within different regions in Zanzibar, showing that IBD is higher within a locale than between locales.

      (4) Following this point, in Supplemental Figure 5 I fail to see an inflection point at K=4. If there is one, it will be so weak that it is hardly informative. I think selecting 4 clusters in Zanzibar is fine, but the justification based on this figure is unclear.

      The K-means clustering experiment was used to cluster a continuous space of geographic coordinates in order to compare genetic relatedness in different regions. We selected this inflection point based on the elbow plot, and we chose this number to obtain enough subsections of Zanzibar to compare genetic relatedness. This point is added to the methods at lines 174-178, which now read:

      “The K-means clustering experiment was used to cluster a continuous space of geographic coordinates in order to compare genetic relatedness in different regions. We selected K = 4 as the inflection point based on the elbow plot (Supplemental Figure 5) and based the number to obtain sufficient subsections of Zanzibar to compare genetic relatedness.”
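
      An elbow plot of the kind referred to here tracks the within-cluster sum of squares (WCSS) as K increases. The sketch below is a generic illustration, not the authors' analysis: it treats coordinates as points in a plane, uses a deterministic farthest-point initialization for reproducibility, and runs on made-up points.

```python
def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans_wcss(points, k, n_iter=50):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    # Farthest-point initialization: start from the first point, then keep
    # adding the point that is farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep it if empty).
        new_centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Made-up coordinates: four tight groups of four points each.
true_centers = [(0, 0), (10, 0), (0, 10), (30, 30)]
offsets = [(0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in true_centers for dx, dy in offsets]
wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
```

      In this toy data WCSS falls steeply up to K = 4 and barely changes afterwards, which is the pattern an elbow criterion looks for. When the drop is more gradual, as the reviewer notes for the real data, the choice of K rests more on practical considerations.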

      (5) For the drug resistance loci, it is stated that "we further removed SNPs with less than 0.005 population frequency." Was the denominator for this analysis the entire population, or were Zanzibar and mainland samples assessed separately? If the latter, as for all markers <200 samples were typed per site, there could not be a meaningful way of applying this threshold. Given data were available for 200-300 samples for each marker, does this simply mean that each SNP needed to be present twice?

      Population frequency is calculated as the average within-sample allele frequency across all individuals in the population, which is an unbiased estimator. Within-sample allele frequency can range from 0 to 1. Thus, if only one sample out of 100 carries an allele, at a within-sample frequency of 0.1, the population allele frequency would be 0.1/100 = 0.001. This allele is removed even though its prevalence would have been 0.01. This filtering is applied prior to any final summary frequency or prevalence calculations (see MIP variant Calling and Filtering section in the methods). This protects against errors occurring only at low frequency.
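
      The worked example above can be reproduced with a short sketch. The function names are hypothetical, the population of 100 samples follows the arithmetic in the response, and the 0.005 cutoff is the threshold quoted by the reviewer.

```python
def population_frequency(wsaf_values):
    """Mean within-sample allele frequency across all samples."""
    return sum(wsaf_values) / len(wsaf_values)

def prevalence(wsaf_values):
    """Fraction of samples that carry the allele at any frequency."""
    return sum(1 for f in wsaf_values if f > 0) / len(wsaf_values)

# One of 100 samples carries the allele, at a within-sample frequency of 0.1.
wsaf_values = [0.1] + [0.0] * 99
freq = population_frequency(wsaf_values)  # about 0.001
prev = prevalence(wsaf_values)            # 0.01
keep_snp = freq >= 0.005                  # threshold quoted in the review
```

      Here `keep_snp` is False: the SNP is filtered on its population frequency even though its prevalence is an order of magnitude higher.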

      Discussion:

      (6) I was a bit surprised to read the following statement, given Zanzibar is one of the few places that has an effective reactive case detection program in place: "Thus, directly targeting local malaria transmission, including the asymptomatic reservoir which contributes to sustained transmission (Barry et al., 2021; Sumner et al., 2021), may be an important focus for ultimately achieving malaria control in the archipelago (Björkman & Morris, 2020)." I think the current RACD program should be mentioned and referenced. A number of studies have investigated this program.

      Thank you for this point. We have added additional context and clarification on lines 275-280, which now reads:

      “Thus, directly targeting local malaria transmission, including the asymptomatic reservoir which contributes to sustained transmission (Barry et al., 2021; Sumner et al., 2021), may be an important focus for ultimately achieving malaria control in the archipelago (Björkman & Morris, 2020). Currently, a reactive case detection program within index case households is being implemented, but local transmission continues and further investigation into how best to control this is warranted (Mkali et al. 2023).”

      (7) The discussion states that "In Zanzibar, we see this both within and between shehias, suggesting that parasite gene flow occurs over both short and long distances." I think the term 'long distances' should be better defined. Figure 4 shows that highly related infections rarely span beyond 20-30 km. In many epidemiological studies, this would still be considered short distances.

      Thank you for this point. We have edited the text at lines 287-288 to indicate that highly related parasites mainly occur at the range of 20-30km, which now reads:

      “In Zanzibar, highly related parasites mainly occur at the range of 20-30km.”

      (8) Lines 330-331: "Polymorphisms associated with artemisinin resistance did not appear in this population." Do you refer to background mutations here? Otherwise, the sentence seems to repeat lines 324. Please clarify.

      We are referring to the list of Pfk13 polymorphisms stated in the Methods from lines 146-148. We added clarifying text on lines 326-329:

      “Although polymorphisms associated with artemisinin resistance did not appear in this population, continued surveillance is warranted given emergence of these mutations in East Africa and reports of rare resistance mutations on the coast consistent with spread of emerging Pfk13 mutations (Moser et al., 2021).”

      (9) Line 344: The opinion paper by Bousema et al. in 2012 was followed by a field trial in Kenya (Bousema et al, 2016) that found that targeting hotspots did NOT have an impact beyond the actual hotspot. This (and other) more recent finding needs to be considered when arguing for hotspot-targeted interventions in Zanzibar.

      We added a clarification on this point on lines 335-345, which now reads:

      “A recent study identified “hotspot” shehias, defined as areas with comparatively higher malaria transmission than other shehias, near the port of Zanzibar town and in northern Pemba (Bisanzio et al., 2023). These regions overlapped with shehias in this study with high levels of IBD, especially in northern Pemba (Figure 4). These areas of substructure represent parasites that differentiated in relative isolation and are thus important locales to target intervention to interrupt local transmission (Bousema et al., 2012). While a field cluster-randomized control trial in Kenya targeting these hotspots did not confer much reduction of malaria outside of the hotspot (Bousema et al. 2016), if areas are isolated pockets, which genetic differentiation can help determine, targeted interventions in these areas are likely needed, potentially through both mass drug administration and vector control (Morris et al., 2018; Okell et al., 2011). Such strategies and measures preventing imported malaria could accelerate progress towards zero malaria in Zanzibar.”

      Figures and Tables:

      (10) Table 2: Why not enter '0' if a mutation was not detected? 'ND' is somewhat confusing, as the prevalence is indeed 0%.

      Thank you for this point. We now report zero and also give the confidence interval (CI) to provide more detail.

      (11) Figure 1: Panel A is very hard to read. I don't think there is a meaningful way to display a 3D-panel in 2D. Two panels showing PC1 vs. PC2 and PC1 vs. PC3 would be better. I also believe the legend 'PC2' is placed in the wrong position (along the Y-axis of panel 2).

      Supplementary Figure 2B suffers from the same issue.

      Thank you for your comment. A revised Figure 1 and Supplemental Figure 2 are included, where there are separate plots for PC1 vs. PC2 and PC1 vs. PC3.

      (12) The maps for Figures 1 and 2 don't correspond. Assuming Kati represents cluster 4 in Figure 2, the name is put in the wrong position. If the grouping of shehias is different between the Figures, please add an explanation of why this is.

      Thank you for this point. The districts with at least 5 samples present are plotted in the map in Figure 1B. In Figure 2, a totally separate analysis was performed, where all shehias were clustered into separate groups with k-means and the IBD values were compared between these clusters. These maps are not supposed to match, as they are separate analyses. Figure 1B is at the district level and Figure 2 is clustering shehias throughout Zanzibar.

      The figure legend of Figure 1B on lines 410-414 now reads:

      “B) A Discriminant Analysis of Principal Components (DAPC) was performed utilizing isolates with unique pseudohaplotypes, pruning highly related isolates to a single representative infection. Districts were included with at least 5 isolates remaining to have sufficient samples for the DAPC. For plotting the inset map, the district coordinates (e.g. Mainland, Kati, etc.) are calculated from the averages of the shehia centroids within each district.”

      The figure legend of Figure 2 on lines 417-425 now reads:

      “Figure 2. Coastal Tanzania and Zanzibari parasites have more highly related pairs within their given region than between regions. K-means clustering of shehia coordinates was performed using geographic coordinates all shehias present from the sample population to generate 5 clusters (colored boxes). All shehias were included to assay pairwise IBD between differences throughout Zanzibar. Pairwise comparisons of within cluster IBD (column 1 of IBD distribution plots) and between cluster IBD (column 2-5 of IBD distribution plots) was done for all clusters. In general, within cluster IBD had more pairwise comparisons containing high IBD identity.”

      (13) Figure 2: In the main panel, please clarify what the lines indicate (median and quartiles?). It is very difficult to see anything except the outliers. I wonder whether another way of displaying these data would be clearer. Maybe a table with medians and confidence intervals would be better (or that data could be added to the plots). The current plots might be misleading as they are dominated by outliers.

      Thank you for this point; it greatly improved this figure. We now use a beeswarm plot, which shows all pairwise IBD values within each comparison group.

      (14) In the insert, the cluster number should not only be given as a color code but also added to the map. The current version will be impossible to read for people with color vision impairment, and it is confusing for any reader as the numbers don't appear to follow any logic (e.g. north to south).

      Thank you very much for these considerations. We changed the color coding to a color-blind-friendly palette and renamed the clusters with more informative names: Pemba, Unguja North (Unguja_N), Unguja Central (Unguja_C), Unguja South (Unguja_S) and mainland Tanzania (Mainland).

      (15) The legend for Figure 3 is difficult to follow. I do not understand what the difference in binning was in panels A and B compared to C.

      Thank you for this point. We have edited the legend to reflect these changes. The legend for Figure 3 on lines 427-433 now reads:

      “Figure 3. Isolation by distance is shown between all Zanzibari parasites (A), only Unguja parasites (B) and only Pemba parasites (C). Samples were analyzed based on geographic location, Zanzibar (N=136) (A), Unguja (N=105) (B) or Pemba (N=31) (C) and greater circle (GC) distances between pairs of parasite isolates were calculated based on shehia centroid coordinates. These distances were binned at 4km increments out to 12 km. IBD beyond 12km is shown in Supplemental Figure 8. The maximum GC distance for all of Zanzibar was 135km, 58km on Unguja and 12km on Pemba. The mean IBD and 95% CI is plotted for each bin.”
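The distance binning in this legend can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the haversine formula, the 6371 km Earth radius, the half-open bin edges, and the function names are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def mean_ibd_by_bin(pairs, width_km=4.0, max_km=12.0):
    """Mean IBD per distance bin.

    pairs: iterable of (distance_km, ibd) for each pair of isolates.
    Bins are half-open, [0, 4), [4, 8), [8, 12); pairs at or beyond
    max_km are skipped (an assumed edge convention).
    Returns {bin_start_km: mean_ibd}.
    """
    sums = {}
    for dist, ibd in pairs:
        if dist >= max_km:
            continue
        start = int(dist // width_km) * width_km
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + ibd, count + 1)
    return {start: total / count for start, (total, count) in sorted(sums.items())}
```

In use, each pair of isolates would contribute one distance (from the two shehia centroids) and one IBD value; a 95% CI per bin could then be added on top of the means.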

      (16) Font sizes for panel C differ, and it is not aligned with the other panels.

      Thank you for pointing this out. Figure 3 and Supplemental Figure 10 have been adjusted so that each plot uses matching formatting.

      (17) Why is Kusini included in Supplemental Figure 4, but not in Figure 1?

      In Supplemental Figure 4, all isolates were used in this analysis and isolates with unique pseudohaplotypes were not pruned to a single representative infection. That is why there are additional isolates in Kusini. The legend for Supplemental Figure 4 now reads:

      “Supplemental Figure 4. PCA with highly related samples shows population stratification radiating from coastal Mainland to Zanzibar. PCA of 282 total samples was performed using whole sample allele frequency (A) and DAPC was performed after retaining samples with unique pseudohaplotypes in districts that had 5 or more samples present (B). As opposed to Figure 1, all isolates were used in this analysis and isolates with unique pseudohaplotypes were not pruned to a single representative infection.”

      (18) Supplemental Figures 6 and 7: What does the width of the line indicate?

      The sentence below was added to the figure legends of Supplemental Figures 6 and 7 and the legends of each network plot were increased in size:

      “The width of each line represents higher magnitudes of IBD between pairs.”

      (19) What was the motivation not to put these lines on the map, as in Figure 4A? This might make it easier to interpret the data.

      Thank you for this comment. For Supplemental Figures 8 and 9, we omitted the lines that represent lower pairwise IBD in order to draw the reader's attention to the highly related pairs between and within shehias.

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a rather long paragraph (lines 300-323) on COI of asymptomatic infections and their genetic structure. Given that the current study did not investigate most of the hypotheses raised there (e.g. immunity, expression of variant genes), and the overall limited number of asymptomatic samples typed, this part of the discussion feels long and often speculative.

      Thank you for your perspective. The key sections highlighted in this comment, regarding immunity and expression of variant genes, were shortened. This section on lines 300-303 now reads:

      “Asymptomatic parasitemia has been shown to be common in falciparum malaria around the globe and has been shown to have increasing importance in Zanzibar (Lindblade et al., 2013; Morris et al., 2015). What underlies the biology and prevalence of asymptomatic parasitemia in very low transmission settings where anti-parasite immunity is not expected to be prevalent remains unclear (Björkman & Morris, 2020).”

      (2) As a detail, line 304 mentions "few previous studies" but only one is cited. Are there studies that investigated this and found opposite results?

      Thank you for this comment. We added additional studies that did not find an association between clinical disease and COI. These changes are on lines 303-308, which now reads:

      “Similar to a few previous studies, we found that asymptomatic infections had a higher COI than symptomatic infections across both the coastal mainland and Zanzibar parasite populations (Collins et al., 2022; Kimenyi et al., 2022; Sarah-Matio et al., 2022). Other studies have found lower COI in severe vs. mild malaria cases (Robert et al., 1996) or no significant difference between COI based on clinical status (Earland et al. 2019; Lagnika et al. 2022; Conway et al. 1991; Kun et al. 1998; Tanabe et al. 2015)”

      (3) Table 2: Percentages need to be checked. To take one of several examples, for Pfk13-K189N a frequency of 0.019 for the mutant allele is given among 137 samples. 2/137 equals to 0.015, and 3/137 to 0.022. 0.019 cannot be achieved. The same is true for several other markers. Possibly, it can be explained by the presence of polyclonal infections. If so, it should be clarified what the total of clones sequenced was, and whether the prevalence is calculated with the number of samples or number of clones as the denominator.

      Thank you for this point. We mistakenly reported allele frequency instead of prevalence. An updated Table 2 is now in the manuscript. The method for calculating the prevalence is now at lines 148-151:

      “Prevalence was calculated separately in Zanzibar or mainland Tanzania for each polymorphism by the number of samples with alternative genotype calls for this polymorphism over the total number of samples genotyped and an exact 95% confidence interval was calculated using the Pearson-Klopper method for each prevalence.”
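The prevalence and exact interval described in this Methods text can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the interval is computed by bisection on the binomial tail probabilities, and the method named "Pearson-Klopper" in the quoted text is assumed to be the interval more often cited as Clopper-Pearson.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); intended for moderate n (hundreds)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of f on [lo, hi], where f is positive near lo and negative near hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, conf=0.95):
    """Exact binomial interval for x successes out of n samples."""
    alpha = 1 - conf
    # Lower bound solves P(X >= x | p) = alpha / 2; it is 0 when x == 0.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p))
    )
    # Upper bound solves P(X <= x | p) = alpha / 2; it is 1 when x == n.
    upper = 1.0 if x == n else _bisect(
        lambda p: binom_cdf(x, n, p) - alpha / 2
    )
    return lower, upper


def prevalence_with_ci(n_alt, n_total, conf=0.95):
    """Prevalence = samples with the alternative genotype / samples genotyped."""
    lower, upper = exact_ci(n_alt, n_total, conf)
    return n_alt / n_total, lower, upper
```

With samples as the denominator, only values such as 2/137 (about 0.015) or 3/137 (about 0.022) are attainable, which is the reviewer's point about 0.019. A mutation that is never detected gets a prevalence of 0 with a non-zero upper bound.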

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents useful findings regarding the role of formin-like 2 in mouse oocyte meiosis. The submitted data are supported by incomplete analyses, and in some cases, the conclusions are overstated. If these concerns are addressed, this paper would be of interest to reproductive biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The presented study focuses on the role of formin-like 2 (FMNL2) in oocyte meiosis. The authors assessed FMNL2 expression and localization in different meiotic stages and subsequently, by using siRNA, investigated the role of FMNL2 in spindle migration, polar body extrusion, and distribution of mitochondria and endoplasmic reticulum (ER) in mouse oocytes.

      Strengths:

      Novelty in assessing the role of formin-like 2 in oocyte meiosis.

      Weaknesses:

      Methods are not properly described.

      Overstating presented data.

      It is not clear what statistical tests were used.

      My main concern is that there are missing important details of how particular experiments and analyses were done. The material and methods section are not written in the way that presented experiments could be repeated - it is missing basic information (e.g., used mouse strain, timepoints of oocytes harvest for particular experiments, used culture media, image acquisition parameters, etc.). Some of the presented data are overstated and incorrectly interpreted. It is not clear to me how the analysis of ER and mitochondria distribution was done, which is an important part of the presented data interpretation. I'm also missing important information about the timing of particular stages of assessed oocytes because the localization of both ER and mitochondria differs at different stages of oocyte meiosis. The data interpretation needs to be justified by proper analysis based on valid parameters, as there is considerable variability in the ER and mitochondria structure and localization across oocytes based on their overall quality and stage.

      Thank you for your comment. We regret the oversight of omitting critical information from the manuscript. In the revised manuscript, we have included essential details such as mouse strains, culture media, oocyte stages, and statistical methods in the Materials and Methods section. Please find our detailed responses in the “Recommendations for the authors” part.

      Reviewer #2 (Public Review):

      Summary:

      This research involves conducting experiments to determine the role of Fmnl2 during oocyte meiosis I.

      Strengths:

      Identifying the role of Fmnl2 during oocyte meiosis I is significant.

      Weaknesses:

      The quantitative analysis and the used approach to perturb FMNL2 function are currently incomplete and would benefit from more confirmatory approaches and rigorous analysis.

      (1) Most of the results are expected. The new finding here is that FMNL2 regulates cytoplasmic F-actin in mouse oocytes, which is also expected given the role of FMNL2 in other cell types. Given that FMNL2 regulates cytoplasmic F-actin, it is very expected to see all the observed phenotypes. It is already established that F-actin is required for spindle migration to the oocyte cortex, extruding a small polar body and normal organelle distribution and functions.

      Thank you for your comment. Previous studies have reported the Arp2/3 complex (Nat Cell Biol 2011), Formin2 (Nat Cell Biol 2002, Nat Commun 2020), and Spire (Curr Biol 2011) as three key factors involved in this process. These factors regulate actin filaments in different ways. However, how they interact with each other during these subcellular events was still not fully clear. Our current study identified that FMNL2 plays a critical role in coordinating these molecules for actin assembly in oocytes. Our findings demonstrate that FMNL2 interacts with both the Arp2/3 complex and Formin2 to facilitate actin-based meiotic spindle migration. Additionally, we discovered a novel role for FMNL2 in determining the distribution and function of the endoplasmic reticulum and mitochondria, which may in turn influence meiotic spindle migration in oocytes. Our results not only uncover novel functions of FMNL2-mediated actin in organelle distribution, but also extend our understanding of the molecular basis for the unique meiotic spindle migration in oocyte meiosis.

      (2) The authors used Fmnl2 cRNA to rescue the effect of siRNA-mediated knockdown of Fmnl2. It is not clear how this works. It is expected that the siRNA will also target the exogenous cRNA construct (which should have the same sequence as endogenous Fmnl2) especially when both of them were injected at the same time. Is this construct mutated to be resistant to the siRNA?

      Thank you for your question. We regret any misunderstanding caused by the unclear description in our manuscript. In the rescue experiments, we first injected FMNL2 siRNA into oocytes and then microinjected FMNL2 mRNA 18-20 hours later. In our previous experiments, we verified by Western blotting that endogenous FMNL2 is effectively suppressed 18-20 hours after the microinjection of FMNL2 siRNA. Additionally, we observed a significant increase in exogenous FMNL2 protein expression 2 hours after the injection of FMNL2 mRNA. We believe that the exogenous FMNL2 can compensate for the decrease caused by FMNL2 knockdown; this approach has been adopted in many oocyte studies.

      (3) The authors used only one approach to knockdown FMNL2 which is by siRNA. Using an additional approach to inhibit FMNL2 would be beneficial to confirm that the effect of siRNA-mediated knockdown of FMNL2 is specific.

      Thank you for your question. Yes, specificity is always a concern for siRNA or morpholino microinjection because of off-target effects. Owing to limitations, we could not generate a knockout model, and there are no known inhibitors that specifically target FMNL2. To address this, we performed the rescue experiment with exogenous mRNA to confirm the effective knockdown of FMNL2. These measures provide reassurance regarding the credibility of the experimental outcomes, and this is also a common way to guard against off-target effects of siRNA or morpholino.

      Reviewer #3 (Public Review):

      Summary:

      The authors focus on the role of formin-like protein 2 in the mouse oocyte, which could play an important role in actin filament dynamics. The cytoskeleton is known to influence a number of cellular processes from transcription to cytokinesis. The results show that downregulation of FMNL2 affects spindle migration with resulting abnormalities in cytokinesis in oocyte meiosis I.

      Weaknesses:

      The overall description of methods and figures is overall dismissively poor. The description of the sample types and number of replicate experiments is impossible to interpret throughout, and the quantitative analysis methods are not adequately described. The number of data points presented is unconvincing and unlikely to support the conclusions. On the basis of the data presented, the conclusions appear to be preliminary, overstated, and therefore unconvincing.

      Thank you for your comment. We regret the oversight of omitting critical information in the manuscript. In the revised manuscript, we have incorporated your suggestions for modification, particularly regarding the Materials and Methods section. Please see the detailed revision and responses in the “Recommendations for the authors” part.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      My main concern is that there are missing important details of how particular experiments and analyses were done. The material and methods section is not written in the way that presented experiments could be repeated - it is missing basic information (e.g., used mouse strain, timepoints of oocytes harvest for particular experiments, used culture media, image acquisition parameters, etc.). Some of the presented data are overstated and incorrectly interpreted. It is not clear to me how the analysis of ER and mitochondria distribution was done, which is an important part of the presented data interpretation. I'm also missing important information about the timing of particular stages of assessed oocytes because the localization of both ER and mitochondria differs at different stages of oocyte meiosis. The data interpretation needs to be justified by proper analysis based on valid parameters, as there is considerable variability in the ER and mitochondria structure and localization across oocytes based on their overall quality and stage. My specific comments are listed below.

      (1) Information about statistical tests that were used needs to be provided for all quantification experiments.

      Thank you for your suggestion. We have revised the statistical analysis description in the Materials and Methods section and also included a description of the statistical methods in the legends of the relevant figures.

      (2) I recommend replacing the plunger plots, used in most quantification data, with alternatives allowing evaluation of the distribution of the data (dot plots, box plots, whisker plots).

      Thank you for your suggestion. Following it, we replaced the plunger plots in Fig. 2C, D, H, I and Fig. 3B, C with dot plots.

      (3) Can the authors provide information about particular time points when were individual oocyte stages (GVBD, meiosis I, and meiosis II) harvested/used for immunofluorescence protein detection, western blotting, microinjection, and ER and mitochondria staining? Were the time points always the same in all presented experiments and experimental vs control group? If not, this needs to be clarified.

      Thank you for your suggestion. We used oocytes in the metaphase I (MI) stage for the statistical analysis of spindle migration, actin filament aggregation, endoplasmic reticulum localization, and mitochondrial localization. In the Western blot analysis, GV stage oocytes were utilized to evaluate the efficiency of knockdown and rescue experiments. The protein expression levels of Arp2, Formin2, INF2, Cofilin, Grp78, and Chop in different treatment groups were detected using MI-stage oocytes. In the revised version, we provided all the detailed information about the stages.

      (4) Figure 1B: Can the authors comment on why there is a missing representative image of MII oocyte FMNL2-Ab? I recommend including this in the figure to have a complete view of comparing overexpressed and endogenous FMNL2 localization in oocyte meiosis.

      Thank you for your suggestion. In the revised manuscript, we added immunostaining images of FMNL2 antibody in MII stage oocytes.

      (5) Figure 1C: The figure legend says, "FMNL2 and actin overlapped in cortex and spindle surrounding". In MI oocytes, there is usually no accumulated actin signal around the spindle, which is also true in the presented images, so there cannot be overlapping with the FMNL2 signal. The interpretation should be changed.

      We apologize for this inappropriate description; we have deleted this sentence.

      (6) Figure 2B: What were the parameters of the "large" and "normal" polar bodies for performing the analysis?

      Thank you for your question. In order to assess the size of the polar body, we conducted a comparison between the diameter of the polar body and that of the oocyte. If the diameter of the polar body was found to be less than 1/3 of the oocyte's diameter, we categorized it as normal-sized polar body. Conversely, if the polar body's diameter exceeded 1/3 of the oocyte's diameter, we categorized it as a large polar body. We have included these details in the Results section of the manuscript.

      (7) Figure 2F: Can the authors comment on what can be the second band in the rescue group?

      Thank you for your question. In the rescue experiment, we microinjected exogenous FMNL2-EGFP mRNA into the oocytes. As a result, compared to endogenous FMNL2, the protein size increased due to the addition of the EGFP tag, approximately 27 kDa. Hence, in the Western blot bands of the rescue group, the upper band represents the expression of exogenous FMNL2-EGFP, while the lower band corresponds to the expression of endogenous FMNL2. We have provided annotations in the revised Figure 2F to clarify this.

      (8) Can the authors comment on the variability of PBE between 2C and 2H in the FMNL2-KD groups? In panel C, the PBE in the KD group was 59.5 ± 2.82%; in panel H, the PBE in the KD group was 48.34 ± 4.2%, and in the rescue group, the PBE was 62.62 ± 3.6%. The rescue group has a similar PBE rate as the KD group in panel C. How consistent was the FMNL2 knockdown across individual replicates? Can the authors provide more details on how the rescue experiment was performed?

      Thank you for your question. We believe that the difference in PBE observed in Figure 2C and 2H of the FMNL2-KD group was due to the microinjection times and the duration of in vitro arrest. The results shown in Figure 2C depict the outcome of a single injection of FMNL2 siRNA into GV stage oocytes, followed by 18 hours of in vitro arrest; the results shown in Figure 2H contain a subsequent additional injection of FMNL2-EGFP mRNA with another 2 hours of arrest. The two rounds of microinjection and the extended period of in vitro arrest both affect oocyte maturation rates.

      (9) Figure 2J and K: What groups were compared together? The used statistic needs to be properly described.

      Thank you for your question. The FMNL2-KD, FMNL3-KD, and FMNL2+3-KD groups were each compared to the control group; therefore, a t-test was used for the analysis. We have provided explanations in the revised manuscript.

      (10) Figure 4B and C: Can the authors provide representative images without oversaturated actin signal?

      Thank you for your question. For the analysis of oocyte F-actin, F-actin is divided into cortex actin and cytoplasmic actin. Because of the contrast during imaging, the strong cortex actin signal interferes with the detection of cytoplasmic actin; it is therefore necessary to increase the scanning index, which overexposes the cortex actin signal. This allows better observation of the cytoplasmic signal.

      (11) Figure 4G + 5H: Can the authors comment on why they used as a housekeeping gene actin instead of tubulin, which was used in the rest of the WB experiments?

      Thank you for your question. In most of the western blot experiments conducted in this study, we used tubulin as the housekeeping gene. However, because of antibody supply and delivery times, we also used GAPDH and actin for some experiments. These housekeeping genes were all valid for the study.

      (12) Based on what parameters was ER considered normally or abnormally distributed, and what stages of oocytes were assessed?

      Thank you for your question. In this study, we used oocytes at the MI stage for the analysis of ER localization. At the MI stage, the ER localizes around the spindle, which is regarded as the typical localization pattern. Oocytes in which the ER was dispersed throughout the cytoplasm or clustered were categorized as showing aberrant positioning. We included relevant descriptions in the revised version of the manuscript.

      (13) Figure 5H: As a housekeeping gene was used actin - the quantification is labeled as a Grp78 to tubulin ratio.

      Thank you for pointing out the error. This was a labeling mistake, and we have corrected it.

      (14) Information about how JC-1 staining was done needs to be provided.

      Thank you for your careful reading. We have included a description of JC-1 staining in the Materials and Methods section.

      (15) Line 231-232: "As shown in Figure 4A" - the text doesn't correspond to the figure.

      Thank you for pointing out the error. We fixed this mistake in the revised manuscript by correcting "Fig3A" to "Fig4A."

      (16) Line 265: there is probably a missing word "Formin2".

      Thank you. We have corrected the error in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      (1) Quantification and analysis:

      • Fig. 3B: The rate of spindle migration should be quantified based on the distance from the spindle to the cortex. Also, the orientation of the spindle (Z-position) needs to be taken into consideration.

      • Fig. 5C, D: It is unclear how the rate of ER distribution was calculated.

      • Western blot: In many experiments (such as Fig. 5H), the bands are saturated which will prevent accurate intensity measurements and quantifications.

      For spindle migration, we specifically focused on spindles exhibiting a distinctive spindle-like shape with clear bipolarity to eliminate any statistical discrepancies potentially caused by variations in Z-axis alignment. Our criterion for determining successful migration was based on the contact between the spindle pole and the cortical region of the oocyte. Therefore, we think that the rate reflects the phenotype better than the distance.

      For the examination of ER localization, Reviewer 1 also raised this issue. We used oocytes at the MI stage in this study. At the MI stage, the ER localizes around the spindle. Oocytes in which the ER was dispersed throughout the cytoplasm or clustered were categorized as showing aberrant positioning. We included relevant descriptions in the revised version of the manuscript.

      For the bands of the western blot results, during the experimental procedure we typically capture multiple images at different exposure levels (3-5 images). In the revised manuscript, we have replaced the inappropriate images with more suitable ones.

      (2) Given that all Immunoprecipitation experiments in this manuscript were performed on the whole ovary which contains more somatic cells than oocytes, the results do not necessarily reflect meiotic oocytes. Please consider this possibility during the interpretation.

      Thank you for your suggestion. Yes, we agree with you. In the revised manuscript, we made appropriate modifications to the relevant descriptions.

      (3) 351-365: The conclusion that Arp2/3 compensates for the decreased formin 2 in FMNL2 knockdown oocytes is a bit unconvincing. 1- In mouse oocytes, it is already known that Arp2/3 and formin 2 regulate different pools of F-actin nucleation. 2- The authors found an increase in Arp2/3 in FMNL2 knockdown oocytes compared to control oocytes without any change in cortical F-actin. Given that Arp2/3 is primarily promoting cortical F-actin, it is expected to see an increase in cortical F-actin in FMNL2 knockdown oocytes, which was not the case.

      Thank you for your question. Yes, previous studies showed that formin2 localizes to the cytoplasm of oocytes and accumulates around the spindle, which facilitates cytoplasmic actin assembly, while Arp2/3 is primarily responsible for actin assembly at the cortex region of oocytes. In invasive cells, FMNL2 is mainly localized at the leading edge of the cell, in lamellipodia and filopodia tips, where it improves cell migration ability in an actin-based manner (Curr Biol 2012). We showed that FMNL2 localized both at the spindle periphery and the cortex, but depletion of FMNL2 did not affect cortex actin intensity. We think that FMNL2 and Arp2/3 both contribute to cortex actin dynamics: when FMNL2 decreased, ARP2 increased to compensate, which maintained the cortex actin level. In the revised manuscript, we have made modifications to avoid excessive extrapolation from our results, ensuring that our conclusions are presented in a more objective manner.

      (4) Lines 195-197: The spindle is initially formed soon after the GVBD, so there is no spindle during GVBD. Also, I can't see oocytes at anaphase I or telophase I in this figure. Please revise.

      Thank you for your suggestion. We apologize for the inappropriate descriptions that were used. In the revised manuscript, we have made modifications to the respective descriptions in the Results part.

      (5) Fig. 2E: It seems that the control oocyte is abnormal with mild cytokinesis defects. Please replace or delete it since this information is already included in Fig. 3A.

      Thank you for your suggestion. Based on our observations, during the extrusion of the first polar body in oocytes, there is a temporary occurrence of cellular morphological fragmentation due to cortical reorganization (11h in control oocyte from Fig 2E). However, after the extrusion of the first polar body, the oocyte morphology returns to normal. Figure 2E illustrates the meiotic division process of oocytes, while Figure 3A primarily focuses on the process of oocyte spindle migration. We think that it is better to retain both to present our results.

      Reviewer #3 (Recommendations for The Authors):

      In the case of the observed phenotype, the stage of GV is important. The phenotypes presented also occur in meiotic or developmentally incompetent oocytes. In addition, the images of GV oocytes appear as NSN, which also show the KD phenotype in Figs. 2 and 3.

      Thank you for your concern. As the oocyte grows, the proportion of SN-type oocytes gradually increases. When the oocyte diameter reaches 70-80 μm, the proportion of SN oocytes is approximately 52.7% (Mol Reprod Dev. 1995). In our study, oocytes in both the control and knockdown groups were collected at a diameter of around 80 μm, which are considered fully grown oocytes, predominantly in the SN phase. Since the collection period and size of the oocytes were consistent, we are confident that the observed differences between the control and knockdown groups in the phenotype analysis are solid and reliable.

      MII is absent in Fig. 1B.

      In the revised manuscript, we added immunostaining images of FMNL2 in MII stage oocytes.

      The result of KD is not convincing. Also, discuss whether the heterozygous effect of Fmnl2 deletion affects reproductive fitness.

      Thank you for your concern. Because we were limited in setting up a knockout model, we employed siRNA to knock down FMNL2 expression. To avoid the risk of off-target effects, we performed a rescue experiment with exogenous mRNA, which we believe addresses this issue. When designing siRNA sequences, we ensured their specificity for binding to FMNL2 mRNA only, and we assessed the levels of FMNL2 and FMNL3 mRNA in oocytes after injection of FMNL2 siRNA. The results showed that, compared to the control group, the expression of FMNL2 mRNA decreased by approximately 70% at 18 hours after FMNL2 siRNA injection, while the level of FMNL3 mRNA was not decreased.

      Fig. 2F rescue experiment with double bands. What bands are seen here? Did the authors inject tagged or untagged FMNL2? Or does endogenous FMNL2 appear higher in the sample after KD?

      Thank you for your question. In the rescue experiment, we microinjected exogenous FMNL2-EGFP mRNA into the oocytes. As a result, compared to endogenous FMNL2, the protein size increased due to the addition of the EGFP tag, approximately 27 kDa. Hence, in the Western blot bands of the rescue group, the upper band represents the expression of exogenous FMNL2-EGFP, while the lower band corresponds to the expression of endogenous FMNL2. We provided annotations in the revised Figure 2F to clarify this.

      Variability in mitochondria and ER distribution patterns is also known in healthy and developing oocytes, although the authors described only a single phenotype.

      Thank you for your concern. Yes, mitochondria and ER show dynamic localization at different stages of oocyte maturation. However, in this study we used MI-stage oocytes for the analysis of ER and mitochondria localization, and at the MI stage, both the ER and mitochondria localize around the spindle. This pattern is considered the normal localization. Several studies showed that dispersed or clustered localization contributed to maturation defects. We included relevant descriptions in the revised manuscript.

      What exactly is meant by input in the IP experiments? Why is the target missing in the input sample?

      Thank you for your question. When we subjected the input samples to electrophoresis on a single channel, all the analyzed proteins demonstrated normal expression, confirming the viability of the input sample. However, upon simultaneous exposure with the IP samples, we observed a lack of clear signal for certain proteins in the input group. This is due to the excessive signal intensity resulting from protein enrichment in the IP group, which caused underexposure of the proteins in the input group.

      Explain the rationale for using, actin or tubulin as loading or normalization controls in the study focusing on the cytoskeleton.

      Thank you for your question. Actin and tubulin are both widely used as controls due to their stable expression. For actin, there are α-actin and β-actin isoforms. Formins and the Arp2/3 complex regulate the polymerization of α-actin and β-actin to form F-actin, not isoform expression. In our study, F-actin (the functional form) was examined. Similarly, α-tubulin and β-tubulin are two subtypes of tubulin that interact with each other to form stable α/β-tubulin heterodimers, and changes in cytoskeleton dynamics do not change the expression of α/β-tubulin. Therefore, β-actin and α-tubulin could be used as normalization controls.

      Fig. 6E shows only , but the legend says *.

      Thank you for pointing out the error. We corrected the mistake in the revised manuscript.

      Spindle positioning appears to differ between control and KD. Does this affect the quantification of Fig. 6F? Adequate nomenclature should be used here.

      Thank you for your question. Yes, spindle positioning was affected by FMNL2 depletion. However, oocytes with a central or a cortical spindle are all at the MI stage, and JC1 is not related to the stage difference. To avoid misunderstanding, we replaced the representative images and corresponding description in Figure 6F.

      The description of the methods and legends should be significantly improved.

      Thank you for your suggestion. Reviewers 1 and 2 raised a similar concern. We enriched the description of methods and legends in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments. We were pleased that they thought our study was "well crafted and written", "important", and that it provides a "valuable resource for researchers studying color vision". They also expressed several constructive criticisms, concerning – among other things – the lack of details regarding experimental procedures and analysis, the challenge in relating retinal data to cortical recordings, and consistency of results across animals. In response to the reviewers’ comments and following their suggestions, we performed additional analyses, and substantially revised the paper:

      We added a section in the Discussion about "Limitations of the stimulus paradigm". In addition, we added a new Suppl. Figure that illustrates the effect of deconvolution of calcium traces on our results and clarified in the text why we use deconvolved signals for all analyses. The new Suppl. Figure also shows an additional analysis with a more conservative threshold of neuron exclusion.

      We now clarify how retinal signaling relates to our cortical results and rewrote the text to be more conservative regarding our conclusions.

      In addition, we added a new Suppl. Figure showing the key analyses from Figures 2 and 4 separately for each animal. We now mention consistency across animals in the Results section and clearly state which analyses were performed on data pooled across animals.

      We are positive that these additions address the issues raised by the reviewers. Please find our point-by-point replies to all comments below.

      eLife assessment

      Franke et al. explore and characterize the color response properties in the mouse primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The data is solid; however, the evidence supporting some conclusions and details about some procedures are incomplete. In its current form, the paper makes a useful contribution to how color is coded in mouse V1. Significance would be enhanced with some additional analyses and resolution of some technical issues.

      We thank the reviewers for appreciating our manuscript and their thoughtful comments.

      Referee 1 (Remarks to the Author):

      Summary:

      In this study, Franke et al. explore and characterize the color response properties across the primary visual cortex, revealing specific color opponent encoding strategies across the visual field. The authors use awake-behaving 2P imaging to define the spectral response properties of visual interneurons in Layer 2/3. They find that opponent responses are more prominent at photopic light levels, and diversity in color opponent responses exists across the visual scene, with green ON/ UV OFF responses being stronger represented in the upper visual field. This is argued to be relevant for detecting certain features that are more salient when the chromatic space is used, possibly due to noise reductions.

      Strengths:

      The work is well crafted and written and provides a thorough characterization that reveals an uncharacterized diversity of visual properties in V1. I find this characterization important because it reveals how strongly chromatic information can modulate the response properties in V1. In the upper visual field, 25% of the cells differentially relay chromatic information, and one may wonder how this information will be integrated and subsequently used to aid vision beyond the detection of color per se. I personally like the last paragraph of the discussion that highlights this fact.

      We thank the reviewer for appreciating our manuscript.

      Weaknesses: One major point highlighted in this paper is the fact that Green ON/UV OFF responses are not generated in the retina. But glancing through the literature, I saw this is not necessarily true. Fig 1. of Joesch and Meister, a paper cited, shows this can be the case. Thus, I would not emphasize that this wasn’t present in the retina. This is a minor point, but even if the retina could not generate these signals, I would be surprised if the diversity of responses would only arise through feed-forward excitation, given the intricacies of cortical connectivity. Thus, I would argue that the argument holds for most of the responses seen in V1; they need to be further processed by cortical circuitries.

      We thank the reviewer for this comment. When analyzing available data from the retina using a similar center-surround color flicker stimulus (Szatko et al. 2020), we found that Green On/UV Off color opponency is very rare in the RF center of retinal ganglion cells (Suppl. Fig. 5). This suggests that center Green On/UV Off color opponency in V1 neurons is not inherited by the RF center of retinal neurons. However, we agree with the reviewer that retinal neurons might still contribute to V1 color opponency, for example by being center-surround color opponent (e.g. Joesch et al. 2016 and Szatko et al. 2020). We rephrased the text to acknowledge this fact.

      This takes me to my second point, defining center and surround. The center spot is 37.5 deg of visual angle, more than 1 mm of the retinal surface. That means that all retinal cells, at least half and most likely all of their surrounds will also be activated. Although 37.5 deg is roughly the receptive field size previously determined for V1 neurons, the one-to-one comparison with retinal recording, particularly with their center/surround properties, is difficult. This should be discussed. I assume that the authors tried a similar approach with sparse or dense checker white noise stimuli. If so, it would be interesting if there were better ways of defining the properties of V1 neurons on their complex/simple receptive field properties to define how much of their responses are due to an activation of the true "center" or a coactivation of the surround. Interestingly, at least some of the cells (Fig. 1d, cells 2 and 5) don’t have a surround. Could it be that in these cases, the "center" and "surround" are being excited together? How different would the overall statistics change if one used a full-field flicker stimulus instead of a center/surround stimulus? How stable are the results if the center/surround flicker stimulus is shifted? These results won’t change the fact that chromatic coding is present in the VC and that there are clear differences depending on their position, but it might change the interpretation. Thus, I would encourage you to test these differences and discuss them.

      Thanks for this comment. We agree with the reviewer that a one-to-one comparison of retina and V1 data is challenging, due to differences in both RF and stimulus size. We rephrased the Results text to clarify this point and now also mention it in the Discussion.

      To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degree visual angle in diameter, which is slightly larger than center RFs of single V1 neurons. As the reviewer mentions, the disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we used the following steps:

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.
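
      The alignment check in the steps above (the share of neurons whose center RF lies mostly within the center spot) could be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each center RF is available as a boolean pixel mask on a regular grid, and the function names, the `deg_per_pixel` scale and the grid layout are hypothetical. Only the 37.5 degree spot diameter and the two-thirds overlap criterion come from the text.

```python
import numpy as np

def center_overlap_fraction(rf_mask, spot_center, spot_diameter_deg, deg_per_pixel):
    """Fraction of a center-RF mask that falls inside the circular center spot.

    rf_mask: 2D boolean array marking the RF estimated from sparse noise (assumed format).
    spot_center: (row, col) of the spot center in pixels (hypothetical parameter).
    """
    rf_mask = np.asarray(rf_mask, dtype=bool)
    rows, cols = np.indices(rf_mask.shape)
    dy = (rows - spot_center[0]) * deg_per_pixel
    dx = (cols - spot_center[1]) * deg_per_pixel
    inside_spot = np.hypot(dx, dy) <= spot_diameter_deg / 2.0
    rf_area = rf_mask.sum()
    if rf_area == 0:
        return np.nan  # no measurable RF for this neuron
    return (rf_mask & inside_spot).sum() / rf_area

def fraction_well_aligned(rf_masks, spot_center, spot_diameter_deg=37.5,
                          deg_per_pixel=1.0, min_overlap=2 / 3):
    """Share of neurons whose RF overlap with the center spot exceeds min_overlap."""
    overlaps = np.array([
        center_overlap_fraction(m, spot_center, spot_diameter_deg, deg_per_pixel)
        for m in rf_masks
    ])
    valid = ~np.isnan(overlaps)
    return np.mean(overlaps[valid] > min_overlap)
```

      Under these assumptions, a value of 0.83 from `fraction_well_aligned` would correspond to the 83% reported above.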

      Together, we believe these points strongly suggest that the center spot and the surround annulus of the noise stimulus predominantly drive center (i.e. classical RF) and surround (i.e. extraclassical RF), respectively, of the recorded V1 neurons. This is further supported by the fact that color response types identified using an automated clustering method were robust across mice (Suppl. Fig. 6c), indicating consistent stimulus centering.

      Nevertheless, we cannot exclude that the stimulus was misaligned for a subset of the recorded neurons used for analysis. We agree with the reviewer that such misalignment might have contributed to cells not having surround STAs, due to simultaneous activation of antagonistic center and surround RF components by the surround stimulus. While a full-field stimulus would get rid of the misalignment problem, it would not allow us to study color tuning in center and surround RF components separately. Instead, one could compare the results of our approach with an approach that centers the stimulus on individual neurons. However, we believe that performing these additional experiments is out of the scope of the current study.

      To acknowledge the experimental limitations of our study and the concerns brought up by the reviewer, we now explicitly mention the steps we perform to reduce the effects of stimulus misalignment in the Results section and discuss the problem of stimulus alignment in the Discussion. We believe these changes will help the reader to interpret our results.

      Referee 2 (Remarks to the Author):

      Summary:

      Franke et al. characterize the representation of color in the primary visual cortex of mice and how it changes across the visual field, with a particular focus on how this may influence the ability to detect aerial predators. Using calcium imaging in awake, head-fixed mice, they characterize the properties of V1 neurons (layer 2/3) using a large center-surround stimulation where green and ultra-violet were presented in random combinations. Using a clustering approach, a set of functional cell-types were identified based on their preference to different combinations of green and UV in their center and surround. These functional types were demonstrated to have varying spatial distributions in V1, including one neuronal type (Green-ON/UV-OFF) that was much more prominent in the posterior V1 (i.e. upper visual field). Modelling work suggests that these neurons likely support the detection of predator-like objects in the sky.

      Strengths:

      The large-scale single-cell resolution imaging used in this work allows the authors to map the responses of individual neurons across large regions of the visual cortex. Combining this large dataset with clustering analysis enabled the authors to group V1 neurons into distinct functional cell types and demonstrate their relative distribution in the upper and lower visual fields. Modelling work demonstrated the different capacity of each functional type to detect objects in the sky, providing insight into the ethological relevance of color opponent neurons in V1.

      We thank the reviewer for appreciating our manuscript.

      Weaknesses:

      While the study presents solid evidence a few weaknesses exist, including the size of the dataset, clarity regarding details of data included in each step of the analysis and discussion of caveats of the work. The results presented here are based on recordings of 3 mice. While the number of neurons recorded is reasonably large (n > 3000) an analysis that tests for consistency across animals is missing. Related to this, it is unclear how many neurons at each stage of the analysis come from the 3 different mice (except for Suppl. Fig 4).

      Thank you for this comment. We apologize that the original manuscript did not clearly indicate the consistency of our results across animals. We have revised the manuscript in the following ways:

      We have added an additional Suppl. Figure, which shows the variability of the data within and across animals (Suppl. Fig. 4). Specifically, we show the distribution of color and luminance selectivity for (i) center and surround components of V1 RFs and (ii) for upper and lower visual field. This data is used for all analyses shown in Figures 2-4. The figure legend of this figure also states the number of neurons per animal.

      We now clearly state in the Results section that all analyses in the main figures were performed by pooling data across animals, and refer to the Suppl. Figures for consistency across animals.

      We believe these changes help the reader to interpret our results.

      Finally, the paper would greatly benefit from a more in depth discussion of the caveats related to the conclusion drawn at each stage of the analysis. This is particularly relevant regarding the caveats related to using spike triggered averages to assess the response preferences of ON-OFF neurons, and the conclusions drawn about the contribution of retinal color opponency.

      Thanks. We substantially revised the text to discuss caveats and limitations of the approach. For example, we added a section into the Discussion called "Limitations of the stimulus paradigm". In addition, we clarified how retinal signals relate to cortical ones and phrased our conclusions more conservatively.

      The authors provide solid evidence to support an asymmetric distribution of color opponent cells in V1 and a reduced color contrast representation in lower light levels. Some statements would benefit from more direct evidence such as the integration of upstream visual signals for color opponency in V1.

      Based on the comments from Reviewer 1, we have rephrased the statements regarding the integration of upstream visual signals for color opponency in V1. We think these revisions increase the clarity of the results and help the reader with interpretation.

      Overall, this study will be a valuable resource for researchers studying color vision, cortical processing, and the processing of ethologically relevant information. It provides a useful basis for future work on the origin of color opponency in V1 and its ethological relevance.

      Thanks! We thank the reviewer again for the helpful comments.

      Referee 3 (Remarks to the Author):

      This paper studies chromatic coding in mouse primary visual cortex. Calcium responses of a large collection of cells are measured in response to a simple spot stimulus. These responses are used to estimate chromatic tuning properties - specifically sensitivity to UV and green stimuli presented in a large central spot or a larger still surrounding region. Cells are divided based on their responses to these stimuli into luminance or chromatic sensitive groups. Several technical concerns limit how clearly the data support the conclusions. If these issues can be fixed, the paper would make a valuable contribution to how color is coded in mouse V1.

      We thank the reviewer for the helpful comments.

      Analysis: The central tool used to analyze the data is a "spike triggered average" of the responses to randomly varying stimuli. There are several steps in this analysis that are not documented, and hence evaluating how well it works is difficult. Central to this is that the paper does not measure spikes. Instead, measured calcium traces are converted to estimated spike rates, which are then used to estimate STAs. There are no raw calcium traces shown, and the approach to estimate spike rates is not described in any detail. Confirming the accuracy of these steps is essential for a reader to be able to evaluate the paper. Further, it is not clear why the linear filters connecting the recorded calcium traces and the stimulus cannot be estimated directly, without the intermediate step of estimating spike rates.

      Thank you for this comment. We have used the genetically encoded calcium sensor GCaMP6s in our recordings. This sensor is a very sensitive GCaMP6 variant, but also one with slow kinetics. To remove the effect of the slow sensor kinetics from recorded calcium responses, the recorded traces are commonly deconvolved with the impulse function of the sensor to obtain the deconvolved calcium traces. We now include this reasoning in the Results section. To illustrate the effect of the deconvolution, we added a new Suppl. Figure (Suppl. Fig. 2) showing raw calcium and deconvolved traces, and the STAs estimated from both types of traces. This illustrates that the results regarding neuronal color preferences are consistent across raw and deconvolved calcium traces.
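
      To make the idea of removing slow sensor kinetics concrete, here is a deliberately simplified sketch. It is not the deconvolution method used in the study, which the response does not specify. It assumes the sensor's impulse response is a single exponential decay, so the calcium trace follows a first-order autoregressive model, and it inverts that model with a rectified difference. The decay constant `decay_tau_s` is a hypothetical placeholder, not a measured GCaMP6s value.

```python
import numpy as np

def deconvolve_ar1(trace, frame_rate_hz, decay_tau_s=1.5):
    """Invert a first-order (single-exponential) indicator model.

    Assumed model: c[t] = gamma * c[t-1] + s[t], with gamma set by the decay constant.
    Returns the rectified estimate of s (arbitrary units, not spikes per second).
    """
    trace = np.asarray(trace, dtype=float)
    gamma = np.exp(-1.0 / (frame_rate_hz * decay_tau_s))
    events = np.empty_like(trace)
    events[0] = trace[0]
    events[1:] = trace[1:] - gamma * trace[:-1]
    # Negative values cannot arise from the assumed model without noise, so clip them.
    return np.clip(events, 0.0, None)
```

      On noisy recordings, a plain difference like this amplifies noise, which is one reason dedicated, regularized deconvolution algorithms are normally preferred.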

      We agree with the reviewer that the term STA might be confusing. We have replaced it with the term "event-triggered average" (ETA). In addition, we have replaced the phrase "estimated spike rate" with "deconvolved calcium trace" throughout the manuscript because the unit of the deconvolved traces is not interpretable in the way a spike rate (spikes per second) would be. In the revised version, we now clarify in the Methods section that we estimate the ETAs based on deconvolved calcium traces, which are correlated with, and an approximation of, spike rate.
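
      An event-triggered average of this kind can be sketched as an activity-weighted mean of the stimulus history. The code below is an illustrative assumption about the general technique, not the authors' implementation: it handles one stimulus channel (for example, one color in the center spot) sampled at the same rate as the deconvolved trace, and the function and argument names are hypothetical.

```python
import numpy as np

def event_triggered_average(stimulus, activity, n_lags):
    """Activity-weighted average of the stimulus preceding each time point.

    stimulus: 1D array, one stimulus channel over time.
    activity: 1D non-negative array (e.g. a deconvolved calcium trace), same length.
    Returns an array of length n_lags; the last entry is lag 0, the first is lag n_lags - 1.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    activity = np.asarray(activity, dtype=float)
    eta = np.zeros(n_lags)
    total = 0.0
    for t in range(n_lags - 1, len(stimulus)):
        window = stimulus[t - n_lags + 1 : t + 1]
        eta += activity[t] * window
        total += activity[t]
    return eta / total if total > 0 else eta
```

      Because the weights are continuous values rather than spike counts, the result is in stimulus units and is independent of the arbitrary scale of the deconvolved trace.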

      A further issue about the STAs is that the inclusion criterion (correlation of predicted vs measured responses of 0.25) is pretty forgiving. It would be helpful to see a distribution of those correlation values, and some control analyses to check whether the STA is providing a sufficiently accurate measure to support the results (e.g. do the central results hold for the cells with the highest correlations).

      We thank the reviewer for this comment. To exclude noisy neurons from analysis, we used the following procedure:

      For each of the four stimulus conditions (center and surround for green and UV stimuli), kernel quality was measured by comparing the variance of the STA with the variance of the baseline, defined as the first 500 ms of the STA. Only cells with at least 10-times more variance of the kernel compared to baseline for UV or green center STA were considered for further analysis.
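
      The inclusion rule described above can be sketched as a variance ratio. This is a minimal sketch under stated assumptions rather than the authors' code: kernels are assumed to be 1D arrays ordered in time with the baseline period first, and `frame_rate_hz` (the kernel sampling rate) and the function names are hypothetical. The 500 ms baseline and the 10-fold criterion are taken from the text.

```python
import numpy as np

def kernel_quality(kernel, frame_rate_hz, baseline_s=0.5):
    """Ratio of the variance of the whole kernel to the variance of its baseline segment."""
    kernel = np.asarray(kernel, dtype=float)
    n_baseline = max(2, int(round(baseline_s * frame_rate_hz)))
    baseline_var = np.var(kernel[:n_baseline])
    if baseline_var == 0:
        return np.nan  # ratio undefined; such a kernel never passes the threshold
    return np.var(kernel) / baseline_var

def passes_quality(center_green, center_uv, frame_rate_hz, min_ratio=10.0):
    """Keep a cell if its green OR UV center kernel reaches the variance-ratio threshold."""
    q_green = kernel_quality(center_green, frame_rate_hz)
    q_uv = kernel_quality(center_uv, frame_rate_hz)
    return bool(q_green >= min_ratio or q_uv >= min_ratio)
```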

      We have added the distribution of quality values to a new Suppl. Figure (Suppl. Fig. 2d,e). We now also show the percentage of neurons above threshold, given different quality thresholds. Finally, we have repeated the analysis shown in Figure 2 for a much more conservative threshold, including only the top 25% of neurons (Suppl. Fig. 2e,f). We now mention this new analysis in the Methods and Results section.
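
      The two control analyses mentioned here, the share of neurons retained at different thresholds and a restriction to the top 25% of neurons, could look like the sketch below. It assumes one quality value per neuron in a 1D array; the function names are hypothetical and the quantile-based cutoff is an assumption about how "top 25%" is defined.

```python
import numpy as np

def fraction_above(quality, thresholds):
    """Fraction of neurons whose quality value meets each candidate threshold."""
    quality = np.asarray(quality, dtype=float)
    return np.array([(quality >= th).mean() for th in thresholds])

def top_fraction_mask(quality, top_fraction=0.25):
    """Boolean mask selecting the neurons in the top `top_fraction` of quality values."""
    quality = np.asarray(quality, dtype=float)
    cutoff = np.quantile(quality, 1.0 - top_fraction)
    return quality >= cutoff
```

      The mask can then be used to rerun an analysis on the retained neurons only and compare the outcome with the full-population result.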

      Limitations of stimulus choice: The paper relies on responses to a large (37.5 degree diameter) modulated spot and surrounding region. This spot is considerably larger than the receptive fields of both V1 cells and retinal ganglion cells. As a result, the spot itself is very likely to strongly activate both center and surround mechanisms, and responses of cells are likely to depend on where the receptive fields are located within the spot (and, e.g., how much of the true neural surround samples the center spot vs the surround region). The impact of these issues on the conclusions is considered briefly at the start of the results but needs to be evaluated in considerably more detail. This is particularly true for retinal ganglion cells given the size of their receptive fields (see also next point).

      We agree with the reviewer that the centering of the stimulus is critical and apologize if this point was not discussed sufficiently. To be able to record from many V1 neurons simultaneously, we used a stimulus size of 37.5 degree visual angle in diameter, which is slightly larger than center RFs of single V1 neurons. As the reviewer mentions, the disadvantage of this approach is that the stimulus is only roughly centered on the neurons’ center RFs. To reduce the impact of potential stimulus misalignment on our results, we have used different experimental and analysis steps and controls (see also second comment of Reviewer 1):

      For each recording, we positioned the monitor such that the mean RF across all neurons lies within the center of the stimulus field of view.

      We confirmed that this procedure results in good stimulus alignment for the large majority of recorded neurons within individual recording fields by using a sparse noise stimulus (Suppl. Fig. 1a-c). Specifically, we found that for 83% of tested neurons, more than two thirds of their center RF, determined by the sparse noise stimulus, overlapped with the center spot of the color noise stimulus.

      For analysis, we excluded neurons without a significant center STA, which may be caused by misalignment of the stimulus.

      We now mention those clearly in the Results section and added the limitations of our approach to the Discussion section.

      Comparison with retina: A key conclusion of the paper is that the chromatic tuning in V1 is not inherited from retinal ganglion cells. This conclusion comes from comparing chromatic tuning in a previously-collected data set from retina with the present results. But the retina recordings were made using a considerably smaller spot, and hence it is not clear that the comparison made in the paper is accurate. This issue may be handled by the analysis presented in the paper, but if so it needs to be described more clearly. The paper from which the retina data is taken argues that rod-cone chromatic opponency originates largely in the outer retina. This mechanism would be expected to be shared across retinal outputs. Thus it is not clear how the Green-On/UV-Off vs Green-Off/UV-On asymmetry could originate. This should be discussed.

      We agree with the reviewer that a one-to-one comparison of retina and V1 data is challenging, due to differences in both RF and stimulus size. We rephrased the Results text to clarify this point and now also mention it in the Discussion.

      When analyzing available data from the retina using a similar center-surround color flicker stimulus (Szatko et al. 2020), we found that Green On/UV Off color opponency is very rare in the RF center of retinal ganglion cells (Suppl. Fig. 5). This suggests that center Green On/UV Off color opponency in V1 neurons is not inherited by the RF center of retinal neurons. However, we agree with the reviewer that retinal neurons might still contribute to V1 color opponency, for example by being center-surround color opponent (e.g. Joesch et al. 2016 and Szatko et al. 2020). We rephrased the text to acknowledge this fact.

      Residual chromatic cells at low mesopic light levels: The presence of chromatically tuned cells at the lowest light level probed is surprising. The authors describe these conditions as rod-dominated, in which case chromatic tuning should not be possible. This again is discussed only briefly. It either reflects the presence of an unexpected pathway that amplifies weak cone signals under low mesopic conditions such that they can create spectral opponency or something amiss in the calibrations or analysis. Data collected at still lower light levels would help resolve this.

      Thank you for this comment. We call the lowest light level "low mesopic" and "rod-dominated" because the spectral contrast of V1 center responses in posterior recording fields is green-shifted for this light level (Fig. 3a). This is only expected if responses in the UV-cone dominant ventral retina are predominantly driven by rod photoreceptors. We now explain this rationale in the Results section. In addition, we mention in the Discussion that future studies are required to test whether cone signals need to be amplified for low light levels. While we agree with the reviewer that it would be exciting to use even lower light levels during recordings, we believe this is out of the scope of the current study due to the technical challenges involved in achieving scotopic stimulation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      As you will see, the main changes in the revised manuscript pertain to the structure and content of the introduction. Specifically, we have tried to more clearly introduce our paradigm, the rationale behind the paradigm, why it is different from learning paradigms, and why we study “relief”.

      In this rebuttal letter, we will go over the reviewers’ comments one-by-one and highlight how we have adapted our manuscript accordingly. However, because one concern was raised by all reviewers, we will start with an in-depth discussion of this concern.

      The shared concern pertained to the validity of the EVA task as a model to study threat omission responses. Specifically, all reviewers questioned the effectiveness of our so-called “inaccurate”, “false” or “ruse” instructions in triggering an equivalent level of shock expectancy, and relatedly, how this effectiveness was affected by dynamic learning over the course of the task.

      We want to thank the reviewers for raising this important issue. Indeed, it is a vital part of our design and it therefore deserves considerable attention. It is now clear to us that in the previous version of the manuscript we may have focused too little on why we moved away from a learning paradigm, how we made sure that the instructions were successful at raising the necessary expectations, and how the instructions were affected by learning. We believe this has resulted in some misunderstandings, which consequently may have cast doubts on our results. In the following sections, we will go into these issues.

      The rationale behind our instructed design

      The main aim of our study was to investigate brain responses to unexpected omissions of threat in greater detail by examining their similarity to the reward prediction error axioms (Caplin & Dean, 2008), and exploring the link with subjective relief. Specifically, we hypothesized that omission-related responses should be dependent on the probability and the intensity of the expected-but-omitted aversive event (i.e., electrical stimulation), meaning that the response should be larger when the expected stimulation was stronger and more expected, and that fully predicted outcomes should not trigger a difference in responding.
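
      The three predictions above can be written down as simple ordering checks. The sketch below is only an illustration: the function names, the data layout and the toy numbers are hypothetical, and a real analysis would replace the strict comparisons of condition means with statistical tests.

```python
def is_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def check_omission_pe_axioms(omission, fully_predicted, tol=1e-9):
    """Check the three axiom-like predictions on condition means.

    omission: dict mapping (instructed probability, intensity rank) to the
        mean omission response in that condition (hypothetical layout).
    fully_predicted: mean responses to fully predicted outcomes.
    tol: largest difference still treated as "no difference" (assumption).
    """
    probabilities = sorted({p for p, _ in omission})
    intensities = sorted({i for _, i in omission})

    # Stronger expected-but-omitted stimulation should give a larger response.
    intensity_ok = all(
        is_increasing([omission[(p, i)] for i in intensities])
        for p in probabilities
    )
    # More probable expected-but-omitted stimulation should give a larger response.
    probability_ok = all(
        is_increasing([omission[(p, i)] for p in probabilities])
        for i in intensities
    )
    # Fully predicted outcomes should not differ from one another.
    equivalence_ok = max(fully_predicted) - min(fully_predicted) <= tol

    return {
        "intensity": intensity_ok,
        "probability": probability_ok,
        "no_surprise_equivalence": equivalence_ok,
    }


# Toy values for illustration only; these are not data from the study.
toy_omission = {(p, i): p * i for p in (25, 50, 75) for i in (1, 2, 3)}
toy_result = check_omission_pe_axioms(toy_omission, fully_predicted=[0.0, 0.0])
print(toy_result)
```

      A region or measure would be treated as prediction-error-like in this sketch only if all three entries are true; a response that is flat across the instructed probabilities would fail the second check.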

      To this end, we required that participants had varying levels of threat probability and intensity predictions, and that these predictions would most of the time be violated. Although we fully agree with the reviewers that fear conditioning and extinction paradigms can provide an excellent way to track the teaching properties of prediction error responses (i.e., how they are used to update expectancies on future trials), we argued that they are less suited to create the varying probability and intensity-related conditions we required (see Willems & Vervliet, 2021). Specifically, in a standard conditioning task participants generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intraindividual variability in the prediction error responses. This precludes an in-depth analysis of the probability-related effects. Furthermore, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, intensity-related effects cannot be tested. Finally, because CS-US contingencies change over the course of a fear conditioning and extinction study (e.g. from acquisition to extinction), there is never complete certainty about when the US will (not) follow. This precludes a direct comparison of fully predicted outcomes.

      Another added value of studying responses to the prediction error at threat omission outside a learning context is that it can offer a way to disentangle responses to the violation of threat expectancy from those to subsequent expectancy updating.

      Also note that Rutledge and colleagues (2010), who were the first to show that human fMRI responses in the Nucleus Accumbens comply with the reward prediction error axioms, also did not use learning experiences to induce expectancy. In that sense, we argued it was not necessary to adopt a learning paradigm to study threat omission responses.

      Adaptations in the revised manuscript: We included two new paragraphs in the introduction of the revised manuscript to elaborate on why we opted not to use a learning paradigm in the present study (lines 90-112).

      “However, is a correlation with the theoretical PE over time sufficient for neural activations/relief to be classified as a PE-signal? In the context of reward, Caplin and colleagues proposed three necessary and sufficient criteria all PE-signals should comply to, independent of the exact operationalizations of expectancy and reward (the so-called axiomatic approach24,25; which has also been applied to aversive PE26–28). Specifically, the magnitude of a PE signal should: (1) be positively related to the magnitude of the reward (larger rewards trigger larger PEs); (2) be negatively related to likelihood of the reward (more probable rewards trigger smaller PEs); and (3) not differentiate between fully predicted outcomes of different magnitudes (if there is no error in prediction, there should be no difference in the PE signal).”

      “It is evident that fear conditioning and extinction paradigms have been invaluable for studying the role of the threat omission PE within a learning context. However, these paradigms are not tailored to create the varying intensity and probability-related conditions that are required to evaluate the threat omission PE in the light of the PE axioms. First, conditioning paradigms generally only include one level of aversive outcome: the electrical stimulation is either delivered or omitted. As a result, the magnitude-related axiom cannot be tested. Second, in conditioning tasks people generally learn fast, rendering relatively few trials on which the prediction is violated. As a result, there is generally little intra-individual variability in the PE responses. Moreover, because of the relatively low signal to noise ratio in fMRI measures, fear extinction studies often pool across trials to compare omission-related activity between early and late extinction16, which further reduces the necessary variability to properly evaluate the probability axiom. Third, because CS-US contingencies change over the course of the task (e.g. from acquisition to extinction), there is never complete certainty about whether the US will (not) follow. This precludes a direct comparison of fully predicted outcomes. Finally, within a learning context, it remains unclear whether PE-related responses are in fact responses to the violation of expectancy itself, or whether they are the result of subsequent expectancy updating.”

      Can verbal instructions be used to raise the expectancy of shock?

      The most straightforward way to obtain sufficient variability in both probability and intensity-related predictions is by directly providing participants with instructions on the probability and intensity of the electrical stimulation. In a previous behavioral study, we have shown that omission responses (self-reported relief and omission SCR) indeed varied with these instructions (Willems & Vervliet, 2021). In addition, the manipulation checks that are reported in the supplemental material provided further support that the verbal instructions were effective at raising the associated expectancy of stimulation. Specifically, participants recollected having received more stimulations after higher probability instructions (see Supplemental Figure 2). Furthermore, we found that anticipatory SCR, which we used as a proxy of fearful expectation, increased with increasing probability and intensity (see Supplemental Figure 3). This suggests that it is not necessary to have expectation based on previous experience if we want to evaluate threat omission responses in the light of the prediction error axioms.

      Adaptations in the revised manuscript: In the results section of the main paper (lines 135-141), we now refer more clearly to the manipulation checks that are presented in the supplementary material.

      “The verbal instructions were effective at raising the expectation of receiving the electrical stimulation in line with the provided probability and intensity levels. Anticipatory SCR, which we used as a proxy of fearful expectation, increased as a function of the probability and intensity instructions (see Supplementary Figure 3). Accordingly, post-experimental questions revealed that by the end of the experiment participants recollected having received more stimulations after higher probability instructions, and were willing to exert more effort to prevent stronger hypothetical stimulations (see Supplementary Figure 2).”

      How did the inconsistency between the instructed and experienced probability impact our results?

      All reviewers questioned how the inconsistency between the instructed and experienced probability might have impacted the probability-related results. However, judging from the way the comments were framed, it seems that part of the concern was based on a misunderstanding of the design we employed. Specifically, reviewer 1 mentions that “To ensure that the number of omissions is similar across conditions, the task employs inaccurate verbal instructions; I.e., 25% of shocks are omitted regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, 0%.”, and reviewer 3 states that “... the fact remains that they do not get shocks outside of the 100% probability shock. So learning is occurring, at least for subjects who realize the probability cue is actually a ruse.” We want to emphasize that this was not what we did, and if it were true, we fully agree with the reviewers that it would have caused serious trust- and learning-related issues, given that it would be immediately evident to participants that probability instructions were false. It is clear that under such circumstances, dynamic learning would be a big issue.

      However, in our task 0% and 100% instructions were always accurate. This means that participants never received a stimulation following 0% instructions and always received the stimulation of the given intensity on the 100% trials (see Supplemental Figure 1 for an overview of the trial types). Only for the 25%, 50% and 75% trials was an equal reinforcement rate (25%) maintained, meaning that the stimulation followed in 25% of the trials, irrespective of whether a 25%, 50% or 75% instruction was given. The reason for this was that we wanted to maximize and balance the number of omission trials across the different probability levels, while also keeping the total number of presentations per probability instruction constant. We reasoned that equating the reinforcement rate across the 25%, 50% and 75% instructions should not be detrimental, because (1) in these trials there was always the possibility that a stimulation would follow; and (2) we instructed the participants that each trial is independent of the previous ones, which should have discouraged them from actively counting the number of shocks in order to predict future shocks.

      Adaptations in the revised manuscript: We have tried to further clarify the design in several sections of the manuscript, including the introduction (lines 121-125), results (line 220) and methods (lines 478-484) sections:

      Adaptation in the Introduction section: “Specifically, participants received trial-by-trial instructions about the probability (0%, 25%, 50%, 75% and 100%) and intensity (weak, moderate, strong) of a potentially painful upcoming electrical stimulation, time-locked by a countdown clock (see Fig.1A). While stimulations were always delivered on 100% trials and never on 0% trials, most of the other trials (25%-75%) did not contain the expected stimulation and hence provoked an omission PE.”

      Adaptation in the Results section: “Indeed, the provided instructions did not map exactly onto the actually experienced probabilities, but were all followed by stimulation in 25% of the trials (except for the 0% trials and the 100% trials).”

      Adaptation in the Methods section: “Since we were mainly interested in how omissions of threat are processed, we wanted to maximize and balance the number of omission trials across the different probability and intensity levels, while also keeping the total number of presentations per probability and intensity instruction constant. Therefore, we crossed all non-0% probability levels (25, 50, 75, 100) with all intensity levels (weak, moderate, strong) (12 trials). The three 100% trials were always followed by the stimulation of the instructed intensity, while stimulations were omitted in the remaining nine trials. Six additional trials were intermixed in each run: Three 0% omission trials with the information that no electrical stimulation would follow (akin to 0% Probability information, but without any Intensity information as it does not apply); and three trials from the Probability x Intensity matrix that were followed by electrical stimulation (across the four runs, each Probability x Intensity combination was paired at least once, and at most twice with the electrical stimulation).”
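
      The trial counts in this description can be tallied as a quick consistency check. The sketch below is a minimal illustration with hypothetical names; it assumes that the three additional reinforced trials per run are drawn from the 25%, 50% and 75% cells of the Probability x Intensity matrix, because the 100% cells are already always reinforced.

```python
from itertools import product

PROBABILITIES = (25, 50, 75, 100)            # non-0% instructed probabilities
INTENSITIES = ("weak", "moderate", "strong")
N_ZERO_TRIALS = 3          # 0% trials per run, never followed by stimulation
N_EXTRA_REINFORCED = 3     # extra reinforced matrix trials per run
N_RUNS = 4


def run_composition():
    """Count the trial types in one run, following the Methods description."""
    matrix = list(product(PROBABILITIES, INTENSITIES))    # 12 cells
    omitted = [cell for cell in matrix if cell[0] != 100]  # 9 omission trials
    # Assumption: the extra reinforced trials belong to the 25-75% cells.
    uncertain_trials = len(omitted) + N_EXTRA_REINFORCED
    return {
        "trials_per_run": len(matrix) + N_ZERO_TRIALS + N_EXTRA_REINFORCED,
        "omissions_25_to_75": len(omitted),
        "stimulated_25_to_75": N_EXTRA_REINFORCED,
        "reinforcement_rate_25_to_75": N_EXTRA_REINFORCED / uncertain_trials,
    }


composition = run_composition()
reinforced_across_runs = N_RUNS * composition["stimulated_25_to_75"]
print(composition, reinforced_across_runs)
```

      Under these assumptions each run contains 18 trials, the 25% to 75% instructions are reinforced on 3 of 12 trials (25%), and the 12 reinforced trials across the four runs cover the nine uncertain cells at least once and at most twice, in line with the description above.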

      Could the incongruence between the instructed and experienced reinforcement rate have detrimental effects on the probability effect? We agree with reviewer 2 that it is possible that the inconsistency between instructed and experienced reinforcement rates could have rendered the exact probability information less informative to participants, which might have resulted in them paying less attention to the probability information whenever the probability was not 0% or 100%. This might to some extent explain the relatively larger difference in responding between 0% and 25% to 75% trials, but the relatively smaller differences between the 25% to 75% trials.

      However, there are good reasons to believe that the relatively smaller difference between 25% to 75% trials was not caused by the “inaccurate” nature of our instructions, but is inherent to “uncertain” probabilities.

      We added a description of these reasons to the supplementary materials in a supplementary note (supplementary note 4; lines 97-129 in supplementary materials), and added a reference to this note in the methods section (lines 488-490).

      “Supplementary Note 4: “Accurate” probability instructions do not alter the Probability-effect

      A question that was raised by the reviewers was whether the inconsistency between the probability instruction and the experienced reinforcement rate could have detrimental effects on the Probability-related results; especially because the effect of Probability was smaller when only including non-0% trials.

      However, there are good reasons to believe that the relatively smaller difference between 25% to 75% trials was not caused by the “inaccurate” nature of our instructions, but is inherent to “uncertain” probabilities.

      First, in a previously unpublished pilot study, we provided participants with “accurate” probability instructions, meaning that the instruction corresponded to the actual reinforcement rate (e.g., 75% instructions were followed by a stimulation in 75% of the trials etc.). In line with the present results and our previous behavioral study (Willems & Vervliet, 2021), the results of this pilot (N = 20) showed that the difference in the reported relief between the different probability levels was largest when comparing 0% and the rest (25%, 50% and 75%). Furthermore, the overall effect size of Probability (excluding 0%) matched the one of our previous behavioral study (Willems & Vervliet, 2021): ηp² ≈ 0.50.”

      Author response image 1.

      Main effect of Probability including 0%: F(1.74, 31.23) = 53.94, p < .001, ηp² = 0.75

      Main effect of Probability excluding 0%: F(1.50, 28.43) = 21.03, p < .001, ηp² = 0.53

      Second, other published studies that used CSs with varying reinforcement rates (with or without explicit written instructions about the reinforcement rates) also showed that the difference in expectations, anticipatory SCR or omission SCR was largest when comparing the CS0% to the other CSs of varying reinforcement rates (Grings & Sukoneck, 1971; Öhman et al., 1973; Ojala et al., 2022).

      Together, this suggests that when there is a possibility of stimulation, any additional difference in probability will have a smaller effect on the omission responses, irrespective of whether the underlying reinforcement rate is accurate or not.
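
      The effect sizes reported for the pilot study can be cross-checked against the reported F statistics and degrees of freedom. The sketch below uses the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical and the snippet is not part of the original analysis. If the fractional degrees of freedom come from a sphericity correction (an assumption), the result is unaffected, because scaling both degrees of freedom by the same factor cancels out.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the two main effects of Probability reported for the pilot.
including_zero = partial_eta_squared(53.94, 1.74, 31.23)
excluding_zero = partial_eta_squared(21.03, 1.50, 28.43)
print(round(including_zero, 2), round(excluding_zero, 2))  # prints 0.75 0.53
```

      Both recomputed values agree with the reported effect sizes (0.75 and 0.53).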

      Adaptation to methods section: “Note that, based on previous research, we did not expect the inconsistency between the instructed and perceived reinforcement rate to have a negative effect on the Probability manipulation (see Supplementary Note 4).”

      Did dynamic learning impact the believability of the instructions?

      Although we tried to minimize learning in our paradigm by providing instructions that trials are independent from one another, we agree with the reviewers that this cannot preclude all learning. Any remaining learning effects should present themselves by downweighting the effect of the probability instructions over time. We controlled for this time-effect by including a “run” regressor in our analyses. Results of the Run regressor for subjective relief and omission-related SCR are presented in Supplemental Figure 5. These figures show that although there was a general drop in reported relief pleasantness and omission SCR over time, the effects of probability and intensity remained present until the last run. This indicates that even though some learning might have taken place, the main manipulations of probability and intensity were still present until the end of the task.

      Adaptations in the revised manuscript: In the results section of the main paper (lines 159-162), we now refer more clearly to the results of the Block regressor that are presented in the supplementary material.

      Note that while there was a general drop in reported relief pleasantness and omission SCR over time, the effects of Probability and Intensity remained present until the last run (see Supplementary Figure 5). This further confirms that probability and intensity manipulations were effective until the end of the task.

      In the following sections of the rebuttal letter, we will go over the rest of the comments and our responses one by one.

      Reviewer #1 (Public Review):

      Summary:

      Willems and colleagues test whether unexpected shock omissions are associated with reward-related prediction errors by using an axiomatic approach to investigate brain activation in response to unexpected shock omission. Using an elegant design that parametrically varies shock expectancy through verbal instructions, they see a variety of responses in reward-related networks, only some of which adhere to the axioms necessary for prediction error. In addition, there were associations between omission-related responses and subjective relief. They also use machine learning to predict relief-related pleasantness, and find that none of the a priori "reward" regions were predictive of relief, which is an interesting finding that can be validated and pursued in future work.

      Strengths:

      The authors pre-registered their approach and the analyses are sound. In particular, the axiomatic approach tests whether a given region can truly be called a reward prediction error. Although several a priori regions of interest satisfied a subset of axioms, no ROI satisfied all three axioms, and the authors were candid about this. A second strength was their use of machine learning to identify a relief-related classifier. Interestingly, none of the ROIs that have been traditionally implicated in reward prediction error reliably predicted relief, which opens important questions for future research.

      Weaknesses:

      To ensure that the number of omissions is similar across conditions, the task employs inaccurate verbal instructions; i.e. 25% of shocks are omitted, regardless of whether subjects are told that the probability is 100%, 75%, 50%, 25%, or 0%. Given previous findings on interactions between verbal instruction and experiential learning (Doll et al., 2009; Li et al., 2011; Atlas et al., 2016), it seems problematic a) to treat the instructions as veridical and b) average responses over time. Based on this prior work, it seems reasonable to assume that participants would learn to downweight the instructions over time through learning (particularly in the 100% and 0% cases); this would be the purpose of prediction errors as a teaching signal. The authors do recognize this and perform a subset analysis in the 21 participants who showed parametric increases in anticipatory SCR as a function of instructed shock probability, which strengthened findings in the VTA/SN; however given that one-third of participants (n=10) did not show parametric SCR in response to instructions, it seems like some learning did occur. As prediction error is so important to such learning, a weakness of the paper is that conclusions about prediction error might differ if dynamic learning were taken into account.

      We thank the reviewer for raising this important concern. We believe we replied to all the issues raised in the general reply above.

      Lastly, I think that findings in threat-sensitive regions such as the anterior insula and amygdala may not be adequately captured in the title or abstract which strictly refers to the "human reward system"; more nuance would also be warranted.

      We fully agree with this comment and have changed the title and abstract accordingly.

      Adaptations in the revised manuscript: We adapted the title of the manuscript.

      “Omissions of Threat Trigger Subjective Relief and Prediction Error-Like Signaling in the Human Reward and Salience Systems”

      Adaptations in the revised manuscript: We adapted the abstract (lines 27-29).

      “In line with recent animal data, we showed that the unexpected omission of (painful) electrical stimulation triggers activations within key regions of the reward and salience pathways and that these activations correlate with the pleasantness of the reported relief.”

      Reviewer #2 (Public Review):

      The question of whether the neural mechanisms for reward and punishment learning are similar has been a constant debate over the last two decades. Numerous studies have shown that the midbrain dopamine neurons respond to both negative and salient stimuli, some of which can't be well accounted for by the classic RL theory (Delgado et al., 2007). Other research even proposed that aversive learning can be viewed as reward learning, by treating the omission of aversive stimuli as a negative PE (Seymour et al., 2004).

      Although the current study took an axiomatic approach to search for the PE encoding brain regions, which I like, I have major concerns regarding their experimental design and hence the results they obtained. My biggest concern comes from the false description of their task to the participants. To increase the number of "valid" trials for data analysis, the instructed and actual probabilities were different. Under such a circumstance, testing axiom 2 seems completely artificial. How does the experimenter know that the participants truly believe that the 75% is more probable than, say, the 25% stimulation? The potential confusion of the subjects may explain why the SCR and relief report were rather flat across the instructed probability range, and some of the canonical PE encoding regions showed a rather mixed activity pattern across different probabilities. Also for the post-hoc selection criteria, why pick the larger SCR in the 75% compared to the 25% instructions? How would the results change if other criteria were used?

      We thank the reviewer for raising this important concern. We believe the general reply above covers most of the issues raised in this comment. Concerning the post-hoc selection criteria, we took 25% < 75% as the criterion because this was a fairly “lenient” criterion in the sense that it looked only at the effects of interest (i.e., did anticipatory SCR increase with increasing instructed probability?). However, even when the criterion was stricter (e.g., selecting participants only if their anticipatory SCR monotonically increased with each increase in instructed probability 0% < 25% < 50% < 75% < 100%, N = 11 participants), the probability effect for the VTA/SN (ωp² = 0.08) remained, whereas the intensity effect did not.

      To test axiom 3, which was to compare the 100% stimulation to the 0% stimulation conditions, how did the actual shock delivery affect the fMRI contrast result? It would be more reasonable if this analysis could control for the shock delivery, which itself could contaminate the fMRI signal, with extra confound that subjects may engage certain behavioral strategies to "prepare for" the aversive outcome in the 100% stimulation condition. Therefore, I agree with the authors that this contrast may not be a good way to test axiom 3, not only because of the arguments made in the discussion but also the technical complexities involved in the contrast.

      We thank the reviewer for addressing this additional confound. It was indeed impossible to control for the delivery of shock since the delivery of the shock was always present on the 100% trials (and thus completely overlapped with the contrast of interest). We added this limitation to our discussion in the manuscript. In addition, we have also added a suggestion for a contrast that can test the “no surprise equivalence” criterion.

      Adaptations in the revised manuscript: We adapted lines 358-364.

      “Thus, given that we could not control for the delivery of the stimulation in the 100% > 0% contrast (the delivery of the stimulation completely overlapped with the contrast of interest), it is impossible to disentangle responses to the salience of the stimulation from those to the predictability of the outcome. A fairer evaluation of the third axiom would require outcomes that are roughly similar in terms of salience. When evaluating threat omission PE, this implies comparing fully expected threat omissions following 0% instructions to fully expected absence of stimulation at another point in the task (e.g. during a safe intertrial interval).”

      Reviewer #3 (Public Review):

      We thank the reviewer for their comments. Overall, based on the reviewer’s comments, we noticed that there was an imbalance between a focus on “relief” in the introduction and the rest of the manuscript and preregistration. We believe this focus raised the expectation that all outcome measures were interpreted in terms of the relief emotion. However, this was not what we did nor what we preregistered. We therefore restructured the introduction to reduce the focus on relief.

      Adaptations in the revised manuscript: We restructured the introduction of the manuscript. Specifically, after our opening sentence: “We experience a pleasurable relief when an expected threat stays away1” we only introduce the role of relief for our research in lines 79-89.

      “Interestingly, unexpected omissions of threat not only trigger neural activations that resemble a reward PE, they are also accompanied by a pleasurable emotional experience: relief. Because these feelings of relief coincide with the PE at threat omission, relief has been proposed to be an emotional correlate of the threat omission PE. Indeed, emerging evidence has shown that subjective experiences of relief follow the same time-course as theoretical PE during fear extinction. Participants in fear extinction experiments report high levels of relief pleasantness during early US omissions (when the omission was unexpected and the theoretical PE was high) and decreasing relief pleasantness over later omissions (when the omission was expected and the theoretical PE was low)22,23. Accordingly, preliminary fMRI evidence has shown that the pleasantness of this relief is correlated to activations in the NAC at the time of threat omission. In that sense, studying relief may offer important insights in the mechanism driving safety learning.”

      Summary:

      The authors conducted a human fMRI study investigating the omission of expected electrical shocks with varying probabilities. Participants were informed of the probability of shock and shock intensity trial-by-trial. The time point corresponding to the absence of the expected shock (with varying probability) was framed as a prediction error producing the cognitive state of relief/pleasure for the participant. fMRI activity in the VTA/SN and ventral putamen corresponded to the surprising omission of a high probability shock. Participants' subjective relief at having not been shocked correlated with activity in brain regions typically associated with reward-prediction errors. The overall conclusion of the manuscript was that the absence of an expected aversive outcome in human fMRI looks like a reward-prediction error seen in other studies that use positive outcomes.

      Strengths:

      Overall, I found this to be a well-written human neuroimaging study investigating an often overlooked question on the role of aversive prediction errors, and how they may differ from reward-related prediction errors. The paper is well-written and the fMRI methods seem mostly rigorous and solid.

      Weaknesses:

      I did have some confusion over the use of the term "prediction-error" however as it is being used in this task. There is certainly an expectancy violation when participants are told there is a high probability of shock, and it doesn't occur. Yet, there is no relevant learning or updating, and participants are explicitly told that each trial is independent and the outcome (or lack thereof) does not affect the chances of getting the shock on another trial with the same instructed outcome probability. Prediction errors are primarily used in the context of a learning model (reinforcement learning, etc.), but without a need to learn, the utility of that signal is unclear.

      We operationalized “prediction error” as the response to the error in prediction or the violation of expectancy at the time of threat omission. In that sense, prediction error and expectancy violation (which is more commonly used in clinical research and psychotherapy; Craske et al., 2014) are synonymous. While prediction errors (or expectancy violations) are predominantly studied in learning situations, the definition in itself does not specify how the “expectancy” or “prediction” arises: whether it was through learning based on previous experience or through mere instruction. The rationale for moving away from a conditioning study in the present manuscript is discussed in our general reply above.

      We agree with the reviewer that studying prediction errors outside a learning context limits the ecological validity of the task. However, we do believe there is also a strength to this approach. Specifically, the omission-related responses we measure are less confounded by subsequent learning (or updating of the wrongful expectation). Any difference between our results and prediction error responses in learning situations can therefore point to this exact difference in paradigm, and can thus identify responses that are specific to learning situations.

      An overarching question posed by the researchers is whether relief from not receiving a shock is a reward. They take as neural evidence activity in regions usually associated with reward prediction errors, like the VTA/SN. This seems to be a strong case of reverse inference. The evidence may have been stronger had the authors compared activity to a reward prediction error, for example using a similar task but with reward outcomes. As it stands, the neural evidence that the absence of shock is actually "pleasurable" is limited, albeit there is a subjective report asking subjects if they felt relief.

      We thank the reviewer for cautioning us and letting us critically reflect on our interpretation. We agree that it is important not to be overly enthusiastic when interpreting fMRI results and not to carelessly attribute psychological functions to mere activations. Therefore, we will elaborate on the precautions we took to minimize detrimental reverse inference.

      First, prior to analyzing our results, we preregistered clear hypotheses that were based on previous research, in addition to clear predictions, regions of interest and a testing approach on OSF. With our study, we wanted to investigate whether unexpected omissions of threat: (1) triggered activations in the VTA/SN, putamen, NAc and vmPFC (as has previously been shown in animal and human studies); (2) represent PE signals; and (3) were related to self-reported relief, which has also been shown to follow a PE time-curve in fear extinction (Vervliet et al., 2017). Based on previous research, we selected three criteria all PE signals should comply with. This means that if omission-related activations were to represent true PE signals, they should comply with these criteria. However, we agree that it would go too far to conclude based on our research that relief is a reward, or even that the omission-related activations represent only PE signals. While we found support for most of our hypotheses, this does not preclude alternative explanations. In fact, in the discussion, we acknowledge this and also discuss alternative explanations, such as responding to the salience (lines 395-397; “One potential explanation is therefore that the deactivation resulted from a switch from default mode to salience network, triggered by the salience of the unexpected threat omission or by the salience of the experienced stimulation.”), or anticipation (line 425-426; “... we cannot conclusively dismiss the alternative interpretation that we assessed (part of) expectancy instead”).

      Second, we have deliberately opted to only use descriptive labels such as omission-related activations when we are discussing fMRI results. Only when we are talking about how the activations were related to self-reported relief, we talk about relief-related activations.

      I have some other comments, and I elaborate on those above comments, below:

      (1) A major assumption in the paper is that the unexpected absence of danger constitutes a pleasurable event, as stated in the opening sentence of the abstract. This may sometimes be the case, but it is not universal across contexts or people. For instance, for pathological fears, any relief derived from exposure may be short-lived (the dog didn't bite me this time, but that doesn't mean it won't next time or that all dogs are safe). And even if the subjective feeling one gets is temporary relief at that moment when the expected aversive event is not delivered, I believe there is an overall conflation between the concepts of relief and pleasure throughout the manuscript. Overall, the manuscript seems to be framed on the assumption that "aversive expectations can transform neutral outcomes into pleasurable events," but this is situationally dependent and is not a common psychological construct as far as I am aware.

      We thank the reviewer for their comment. We have restructured the introduction because we agree with the reviewer that the introduction might have set false expectations concerning our interpretation of the results. The statements related to relief have been toned down in the revised manuscript.

      Still, we want to note that the initial opening statement “unexpected absence of danger constitutes the pleasurable emotion relief” was based on a commonly used definition of relief that states that relief refers to “the emotion that is triggered by the absence of expected or previously experienced negative stimulation” (Deutsch, 2015). Both aspects, that it is elicited by the absence of an otherwise expected aversive event and that it is pleasurable in nature, have received considerable empirical support in emotion and fear conditioning research (Deutsch et al., 2015; Leknes et al., 2011; Papalini et al., 2021; Vervliet et al., 2017; Willems & Vervliet, 2021).

      That said, the notion that the feeling of relief is linked to the (reward) prediction error underlying the learning of safety is included in several theoretical papers in order to explain the commonly observed dopaminergic response at the time of threat omission (both in animals and humans; Bouton et al., 2020; Kalisch et al., 2019; Pittig et al., 2020).

      Together, these studies indicate that the definition of relief, and its potential role in threat omission-driven learning is – at least in our research field – established. Still, we felt that more direct research linking feelings of relief to omission-related brain responses was warranted.

      One of the main reasons why we specifically focus on the “pleasantness” of the relief is to assess the hedonic impact of the threat omission, as has been done in previous studies by our lab and others (Leknes et al., 2011; Leng et al., 2022; Papalini et al., 2021; Vervliet et al., 2017; Willems & Vervliet, 2021). Nevertheless, we agree with the reviewer that the relief we measure is a short-lived emotional state that is subject to individual differences (as are all emotions).

      (2) The authors allude to this limitation, but I think it is critical. Specifically, the study takes a rather simplistic approach to prediction errors. It treats the instructed probability as the subjects' expectancy level and treats the prediction error as omission-related activity to this instructed probability. There is no modeling, and any dynamic parameters affected by learning are unaccounted for in this design. That is, subjects are informed that each trial is independently determined and so there is no learning: "the presence/absence of stimulations on previous trials could not predict the presence/absence of stimulation on future trials." Prediction errors are central to learning. It is unclear if the "relief" subjects feel on not getting a shock on a high-probability trial is in any way analogous to a prediction error, because there is no reason to update your representation on future trials if they are all truly independent. The construct validity of the design is in question.

      (3) Related to the above point, even if subjects veered away from learning by the instruction that each trial is independent, the fact remains that they do not get shocks outside of the 100% probability shock. So learning is occurring, at least for subjects who realize the probability cue is actually a ruse.

      We thank the reviewer for raising these concerns. We believe that the general reply above covers the issues raised in points 2 and 3.

      (4) Bouton has described very well how the absence of expected threat during extinction can create a feeling of ambiguity and uncertainty regarding the signal value of the CS. This in large part explains the contextual dependence of extinction and the "return of fear" that is so prominent even in psychologically healthy participants. The relief people feel when not receiving an expected shock would seem to have little bearing on changing the long-term value of the CS. In any event, the authors do talk about conditioning (CS-US) in the paper, but this is not a typical conditioning study, as there is no learning.

      We fully agree with the reviewer that our study is not a typical conditioning study. Nevertheless, because our research mostly builds on recent advances in the fear extinction domain, we felt it was necessary to introduce the fear extinction procedure and related findings. In the context of fear extinction learning, we have previously shown that relief is an emotional correlate of the prediction error driving acquisition of the novel safety memory (CSnoUS; Papalini et al., 2021; Vervliet et al., 2017). The ambiguity Bouton describes is the result of the extinguished CS holding multiple meanings once the safety memory is acquired. Does it signal danger or safety? We agree with Bouton that the meaning of the CS for any new encounter will depend on the context, and the passage of time, but also on the initial strength of the safety acquisition (which is dependent on the size of the prediction error, and hence the amount of relief; Craske et al., 2014). However, it was not our objective to directly study the relation of relief to subsequent CS value, and our design is not tailored to do so post hoc.

      (5) In Figure 2 A-D, the omission responses are plotted on trials with varying levels of probability. However, it seems to be missing omission responses in 0% trials in these brain regions. As depicted, it is an incomplete view of activity across the different trial types of increasing threat probability.

      We thank the reviewer for pointing out this lack of clarity. The betas that are presented in the figures represent the ROI averages from each non-0% vs 0% contrast (i.e., 25%>0%; 50%>0%; and 75%>0% for the weak, moderate and strong intensity levels). Any positive beta therefore indicates a stronger activation in the given region compared to a fully predicted omission. Any negative beta indicates a weaker activation.

      Adaptations in the revised manuscript: We have adapted the figure captions of figures 2 and 3.

      “The extracted beta-estimates in figures A-D represent the ROI averages from each non0% > 0% contrast (i.e., 25%>0%; 50%>0%; and 75%>0% for the weak, moderate and strong intensity levels). Any positive beta therefore indicates a stronger activation in the given region compared to a fully predicted omission. Any negative beta indicates a weaker activation.”

      (6) If I understand Figure 2 panels E-H, these are plotting responses to the shock versus no-shock (when no-shock was expected). It is unclear why this would be especially informative, as it would just be showing activity associated with shocks versus no-shocks. If the goal was to use this as a way to compare positive and negative prediction errors, the shock would induce widespread activity that is not necessarily reflective of a prediction error. It is simply a response to a shock. Comparing activity to shocks delivered after varying levels of probability (e.g., a shock delivered at 25% expectancy, versus 75%, versus 100%) would seem to be a much better test of a prediction error signal than shock versus no-shock.

      We thank the reviewer for this comment. The purpose of this preregistered contrast was to test whether fully predicted outcomes elicited equivalent activations in our ROIs (corresponding to the third prediction error axiom). Specifically, if a region represents a pure prediction error signal, the 100% (fully predicted shocks) > 0% (fully predicted shock omissions) contrast should be nonsignificant, and follow-up Bayes Factors would further provide evidence in favor of this null-hypothesis.

      We agree with the reviewer that the delivery of the stimulation triggers widespread activations in our regions of interest that confounded this contrast. However, given that it was a preregistered test for the prediction error axioms, we cannot remove it from the manuscript. Instead, we have argued in the discussion that future studies that want to take an axiomatic stance should consider alternative tests to examine this axiom.

      Adaptations in the revised manuscript: We adapted lines 358-364.

      “Thus, given that we could not control for the delivery of the stimulation in the 100% > 0% contrast (the delivery of the stimulation completely overlapped with the contrast of interest), it is impossible to disentangle responses to the salience of the stimulation from those to the predictability of the outcome. A fairer evaluation of the third axiom would require outcomes that are roughly similar in terms of salience. When evaluating threat omission PE, this implies comparing fully expected threat omissions following 0% instructions to fully expected absence of stimulation at another point in the task (e.g. during a safe intertrial interval).”

      Also note that our task did not lend itself to an in-depth analysis of aversive (worse-than-expected) prediction error signals, given that there was only one stimulation trial for each probability x intensity level (see Supplemental Figure 1). The most informative contrast that can inform us about aversive prediction error signals contrasts all non-100% stimulation trials with all 100% stimulation trials. The results of this contrast are presented in Supplemental Figure 16 and Supplemental Table 11 for completeness.

      (7) I was unclear what the results in Figure 3 E-H were showing that was unique from panels A-D, or where it was described. The images looked redundant from the images in A-D. I see that they come from different contrasts (non0% > 0%; 100% > 0%), but I was unclear why that was included.

      We thank the reviewer for this comment. Our answer is related to that of the previous comment. Figure 3 presents the results of the axiomatic tests within the secondary ROIs we extracted from a wider secondary mask based on the non0%>0% contrast.

      (8) As mentioned earlier, there is a tendency to imply that subjects felt relief because there was activity in "the reward pathway."

      We thank the reviewer for their comment, but we respectfully disagree. Subjective relief was explicitly probed when the instructed stimulations stayed away. In the manuscript we only talk about “relief” when discussing these subjective reports. We found that participants reported higher levels of relief-pleasantness following omissions of stronger and more probable threat. This was an observation that matches our predictions and replicates our previous behavioral study (Willems & Vervliet, 2021).

      The fMRI evidence is treated separately from the “pleasantness” of the relief. Specifically, we refrain from calling the threat omission-related neural responses “relief-activity” as this would indeed imply that the activation would only be attributed to this psychological function. Instead, we talked about omission-related activity, and we assessed whether it complied with the prediction error criteria as specified by the axiomatic approach.

      Only afterwards, because we hypothesized that omission-related fMRI activation and self-reported relief-pleasantness were related, and because we found a similar response pattern for both measures, we examined how relief and omission-related fMRI activations within our ROIs were related on a trial-by-trial basis. To this end, we entered relief-pleasantness ratings as a parametric modulator to the omission regressor.

      By no means do we want to reduce an emotional experience (relief) to fMRI activations in isolated regions in the brain. We agree with the reviewer that this would be far too reductionist. We therefore also ran a pre-registered LASSO-PCR analysis in order to identify whether a whole-brain pattern of activations can predict subjective relief (independent from the exact instructions we gave, and independent of our a priori ROIs). This analysis used trial-by-trial patterns of activation across all voxels in the brain as the predictor and self-reported relief as the outcome variable. It is therefore completely data-driven and can be seen as a preregistered exploratory analysis that is intended to inform future studies.

      (9) From the methods, it wasn't entirely clear where there is jitter in the course of a trial. This centers on the question of possible collinearity in the task design between the cue and the outcome. The authors note there is "no multicollinearity between anticipation and omission regressors in the first-level GLMs," but how was this quantified? The issue is of course that the activity coded as omission may be from the anticipation of the expected outcome.

      We thank the reviewer for pointing out this lack of clarity. Jitter was introduced in all parts of the trial: i.e., the duration of the inter-trial interval (4-7s), countdown clock (3-7s), and omission window (4-8s) were all jittered (see fig. 1A and methods section, lines 499-507).

      Adaptations in the revised manuscript: We added an additional line to the methods section to further clarify the jittering (lines 498-500).

      “The scale remained on the screen for 8 seconds or until the participant responded, followed by an intertrial interval between 4 and 7 seconds during which only a fixation cross was shown. Note that all phases in the trial were jittered (i.e., duration countdown clock, duration outcome window, duration intertrial interval).”
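      As an illustration of the jittering described in this reply, the following minimal sketch draws one duration per trial phase from the reported ranges. Uniform sampling, the dictionary keys, and the function name are assumptions made for the example; the source only reports the ranges, not the sampling distribution or the presentation software.

```python
import random

# Duration ranges in seconds, as reported in the reply above.
JITTER_RANGES = {
    "intertrial_interval": (4.0, 7.0),
    "countdown_clock": (3.0, 7.0),
    "omission_window": (4.0, 8.0),
}


def sample_trial_durations(rng):
    """Draw one jittered duration per trial phase.

    Uniform sampling is an assumption; only the ranges come from the source.
    """
    return {
        phase: rng.uniform(low, high)
        for phase, (low, high) in JITTER_RANGES.items()
    }


rng = random.Random(42)  # fixed seed so the hypothetical schedule is reproducible
schedule = [sample_trial_durations(rng) for _ in range(5)]
for trial in schedule:
    print({phase: round(seconds, 2) for phase, seconds in trial.items()})
```

      Jittering every phase independently is what decorrelates the anticipation and omission regressors in the design matrix.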

      Multicollinearity between the omission and anticipation regressors was assessed by calculating the variance inflation factor (VIF) of omission and anticipation regressors in the first level GLM models that were used for the parametric modulation analyses.

      Adaptations in the revised manuscript: We replaced the VIF abbreviation with “variance inflation factor” (line 423-424).

      “Nevertheless, there was no multicollinearity between anticipation and omission regressors in the first-level GLMs (Variance Inflation Factor, VIF < 4), making it unlikely that the omission responses purely represented anticipation.”
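      The variance inflation factor mentioned in this reply can be sketched for the simplest case of two regressors, where VIF = 1 / (1 - r²) and r is their Pearson correlation. The regressor values below are simulated and hypothetical. In a full first-level GLM, each regressor would instead be regressed on all other regressors to obtain its R².

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_regressors(x, y):
    """VIF for a design with exactly two regressors: 1 / (1 - r^2)."""
    r_squared = pearson_r(x, y) ** 2
    return 1.0 / (1.0 - r_squared)


random.seed(0)
# Hypothetical regressors: the omission regressor partly overlaps with anticipation.
anticipation = [random.gauss(0, 1) for _ in range(200)]
omission = [0.5 * a + random.gauss(0, 1) for a in anticipation]

vif = vif_two_regressors(anticipation, omission)
print(f"VIF = {vif:.2f}")
```

      A VIF of 1 means the regressors are uncorrelated; the reply uses VIF < 4 as its threshold for acceptable collinearity.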

      (10) I did not fully understand what the LASSO-PCR model using relief ratings added. This result was not discussed in much depth, and seems to show a host of clusters throughout the brain contributing positively or negatively to the model. Altogether, I would recommend highlighting what this analysis is uniquely contributing to the interpretation of the findings.

      The main added value of this analysis is that it uses a different approach altogether. The (mass univariate) parametric modulation analysis estimated for each voxel (and each ROI) whether its activity covaried with the reported relief, so a significant activation only indicated that this voxel was related to relief. However, given that each voxel/ROI is treated independently in this analysis, it remains unclear how the activations were embedded in a wider network across the brain, and which regions contributed most to the prediction of relief. The multivariate LASSO-PCR analysis approach we took attempts to overcome this limitation by examining if a more whole-brain pattern can predict relief. Because we use the whole-brain pattern (and not only our a priori ROIs), this analysis is completely data-driven and is intended to inform future studies. In addition, the LASSO-PCR model was cross-validated using five-fold cross-validation, which is also a difference (and a strength) compared to the mass univariate GLM approach.

      One interesting finding that only became evident when we combined univariate and multivariate approaches is that, although the parametric modulation analysis showed that omission-related fMRI responses in the ROIs were modulated by the reported relief, none of these ROIs contributed significantly to the prediction of relief based on the identified signature. Instead, some of the contributing clusters fell within other valuation and error-processing regions (e.g. lateral OFC, mid cingulate, caudate nucleus). This suggests that other regions than our a priori ROIs may have been especially important for the subjective experience of relief, at least in this task. However, all these clusters were small and require further validation in out-of-sample participants. More research is necessary to test the generalizability and validity of the relief signature to new individuals and tasks, and to compare the signature with other existing signature models (e.g., signature of pain, fear, reward, pleasure). However, this was beyond the scope of the present study.

      Adaptations in the revised manuscript: We altered the explanation of the LASSO-PCR approach in the results section (lines 286-295) and the discussion (lines 399-402)

      Adaptations in the Results section: “The (mass univariate) parametric modulation analysis showed that omission-related fMRI activity in our primary and secondary ROIs correlated with the pleasantness of the relief. However, given that each voxel/ROI is treated independently in this analysis, it remains unclear how the activations were embedded in a wider network of activation across the brain, and which regions contributed most to the prediction of relief. To overcome these limitations, we trained a (multivariate) LASSO-PCR model (Least Absolute Shrinkage and Selection Operator-Regularized Principal Component Regression) in order to identify whether a spatially distributed pattern of brain responses can predict the perceived pleasantness of the relief (or “neural signature” of relief)31. Because we used the whole-brain pattern (and not only our a priori ROIs), this analysis is completely data driven and can thus identify which clusters contribute most to the relief prediction.”

      Adaptations in the Discussion section: “In addition to examining the PE-properties of neural omission responses in our a priori ROIs, we trained a LASSO-PCR model to establish a signature pattern of relief. One interesting finding that only became evident when we compared the univariate and multivariate approach was that none of our a priori ROIs appeared to be an important contributor to the multivariate neural signature, even though all of them (except NAc) were significantly modulated by relief in the univariate analysis.”
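      The LASSO-PCR logic discussed in this reply (principal components of trial-wise brain patterns, an L1-penalized regression on the component scores, and five-fold cross-validation) can be sketched on simulated data as follows. This is not the authors' pipeline: the data, the penalty value `lam`, the function names, and the trial-wise fold split are all assumptions. Because component scores are mutually orthogonal, the LASSO solution reduces to a closed-form soft threshold, which the sketch uses in place of an iterative solver.

```python
import numpy as np


def fit_lasso_pcr(X, y, lam):
    """Fit LASSO on principal-component scores; return voxel weights and intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # PCA via SVD: the columns of U * S are the (orthogonal) component scores.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = S > 1e-10 * S.max()  # drop numerically empty components
    U, S, Vt = U[:, keep], S[keep], Vt[keep]
    n = len(y)
    # Orthogonal scores make the LASSO solution a soft threshold of each
    # score-outcome covariance, rescaled by the component variance.
    rho = (U * S).T @ yc / n
    beta = np.sign(rho) * np.maximum(np.abs(rho) - lam, 0.0) / (S ** 2 / n)
    weights = Vt.T @ beta  # project component weights back to voxel space
    return weights, y_mean - x_mean @ weights


def cross_validated_predictions(X, y, lam, n_folds=5, seed=0):
    """Predict each trial from a model trained on the other folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predicted = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        weights, intercept = fit_lasso_pcr(X[train_idx], y[train_idx], lam)
        predicted[test_idx] = X[test_idx] @ weights + intercept
    return predicted


# Simulated stand-in data: 100 trials x 300 "voxels" driven by 5 latent sources,
# with hypothetical relief ratings that depend on the first source.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 5))
X = latent @ rng.normal(size=(5, 300)) + 0.5 * rng.normal(size=(100, 300))
relief = 2.0 * latent[:, 0] + 0.5 * rng.normal(size=100)

predicted = cross_validated_predictions(X, relief, lam=0.5)
r = np.corrcoef(relief, predicted)[0, 1]
print(f"cross-validated prediction-outcome correlation: {r:.2f}")
```

      The voxel-space `weights` are what would be inspected to see which clusters contribute to the prediction. With multiple participants, folds would more plausibly be split by participant than by trial.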

      In addition to the public peer review, the reviewers provided some recommendations on how to further improve our manuscript. We will reply to the recommendations below.

      Reviewer #1 (Recommendations For The Authors):

      Given that you do have trial-level estimates from the classifier analysis, it would be very informative to use learning models and examine responses trial-by-trial to test whether there are prediction errors that vary over time as a function of learning.

      We thank the reviewer for the suggestion. However, based on the results of the run-regressor, we do not anticipate large learning effects in our paradigm. As we mentioned in our responses above, we controlled for time-related drops in omission-responding by including a “run” regressor in our analyses. Results of this regressor for subjective relief and omission-related SCR showed that although there was a general drop in reported relief pleasantness and omission SCR over time, the effects of probability and intensity remained present until the last run. This suggests that even though some learning might have taken place, its effect was likely small and did not abolish our manipulations of probability and intensity. In any case, we cannot use the LASSO-PCR signature model to investigate learning, as this model uses the trial-level brain pattern at the time of US omission to estimate the associated level of relief. These estimates can therefore not be used to examine learning effects.

      Reviewer #2 (Recommendations For The Authors):

      The LASSO-PCR model feels rather disconnected from the rest of the paper and does not add much to the main theme. I would suggest removing this part from the paper.

      We thank the reviewer for this suggestion. However, the LASSO-PCR analysis was preregistered. We therefore cannot remove it from the manuscript. We hope to have clarified its added value in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have revised the manuscript mainly in the following aspects: (1) the data on electrophysiological and behavioral responses of larvae and adults to trehalose have been added, and the related figures and texts have been modified accordingly; (2) the photos of taste organs of larvae and adults indicating the position of recorded sensilla have been added; (3) the potential off-target effects of GR knock-out on other GR expressions have been carefully explained and revised in the relevant text; (4) the abstract has been revised to present the findings more technically in a limited number of words; (5) some details of experiments in Materials and Methods and some new references have been added; (6) a new figure (Figure 8) summarizing the main findings of the study has been added.

      In the following, we respond to the reviewers’ comments and suggestions one by one. We hope that our answers will satisfy you and the three reviewers. We are also very happy to receive further valuable advice from you.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The process of taste perception is significantly more intricate and complex in Lepidopteran insects. This investigation provides valuable insights into the role of Gustatory receptors and their dynamics in the sensation of sucrose, which serves as a crucial feeding cue for insects. The article highlights the differential sensitivity of Grs to sucrose and their involvement in feeding and insect behavior.

      Strengths:

      To support the notion of the differential specificity of Gr to sucrose, this study employed electrophysiology, ectopic expression of Grs in Xenopus, genome editing, and behavioral studies on insects. This investigation offers a fundamental understanding of the gustation process in lepidopteran insects and its regulation of feeding and other gustation-related physiological responses. This study holds significant importance in advancing our comprehension of lepidopteran insect biology, gustation, and feeding behavior.

      Thank you for your recognition of our research.

      Weaknesses:

      While this manuscript demonstrates technical proficiency, there exists an opportunity for additional refinement to optimize comprehensibility for the intended audience. Several crucial sugars have been overlooked in the context of electrophysiology studies and should be incorporated. Furthermore, it is imperative to consider the potential off-target effects of Gr knock-out on other Gr expressions. This investigation focuses exclusively on Gr6 and Gr10, while neglecting a comprehensive narrative regarding other Grs involved in sucrose sensation.

      We accept the reviewer's suggestion. Because trehalose is a main sugar in insect blood, and it is converted by insects after feeding on plant sugars, we have added the new data on electrophysiological and behavioral responses of larvae and adults of Helicoverpa armigera to trehalose (see Figure 1-2, Figure 1-figure supplement 1, Figure 2-figure supplement 1). Now, the eight sugars tested include 2 pentoses (arabinose and xylose), 4 hexoses (fructose, fucose, galactose and glucose), and 2 disaccharides (sucrose and trehalose), which were chosen because they are mainly present in host-plants of H. armigera and/or are representative in the structure and source of sugars.

      We fully agree with the reviewer’s opinion and have taken the potential off-target effects of the CRISPR/Cas9 knockout of Gr on other GR expressions into consideration. To predict the potential off-target sites of the Gr6 and Gr10 sgRNAs used to establish homozygous mutants with CRISPR/Cas9 technology, we first used the online software Cas-OFFinder (http://www.rgenome.net/cas-offinder/) to search the genome of the wild type cotton bollworm, with the number of mismatches set to less than or equal to 3. We found that the Gr10 sgRNA had no potential off-target site, and the Gr6 sgRNA had only one potential off-target site. Therefore, we designed primers according to the sequence of the potential off-target site of the Gr6 sgRNA, conducted PCR using genomic DNA of the homozygous mutant as a template, performed Sanger sequencing on the PCR products obtained, and found that the potential off-target site of the Gr6 sgRNA was no different from that of the wild type. In particular, because the sgRNAs of Gr6 and Gr10 might produce off-target effects on other sugar receptor genes of H. armigera, we conducted the same off-target site analysis with the designed sgRNAs on each of the other eight sugar receptor genes, and found that there were no off-target sites on these receptor genes (see Line 254-256).
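      The mismatch-tolerant search described in this reply can be illustrated with a minimal sketch that scans a sequence for sites within three mismatches of a guide. The guide and "genome" below are made-up sequences, and the function names are hypothetical. The sketch checks only the forward strand and assumes an NGG PAM, whereas a dedicated tool such as Cas-OFFinder searches both strands of the full genome.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(genome, guide, max_mismatches=3):
    """Return (position, site, mismatches) for forward-strand sites followed by NGG.

    Simplification: the reverse strand is not scanned in this sketch.
    """
    hits = []
    k = len(guide)
    for i in range(len(genome) - k - 2):
        site = genome[i:i + k]
        pam = genome[i + k:i + k + 3]
        if pam[1:] == "GG":  # assumed NGG PAM directly after the protospacer
            mismatches = hamming(site, guide)
            if mismatches <= max_mismatches:
                hits.append((i, site, mismatches))
    return hits


# Hypothetical 20-nt guide and a toy sequence containing the on-target site
# plus one site that differs from the guide at two positions.
guide = "GATTACAGATTACAGATTAC"
near_match = "CATTAT" + guide[6:]
genome = "TTT" + guide + "AGG" + "CCCC" + near_match + "TGG" + "AAAA"

sites = find_candidate_sites(genome, guide)
off_targets = [hit for hit in sites if hit[2] > 0]
for position, site, mismatches in sites:
    print(position, site, mismatches)
```

      Candidate sites with one or more mismatches are the ones that would then be amplified and Sanger-sequenced in the mutant, as the reply describes for the single Gr6 candidate.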

      Reviewer #2 (Public Review):

      Summary:

      To identify sugar receptors and assess the capacity of these genes the authors first set out to identify behavioral responses in larvae and adults as well as physiological response. They used phylogenetics and gene expression (RNAseq) to identify candidates for sugar reception. Using first an in vitro oocyte system they assess the responses to distinct sugars. A subsequent genetic analysis shows that the Gr10 and Gr6 genes provide stage specific functions in sugar perception.

      Strengths:

      A clear strength of the manuscript is the breadth of techniques employed allowing a comprehensive study in a non-canonical model species.

      Thank you for your recognition of our research.

      Weaknesses:

      There are no major weaknesses in the study for the current state of knowledge in this species. Since much of this is basic work to establish broader knowledge, the context with other modalities remains unknown. It might have been possible to probe certain contexts known from the fruit fly, which would have strengthened the manuscript.

      Thank you so much for your suggestion. According to this suggestion, we further added some sentences probing sugar sensing and behaviors of fruit fly larvae in the Introduction and discussion sections (Line 68-71 in Introduction section, Line 395-399 in Discussion section).

      Reviewer #3 (Public Review):

      In this study, the authors combine electrophysiology, behavioural analyses, and genetic editing techniques on the cotton bollworm to identify the molecular basis of sugar sensing in this species.

      The larval and adult forms of this species feed on different plant parts. Larvae primarily consume leaves, which have relatively lower sugar concentrations, while adults feed on nectar, rich in sugar. Through a series of experiments (spanning electrophysiological recordings from both larval and adult sensillae, qPCR expression analysis of identified GRs from these sensillae, response profiles of these GRs to various sugars via heterologous expression in Xenopus oocytes, and evaluations of CRISPR mutants based on these parameters), the authors discovered that larvae and adults employ distinct GRs for sugar sensing. While the larva uses the highly sensitive GR10, the adult uses the less sensitive and broadly tuned GR6. This differential use of GRs is in keeping with their behavioral ecology.

      The data are cohesive and consistently align across the methodologies employed. They are also well presented and the manuscript is clearly written.

      Recommendations for the authors:

      While appreciating the quality of the work and its presentation, we have a few comments for the authors, should they wish to consider them, that would significantly improve the presentation of the work.

      Title: Could the authors please revisit their title to better reflect the main finding of their work?

      The title has been changed to “The larva and adult of Helicoverpa armigera use differential gustatory receptors to sense sugars”.

      Text: There are a few comments related to the text, and these are listed below:

      (1) Could the authors place their work in the context of what's known about sugar sensing in Drosophila larva and adult?

      In the Introduction section, we added the status of research on sugar perception in Drosophila larvae, pointing out "No external sugar-sensing mechanism in Drosophila larvae has yet been characterized." (Line 70-71); in the Discussion section, the research progress of sugar sensing in Drosophila adults and larvae was also summarized (Line 397-399).

      (2) For each results section, could the authors please include a sentence or two that interprets the data in the context of previously presented data?

      We accept the reviewer's suggestion. To make the text easier for readers to follow, we have included a sentence that interprets the preceding data at the beginning of each part of the Results, while avoiding duplication.

      (3) Could the authors please provide details of the generation and screening of the CRISPR mutants?

      We have added more details on mutant establishment and screening in the Materials and Methods section (Line 722-726, 729-732).

      Figures: Could the authors please include images and schematics wherever possible? For example, a schematic depicting the position of the sense organs and one summarising the main findings of the studies.

      In Figure 1 we added the photo of each taste organ, on which the recorded sensilla were indicated. We also added a new figure, Figure 8, summarizing the main findings of the study.

      Choice of Sugars: Could the authors please justify their choice of sugars they have used in the analyses?

      In the first paragraph of the Results section of the article, we further explain the reasons for using the sugars in the study. “We first investigated the electrophysiological responses of the lateral and medial sensilla styloconica in the larval maxillary galea to eight sugars. These sugars were chosen because they are mostly found in host-plants of H. armigera or are representative in the structure and source of sugars.”

      In addition to this, there are several specific comments in the detailed reviewers comments below, which the authors could consider responding to.

      Reviewer #1 (Recommendations For The Authors):

      The article titled "Sucrose taste receptors exhibit dissimilarities between larval and adult stages of a moth" by Shuai-Shuai Zhang and colleagues provides an intriguing analysis. The authors have conducted a meticulously planned and executed study. However, I do have some inquiries.

      (1) What precisely does the term "differ" signify in the title? It can be expounded upon in terms of differing in expression or sensitivity. The title could benefit from being more informative. The authors should appropriately specify the insect species in the title of the paper. This would make it more comprehensible to readers. Merely mentioning the term "moth" does not provide any information about the model organism. Hence, it would be preferable to mention Helicoverpa armigera instead of using the generic term "moth" in the title.

      Thank you for your suggestions. We considered it better to emphasize that the receptors for sucrose are different, and we have accepted the suggestion of adding the name of the animal. The title has been changed to “The larva and adult of Helicoverpa armigera use differential gustatory receptors to sense sugars”.

      (2) The abstract is written in a simple and easily understandable manner, but it overlooks important findings from a technical standpoint.

      We have added some key experimental techniques to the Abstract to illustrate the important findings.

      (3). Almost all herbivorous insects are said to consume plants and utilize sucrose as a stimulus for feeding, as stated by the authors. Sucrose, glucose, and fructose sugar are among the commonly observed stimulants for feeding in numerous insects. It would be appropriate to incorporate not only sucrose but also glucose and fructose as feeding stimulants for almost all herbivorous insects.

      Thank you for your suggestion. Sucrose is the major sugar in plants, and its concentration varies greatly from tissue to tissue, while the concentrations of the hexose sugars are much lower and do not change much. In Line 48, we state that sucrose, glucose, and fructose are feeding stimuli for herbivorous insects. From previous studies, it seems that sucrose is the strongest, followed by fructose, and finally glucose. The cotton bollworm larvae showed no electrophysiological or behavioral response to glucose.

      (4) The reason why trehalose is not considered in the electrophysiology analysis is unclear. Given that trehalose is a major sugar in insects and plants, it would be intriguing to include it in the analysis.

      We have accepted the reviewer's suggestion, and supplemented the electrophysiological responses of taste organs in larvae and adults of Helicoverpa armigera to trehalose (Figure 1, Figure 1-Figure Supplement 1), and also tested the behavioral responses of the larvae and adults to trehalose (Figure 2, Figure 2-Figure Supplement 1). Therefore, all the related figures have been changed.

      (5) The author's intention regarding the co-receptor relationship between Gr5 and Gr6 (line 211) is unclear. If this is indeed the case, then the reason for considering Gr5 in further studies remains uncertain.

      We have changed the sentence as follows: “Since Gr5 was highly expressed with Gr6 in the proboscis and tarsi (Figure 3D-3E, Figure 3—figure supplement 1), we suspected that Gr5 and Gr6 might be expressed in the same cells, and then tested the response profile of their co-expression in oocytes.”

      (6) The homologous nature of Grs is emphasized by the authors. It is not specified how the author ensured that the guide RNA targeting Gr6 or Gr10 did not result in off-target effects on other Grs.

      Thank you so much for your suggestion. We have rewritten the relevant paragraph (Line 238-251), detailing our tests and the results on the potential off-target effects of knocking out GRs by CRISPR/Cas9: “In order to predict the potential off-target sites of sgRNA of Gr6 and Gr10, we used online software Cas-OFFinder (http://www.rgenome.net/cas-offinder/) to blast the genome of H. armigera, and the mismatch number was set to less than or equal to 3. According to the predicted results, the Gr10 sgRNA had no potential off-target region but Gr6 sgRNA had one. Therefore, we amplified and sequenced the potential off-target region of Gr6-/- and found there was no frameshift or premature stop codon in the region compared to WT (Figure 5—figure supplement 2). It is worth mentioning that there was no potential off-target region of Gr6 and Gr10 sgRNA in other sugar receptor genes of H. armigera, Gr4, Gr5, Gr7, Gr8, Gr9, Gr11 and Gr12. We further found there was no difference in the response to xylose of the medial sensilla styloconica among WT, Gr10-/- and Gr6-/- (Figure 5—figure supplement 2). Furthermore, WT, Gr10-/- and Gr6-/- did not show differences in the larval body weight, adult lifespan, and number of eggs laid per female (Figure 5—figure supplement 2). All these results suggest that no off-target effects occurred in the study.”
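      The mismatch criterion described in this response (candidate sites differing from the sgRNA at no more than 3 positions) can be illustrated with a minimal sketch. This is not Cas-OFFinder's implementation. The function name is hypothetical, the sequences used to call it would be placeholders rather than real Gr6 or Gr10 guides, and the sketch assumes a 3-nt NGG PAM and scans only the forward strand, neither of which is stated in the response.

```python
def find_candidate_sites(genome, guide, max_mismatches=3):
    """Return (start, mismatches) for forward-strand sites followed by an NGG PAM.

    Illustrative only: a real off-target search also scans the reverse
    strand and handles ambiguous bases, which this sketch does not.
    """
    hits = []
    n = len(guide)
    # The last valid start leaves room for the site plus a 3-nt PAM.
    for start in range(len(genome) - n - 2):
        pam = genome[start + n:start + n + 3]
        if pam[1:] != "GG":  # NGG: any base followed by GG (assumed PAM)
            continue
        site = genome[start:start + n]
        # Count positions where the site and the guide differ.
        mismatches = sum(a != b for a, b in zip(site, guide))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

      A site with zero mismatches corresponds to the intended target; any other hit would be a region to amplify and sequence, as the authors describe doing for the single predicted Gr6 site.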

      (7) Is it possible that knocking out Gr10 is not compensated for by the overexpression of Gr6 or other sucrose sensing Grs? Similarly, would the vice versa scenario hold true?

      In the Discussion section, we have added some sentences to discuss this issue: “From our results, knocking out Gr10 or Gr6 is unlikely to be compensated by overexpression of other sugar GRs. One of our recent studies showed that Orco knockout had no significant effect on the expression of most OR, IR and GR genes in adult antennae of H. armigera, but some genes were up- or down-regulated (Fan et al., 2022).”

      (8) What was the rationale for selecting nine candidate GR genes for expression analysis?

      Based on the reviewer's suggestion, we expanded the relevant paragraphs to illustrate the rationale for selecting nine candidate GR genes for expression analysis: “To reveal the molecular basis of sugar reception in the taste sensilla of H. armigera, we first analyzed the putative sugar gustatory receptor genes based on the reported gene sequences of GRs in H. armigera and their phylogenetic relationship of D. melanogaster sugar gustatory receptors (Jiang et al., 2015; Pearce et al., 2017; Xu et al., 2017). Nine putative sugar GR genes, Gr4–12 were identified, and their full-length cDNA sequences were cloned (The GenBank accession number is provided in Appendix—Table S1).” (Line 155-161)

      (9) What is the potential reason for the difference between the major larval sugar receptors of Drosophila and Lepidopterans?

      The difference between the major larval sugar receptors of Drosophila and Lepidopterans is probably due to differences in the food their larvae feed on. Fruit fly larvae feed on rotten fruit, the main sugar of which is fructose. The larvae of Lepidoptera mainly feed on plants, and the main sugar is sucrose. In the Discussion section, we have added a sentence “This is most likely due to fruit fly larvae feeding on rotten fruits, which contain fructose as the main sugar.” (Line 399-401)

      (10) There is a disparity in GRs, specifically GR5 and GR6, between the female antenna, proboscis, and tarsi. What could be the possible justification and significance of this?

      Thank you so much for this question. We have added a sentence in the Discussion section, “In this study, the expression patterns of 9 sugar GRs in three taste organs of adult H. armigera show that there is a disparity in GRs, specifically GR5 and GR6, between the female antenna, tarsi and proboscis, which may be an evolutionary adaptation reflecting subtle differentiation in the function of these taste organs in adult foraging. Antennae and tarsi play a role in the exploration of potential sugar sources, while the proboscis plays a more precise role in the final decision to feed.” (Line 433-438)

      (11) I suggest that a visual representation illustrating the positioning of GSNs, particularly the lateral and medial sensilla, in both larva and adult stages would enhance the correlation with the results.

      In Figure 1 we added the photo of each taste organ and the position of the recorded sensilla, and also added a new figure, Figure 8 summarizing the main findings of the studies.

      (12) Further experiments can be conducted to elucidate the precise molecular mechanisms, particularly the downstream effects of GRs, in order to establish the specificity of GRs more convincingly.

      Thank you so much for your suggestion. We have discussed the further experiments in the Discussion section, “To elucidate the precise molecular mechanisms of sugar reception in H. armigera, it is necessary to compare a series of single, double and even multiple Gr knock-out lines and investigate the downstream effects of the GRs.” (Line 363-369)

      (13) Figure 6 caption: In Figure 6 (D to I), the percentage of PER is depicted. There is redundancy in the Y-axis title (Percentage of PER) and the legend. This appears to be repetitive. I suggest that it would be better to include the Y-axis title only in Figure D or in Figures D and G.

      We accept the suggestion. Figure 7 (not Figure 6) has been revised accordingly.

      (14) In Figures 6A and 6C, there is inconsistency in the colors used for WT, Gr6, and Gr10. This could potentially confuse the reader. I recommend using the same colors in both figures instead of using a blue color. Please specify how the authors calculated the feeding area in Figure 6.

      We accept the reviewer's suggestion and have changed the colors in Figure 7A, B. We have also added the detailed method for calculating feeding area (Line 541-545).

      (15) In Two-choice tests, why did the authors use 0.01% Tween 80? Please provide comments on this.

      We used 0.01% Tween 80 to reduce the surface tension and increase the malleability of the solution. We have given a detailed explanation in the Methods section and cited the reference (Line 538-540).

      (16) It would be valuable if the authors could comment on the prospects of this study, considering that GRs play a vital role in controlling behavior and developmental pathways. What are the potential consequences of blocking or disrupting these receptors in terms of behavioral and developmental phenotypic deformities? Could this potentially lead to increased insect mortality?

      Thank you so much for your suggestions. In the last paragraph of the Discussion section, we have added the following perspectives, “Knockout of Gr10 or Gr6 led to a significant decrease in sugar sensitivity and food preference of the larvae and adults of H. armigera, respectively, which is bound to bring adverse consequences to survival and reproduction of the insects. Therefore, studying the molecular mechanisms underlying sugar perception in phytophagous insects may provide new insights into the behavioral ecology of this important and highly diverse group of insects, and measures blocking or disrupting sugar receptors could also have applications to control agricultural pests and improve crop yields worldwide” (Line 449-456).

      Reviewer #2 (Recommendations for The Authors):

      There are a few comments, that I feel would be beneficial to be addressed.

      • The authors used 7 different sugars for their experimental approach. While I agree that this is a sufficiently large collection for a study, I was wondering why they specifically chose these sugars; an explanatory section might be helpful for a reader to follow the reasoning.

      Following reviewer 1's suggestion, we added trehalose, bringing the number of sugars tested to 8. Trehalose is a main sugar in insect blood; insects produce it after feeding on plant sugars. The 8 sugars were chosen because they are present in host-plants of H. armigera or are representative in the structure and source of sugars. They comprise 2 pentoses (arabinose and xylose), 4 hexoses (fructose, fucose, galactose and glucose), and 2 disaccharides (sucrose and trehalose).

      • It might be beneficial to provide some broader overview on the gustatory system in the cotton bollworm, particularly at the larval stage since this may not be common knowledge. Along these lines eg. the complexity of sensilla types, organs and overall number (or estimation) of neurons might be good to know, a graphical representation of the sense organs might be informative.

      In the Introduction section, we now give a more specific description of the sugar-sensitive GSNs in the taste system of the larva and adult of H. armigera, and cite the corresponding references.

      • Concerning phylogeny of GRs, it might be relevant to know how complete the genome information is and some more general background on GR diversity in the cotton bollworm.

      We agree with your opinion. Accordingly, we obtained the putative sugar GRs from the previously published genome (Pearce et al. 2017) and the related annotation of GRs (Jiang et al. 2015, Xu et al. 2012). We have given a more detailed explanation of this in the new version of the manuscript, “We first analyzed the putative sugar gustatory receptor genes based on the genome data of H. armigera (Pearce et al. 2017), the reported gene sequences of sugar GRs in H. armigera and their phylogenetic relationship of D. melanogaster sugar gustatory receptors (Jiang et al. 2015, Xu et al. 2012). All nine putative sugar GR genes in H. armigera, Gr4–12 were validated, and their full-length cDNA sequences were cloned (The GenBank accession number is provided in Appendix—Table S1).” (Line 155-161).

      • Generation of mutants based on CRISPR is intriguing and a powerful step. While the techniques are well described in the method section, there is no information concerning efficiency or broader feasibility of the approach. I feel it would be quite interesting to learn about how feasible or laborious the approach is to generate mutants (e.g. number of initial injected eggs, the resulting F0 offspring, number of back-crosses, number of screened F1s ....).

      In the Materials and Methods section, we have added specific success rates for each step in the process of building the two mutants (Line 722-726, 729-732).

      Reviewer #3 (Recommendations For The Authors):

      I want to congratulate the authors on this very nice study and have only minor comments for them.

      (1) It would be very nice to include pictures of the larva and adult of H. armigera. It would also help to have schematics of where the sensilla they are recording from are.

      We have added photos of four taste organs on which the recorded sensilla are indicated (Figure 1), and pictures of the larva and adult on which the stimulation site is indicated (Figure 2).

      (2) A schematic summarising their findings, including the relevance to the animal's behavioural ecology, will greatly improve interpretations for the broader audience.

      A schematic summarizing the findings has been added.

      (3) The manner in which PIs are represented in figure 2A, B (among others) is confusing. Can the authors please plot the PI and not the feeding area? From the PI values listed beside the plot, it actually suggests that the larvae don't really show a preference. Could the authors please comment on this?

      Yes, sucrose has a significant stimulating effect on larval feeding, but the effect is not as large as predicted from the sensitivity of the sensillum. The main reasons are as follows: (1) many factors affect larval feeding, and sucrose is only one of them; (2) because the substrate leaf discs also contain sugar, the effect of the newly added sucrose may be reduced. After careful consideration, we think it is better to display the feeding area and PI together so that readers have a complete understanding of the data.

      (4) The heterologous expression experiments suggest that co-expression of GR6 with either GR10 or GR5 somehow suppress the response of the GR6 alone to fucose. Am I reading the data correctly? Why would this be? Perhaps the authors could discuss this. In this context, it would help to reproduce all the GR6 data together.

      Your interpretation is reasonable to a certain extent. The result of co-injection might be that Gr10 or Gr5 inhibited the response of Gr6. However, there is another possibility that the amount of Gr6 sRNA was diluted by co-injection of two GRs, resulting in a reduced response of Gr6 to fucose.

      (5) In general, for each results section, it would help to have a sentence or two that interprets the data in the context of previously presented data. This would help the reader digest the data and interpret it as they read along. Currently, the authors summarise the observations and leave all the interpretation to the discussion section.

      We accept the suggestion. In each part of the results, we have added a sentence to explain the above data, which will help readers to clarify the context of the research more easily.

      (6) Is the GR6 data in 4C not lined up correctly?

      Yes, it is right.

      (7) Line 228 suggests that the mutants were validated with qPCRs - I don't see that data.

      The mutants were not validated with qPCR. We used ordinary PCR at the mRNA level to verify whether the relevant sequences were really deleted in the mutants.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a detailed study of a nearly complete Entomophthora muscae genome assembly and annotation, along with comparative analyses among related and non-related entomopathogenic fungi. The genome is one of the largest fungal genomes sequenced, and the authors document the proliferation and evolution of transposons and the presence/absence of related genetic machinery to explore how this may have occurred. There has also been an expansion in gene number, which appears to contain many "novel" genes unique to E. muscae. Functionally, the authors were interested in CAZymes, proteases, circadian clock related genes (due to entomopathogenicity/ host manipulation), other insect pathogen-specific genes, and secondary metabolites. There are many interesting findings including expansions in trehalases, unique insulinase, and another peptidase, and some evidence for RIP in Entomophthoralean fungi. The authors performed a separate study examining the E. muscae species complex and related strains. Specifically, morphological traits were measured for strains and then compared to the 28S+ITS-based phylogeny, showing little informativeness of these morpho characters with high levels of overlap.

      This work represents a big leap forward in the genomics of non-Dikarya fungi and large fungal genomes. Most of the gene homologs have been studied in species that diverged hundreds of millions of years ago, and therefore using standard comparative genomic approaches is not trivial and still relatively little is known. This paper provides many new hypotheses and potential avenues of research about fungal genome size expansion, entomopathogenesis in zygomycetes, and cellular functions like RIP and circadian mechanisms.

      Strengths:

      There are many strengths to this study. It represents a massive amount of work and a very thorough functional analysis of the gene content in these fungi (which are largely unsequenced and definitely understudied). Too often comparative genomic work will focus on one aspect and leave the reader wondering about all the other ways genome(s) are unique or different from others. This study really dove in and explored the relevant aspects of the E. muscae genome.

      The authors used both a priori and emergent properties to shape their analyses (by searching for specific genes of interest and by analyzing genes underrepresented, expanded, or unique to their chosen taxa), enabling a detailed review of the genomic architecture and content. Specifically, I'm impressed by the analysis of missing genes (pFAMs) in E. muscae, none of which are enriched in relatives, suggesting this fungus is really different not by gene loss, but by its gene expansions.

      Analyzing species-level boundaries and the data underlying those (genetic or morphological) is not something frequently presented in comparative genomic studies, however, here it is a welcome addition as the target species of the study is part of a species complex where morphology can be misleading and genetic data is infrequently collected in conjunction with the morphological data.

      Thank you for your careful reading of our work. We’re glad that you identified these areas as strengths.

      Weaknesses:

      The conclusions of this paper are mostly well supported by data, but a few points should be clarified.

      In the analysis of Orthogroups (OGs), the claim in the text is that E. muscae "has genes in multi-species OGs no more frequently than Entomophaga maimaiga. (Fig. 3F)" I don't see that in 3F. But maybe I'm really missing something.

      Thank you for catching this. You were, in fact, not missing anything at all. There was a mismatch between the data plotted in F and G and how the caption described these data. We very much apologize for the confusion that this must have caused. We have corrected these plots and also made changes to improve interpretability (see below).

      Also related, based on what is written in the text of the OG section, I think portions of Figure 3G are incorrect/ duplicated. First, a general question, related to the first two portions of the graph. How do "Genes assigned to an OG" and "Genes not assigned to an OG" not equal 100% for each species? The graph as currently visualized does not show that. Then I think the bars in portion 3 "Genes in species-specific OG" are wrong (because in the text it says "N. thromboides had just 16.3%" species-specific OGs, but the graph clearly shows that bar at around 50%). I think portion 3 is just a duplicate of the bars in portion 4 - they look exactly the same - and in addition, as stated in the text portion 4 "Potentially species-specific genes" should be the simple addition of the bars in portion 2 and portion 3 for each species.

      As mentioned above, we sincerely regret the error made in the plot and for the confusion that this caused. F now reflects the percentage of orthogroups (OGs) that possess at least one representative from the indicated species (left) and the percentage of OGs that are species-specific (only possess genes from one species; right). The latter is a subset of the former. G now reflects the percentage of annotated genes that were assigned an OG, per species, as well as the inverse of this - genes that were not assigned to any OG. These should, and now do, sum to 100%. The “Within species-specific OG” data summed with the “Not assigned OG” data yields the “Potentially species-specific data” in the rightmost column.
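      The bookkeeping behind the corrected panel G can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and its argument names are hypothetical, and any counts passed to it would be placeholders rather than values from the manuscript.

```python
def og_summary(n_genes, n_in_multispecies_og, n_in_species_specific_og):
    """Per-species percentages matching the categories described for panel G.

    All three inputs are gene counts for one species (hypothetical names).
    """
    assigned = n_in_multispecies_og + n_in_species_specific_og
    not_assigned = n_genes - assigned

    def pct(count):
        return 100.0 * count / n_genes

    return {
        # These two categories are complementary and sum to 100%.
        "assigned_og": pct(assigned),
        "not_assigned_og": pct(not_assigned),
        # Genes in OGs that contain genes from this species only.
        "within_species_specific_og": pct(n_in_species_specific_og),
        # Sum of the species-specific-OG genes and the unassigned genes.
        "potentially_species_specific": pct(n_in_species_specific_og + not_assigned),
    }
```

      Under this accounting, a bar for "Within species-specific OG" can never exceed the corresponding "Potentially species-specific" bar, which is the consistency check the reviewer applied to the original figure.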

      In the introduction, there is a name for the phenomenon of "clinging to or biting the tops of plants," it's called summit disease. And just for some context for the readers, summit disease is well-documented in many of these taxa in the older literature, but it is often ignored in modern studies - even though it is a fascinating effect seen in many insect hosts, caused by many, many fungi, nematodes (!), etc. This phenomenon has evolved many times. Nice discussions of this in Evans 1989 and Roy et al. 2006 (both of whom cite much of the older literature).

      You’re right. We have now clarified that this behavior is called “summit disease” and referenced the suggested articles, along with a more recent review.

      Reviewer #2 (Public Review):

      In their study, Stajich and co-authors present a new 1.03 Gb genome assembly for an isolate of the fungal insect parasite Entomophthora muscae (Entomophthoromycota phylum, isolated from Drosophila hydei). Many species of the Entomophthoromycota phylum are specialised insect pathogens with relatively large genomes for fungi, with interesting yet largely unexplored biology. The authors compare their new E. muscae assembly to those of other species in the Entomophthorales order and also more generally to other fungi. For that, they first focus on repetitive DNA (transposons) and show that Ty3 LTRs are highly abundant in the E. muscae genome and contribute to ~40% of the species' genome, a feature that is shared by closely related species in the Entomophthorales. Next, the authors describe the major differences in protein content between species in the genus, focusing on functional domains, namely protein families (pfam), carbohydrate-active enzymes, and peptidases. They highlight several protein families that are overrepresented/underrepresented in the E. muscae genome and other Entomophthorales genomes. The authors also highlight differences in components of the circadian rhythm, which might be relevant to the biology of these insect-infecting fungi. To gain further insights into E. muscae specificities, the authors identify orthologous proteins among four Entomophthorales species. Consistent with a larger genome and protein set in E. muscae, they find that 21% of the 17,111 orthogroups are specific to the species. To finish, the authors examine the consistency between methods for species delineation in the genus using molecular (ITS + 28S) or morphological data (# of nuclei per conidia + conidia size) and highlight major incongruences between the two.

      Although most of the methods applied in the frame of this study are appropriate with the scripts made available, I believe there are some major discrepancies in the datasets that are compared which could undermine most of the results/conclusions. More precisely, most of the results are based on the comparison of protein family content between four Entomophthorales species. As the authors mention on page 5, genome (transcriptome) assembly and further annotation procedures can strongly influence gene discovery. Here, the authors re-annotated two assemblies using their own methods and recovered between 30 and 60% more genes than in the original dataset, but if I understand it correctly, they perform all downstream comparative analyses using the original annotations. Given the focus on E. muscae and the small sample size (four genomes compared), I believe performing the comparisons on the newly annotated assemblies would be more rigorous for making any claim on gene family variation.

      Thank you for this comment. While we did compare gene model predictions for two of these assemblies to assess if this difference could account for discrepancies in gene counts, completely reannotating all non-E. muscae datasets was outside of the scope of this study. In our opinion, the total number of predicted genes in a genome is not the best representation of differences, since splitting or fusing gene models can inflate apparent differences; the orthology and domain counts are a more accurate assessment of the content. It’s possible that annotation differences may have inflated some gene family counts; however, we note that similar domain trends were observed between E. muscae and its closest relative, Entomophaga maimaiga, suggesting that these differences were not sufficient to prevent us from detecting real biological signals. We look forward to continued improvement of our genome through additional sequencing and more clarity on total gene content of E. muscae.

      The authors also investigate the putative impact of repeat-induced point mutation on the architecture of the large Entomophthorales genomes (for three of the eight species in Figure 1) and report low RIP-like dinucleotide signatures despite the presence of RID1 (a gene involved in the RIP process in Neurospora crassa) and RNAi machinery. They base their analysis on the presence of specific PFAM domains across the proteome of the three Entomophthorales species. In the case of RID1, the authors searched for a DNA methyltransferase domain (PF00145), however other proteins than RID1 bear such functional domain (DNMT family) so that in the current analysis it is impossible to say if the authors are actually looking at RID1 homologs (probably not, RID1 is monophyletic to the Ascomycota I believe). Similar comments apply to the analysis of components of the RNAi machinery. A more reliable alternative to the PFAM analysis would be to work with full protein sequences in addition to the functional domains.

      While we understand this concern regarding domain vs. full length protein, the advantage of the domain search is that HMM-based searches are more sensitive in detecting distantly related homologs. Entomophthoralean fungi are distantly related to the ascomycetes in which these mechanisms have been characterized, so we chose a broader search approach that may identify proteins that have similar domain structure but are not necessarily homologs. These searches are presented in the manuscript as preliminary, but worth further investigation. However, our RID-based analysis did not identify convincing homologs for RID1 in entomophthoralean fungi included in our investigation, and we reported low homology (i.e., 12-14%) among our orthogroup of interest and RID1. We have further edited this section to clarify our understanding that these candidates are not RID1 homologs. We had hoped to avoid this implication, but we felt this investigation and null result were worth reporting.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific points:

      Results:

      "1.03 Gb genome consisting of 7,810 contigs (N50 = 301.1 kb). Additional... resulted in a final contig count of 7,810 (N50 = 329.6 kb)" So you started and ended with the same contig count but a different N50? Is this a typo?

      Yes, this was a typo. Thank you for bringing this to our attention.

      Figure 1D.

      The colors of Complete1x and Complete2x are too similar to tell them apart.

      The colors have been made more distinct.

      Figure 4B.

      I know C. rosea has been found from insects before, but it's mostly a mycoparasite and occasionally an endophyte, and has bioactivity against a lot of things. I just saw that it's listed as an entomopathogen, and I was surprised. Anyway, leave it as is if you want to, but it's definitely better studied and better known (Google Scholar) as a mycoparasite.

      Thanks for this comment. For the sake of including a more diverse representation of entomopathogenic fungi, we have opted to leave this as is.

      Full references (from the public comment)

      Evans, H.C., 1989. Mycopathogens of insects of epigeal and aerial habitats. Insect-fungus interactions, pp.205-238.

      Roy, H.E., Steinkraus, D.C., Eilenberg, J., Hajek, A.E. and Pell, J.K., 2006. Bizarre interactions and endgames: entomopathogenic fungi and their arthropod hosts. Annu. Rev. Entomol., 51, pp.331-357.

      Reviewer #2 (Recommendations For The Authors):

      I believe the manuscript could largely benefit from restructuring the results section to enhance clarity. The results section reads like a lot of descriptive back and forth, so that the reader lacks a clear rationale. The absence of a consistent dataset used for the different comparisons made all along the manuscript makes it hard to follow.

      Minor comments:

      (No line numbers were available so I refer to page numbers).

      p1

      • not sure about the use of "allied" to describe other fungal species in the title and after (sister species?).

      We didn’t want to use the word sister because not all of these species could be considered sister.

      • Genomic defence against transposable elements rather than "anti"?

      We have rephrased to genomic defense.

      p3

      • Extra parenthesis at Bronski et al.

      This is now corrected.

      • What does newly-available mean here?

      We mean recent. A lot of the datasets we used were very new, and we wanted to emphasize that point.

      • The back and forth between genomes and transcriptomes makes it hard to follow, would clarify from the beginning (in addition to the sequencing method - short vs long-read assemblies as in Figure 1B) or perhaps use a consistent dataset for all subsequent comparative analysis in the Entomophthorales.

      We have denoted our transcriptomic datasets in Fig 1C using parentheses.

      p5

      • Perhaps clarify that class II DNA transposons can also "copy" (single-strand excisions can be repaired by the host machinery).

      We have now included mention of “copy” as well as “jump” mechanisms of Class II transposons per your suggestion.

      p6

      • "beginning roughly concurrently", not clear what "began".

      This is now corrected.

      • "control" rather than "protect against"?

      We’ve changed “protect against” to “counter”.

      • I believe RIP has only been observed (experimentally) in a handful of fungal species, all from the Ascomycota phylum.

      Hood et al, 2005 found signatures of RIP in anther-smut fungus and Horns et al, 2012, found evidence of hypermutability across repeat elements within several Pucciniales species.

      • "RID1 contains two DNA_methylase domains", RID1 has one methyltransferase domain according to the reference Freitag et al, 2002.

      Thank you for drawing this to our attention. It is true RID1 has one methyltransferase region; however, the sequence deposited by Freitag et al, 2002 (AAM27408) is predicted by HMMer to have two adjacent Pfam DNA_methylase domains (i.e., PF00145). In this exploratory analysis, we tried to leverage this characteristic to identify candidate proteins of interest. We have reworded this section to clarify this.

      p8

      • Here and after I would use more informative titles for each paragraph.

      With the exception of the headings for Pfam, CAZy and MEROPs analyses, we believe the other headings are informative. We appreciate this comment, but opt to leave the heading titles as is.

      • I believe presenting the orthology analysis before the more in-depth protein family domain search.

      We leveraged the OG analysis mostly as a way to identify potentially unique genes in E. muscae, so we think the current order makes the most sense.

      p10

      • Figures 3F and G are confusing. The legend for Figure 3F mentions "OGs with >= 2 species" while the figure shows "multi-species OGs", and reads as redundant with the "species-specific" OGs. For the "OGs within species" do I understand it correctly that it represents the number of genes assigned to OGs for each species? If yes, the numbers are in contradiction with Figure 3G. And in Figure 3G shouldn't the sum of "genes assigned in OGs" and "genes nor assigned in OGs" add up to 100? I'm probably missing something here, but I would clarify what the different sets of orthogroups are in the figure and in the text (perhaps adopting a pangenome-like nomenclature).

      Thanks for this comment. This legend, unfortunately, reflected an earlier version of the figure and was overlooked prior to submission. We have since amended this and sincerely apologize for the error on our part.

      p12

      • The whole first paragraph reads more like it should be part of an introduction/discussion.

      We’ve moved some of this paragraph to the discussion but left the background information necessary for the reader to understand why we were looking for homologs of wc and frq.

      p13

      • The last paragraph reads like discussion.

      We have revised this paragraph so it now reads: “Because E. muscae is an obligate insect-pathogen only living inside live flies, we investigate the presence of canonical entomopathogenic enzymes in the genome. We find that E. muscae appears to have an expanded group of acid-trehalases compared to other entomopathogenic and non-entomopathogenic Entomophthorales (Fig. 4A), which correlates with the primary sugar in insect blood (hemolymph) being trehalose (Thompson, 2003). The obligate insect-pathogenic lifestyle is also evident when comparing the repertoire of lipases, subtilisin-like serine proteases, trypsins, and chitinases in our focal species versus Zoopagomycota and Ascomycota fungi that are not obligate insect pathogens (Fig. 4B). Sordariomycetes within Ascomycota contains the other major transition to insect-pathogenicity within the kingdom Fungi (Araújo and Hughes, 2016). Based on our comparison of gene numbers, Entomophthorales possess more enzymes suitable for cuticle penetration than Sordariomycetes (Fig. 4B). In contrast, insect-pathogenic fungi within Hypocreales possess a more diverse secondary metabolite biosynthesis machinery as evidenced by the absence of polyketide synthase (PKS) and indole pathways in Entomophthorales (Fig. 4C).”

      p15 and 16

      • This all reads as redundant with the previous protein family domain analysis. I would try to merge them.

      Thank you for this comment, however we have opted to maintain the current structure.

      p18

      • In the first sentence, I'm not sure about what was performed here.

      This has been reworded to clarify.

      p20

      • Regarding the assembly, do I understand it correctly that a nuclear genome can be partially haploid / diploid?

      Thanks for your comment. The genome itself is, of course, some integer multiple of n, but based on BUSCO scores our assembly doesn’t appear to have completely collapsed into a haploid genome. We think it makes more sense here to say “partially haploid” than “partially diploid” so have altered this.

      p21

      • RIP has only been observed in a couple of Ascomycetes. RIP-like genomic signatures (GC bias) have been observed elsewhere.

      Hood et al, 2005 found signatures of RIP in anther-smut fungus and Horns et al, 2012, found evidence of hypermutability across repeat elements within several Pucciniales species.

      p23

      • Interesting that the peptidase A2B domain is found uniquely in E. muscae genome and is associated with Ty3 activity. Does the domain often overlap with annotated Ty3 in E. muscae genome? Or how come the domain is not present in other sister species with large genomes full of Ty3 transposons? Could it relate to a new active transposon in E. muscae specifically?

      Thanks for this comment. The domain-based analysis was only performed on the predicted transcriptome of the genome assembly, which does not include the repeat elements (e.g., Ty3). It could be that this peptidase reflects a new active transposon that’s specific to E. muscae, which would certainly be very interesting. We’ve now included this idea in the discussion.

      p26

      • In the case of fungal genomes, I would not advise masking the assembly for repeated sequences prior to gene annotation (in particular given the current focus on protein family variation).

      Thank you for this comment; however, we disagree with this assertion, as a typical approach for genome annotation in fungi and other eukaryotes is to soft-mask transposable elements before performing gene prediction to avoid over-prediction. While there could be alternative approaches that compare masked and unmasked assemblies, soft masking is a recommended protocol for underlying tools like Augustus (10.1002/cpbi.57) and in general descriptions of genome annotation (10.1002/0471250953.bi0401s52). In our experience, the false positive rate of genes predicted through TE regions is likely to be more of a problem than false negatives from missed genes. Further, it seems most appropriate to use a consistent annotation approach when generating datasets for comparative analyses, including genomes from other sources (e.g., Joint Genome Institute annotated genomes), which also use repeat masking before annotation. It is outside the scope of this project to reannotate all genomes with and without repeat masking.

      p27

      • Interrupted sentence at "Classification of DNA and LTR .. by similarity The".

      This was an unnecessary partial phrase as the information on classification of elements via RepBase was made a few sentences above this.

      p28

      • Enriched/depleted rather than "significantly different"?

      Thank you for this comment, however we have opted to maintain the current phrasing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for a careful review of the manuscript and for their comments, which we address below.

      Reviewer #1:

      (1) …the authors could examine division in a population of cells with only one centrosome. Seeing some restoration of mitotic progression in the absence of SAC-dependent delays would suggest that even one centrosome with uninhibited Eg5 is sufficient to negate SAC-dependent delays, and would limit models for what exactly centrosomes contribute.

      We agree that the one-centrosome question (i.e. whether cells with a single centriole, and therefore a single centrosome, have the same SAC dependence) would be interesting to address. It is known that cells with a single centriole generated through centrinone treatment also have elongated mitoses, like cells lacking centrioles (see Chinen et al. 2021, compare Fig 2C to Fig 2D). We have tried this experiment in RPE-1 cells, with preliminary results confirming that there is a mitotic delay. It is not known whether this delay requires SAC activity, and we hope to address that in future work. In addition, we note that we show in Fig. 4b-c that cells with the normal centrosome number but with a single focus of microtubules due to Eg5 inhibition were also sensitive to MPS1 inhibition. This suggests that centrosome presence alone cannot overcome the requirement for SAC activity; rather, the centrosomes need to be able to separate in a timely fashion.

      Reviewer #2:

      (1) An example is how to interpret the effect of Aurora B inhibition, which does not block acentrosomal cell division. If Aurora B is required for SAC activity, it suggests this effect of MPS1 may be a function other than SAC. Given the complexity of the SAC, it would be informative to test other SAC components. Instead, the authors conclude that the mitotic delay caused by MPS is required for acentrosomal cell division. I don't think they have ruled out, or even addressed other functions of MPS1.

      We agree that it is possible that functions of the MPS1 kinase other than those involved in the SAC could be important. Although we have not directly tested other SAC components, we did “mimic” SAC activity by delaying anaphase onset using APC/C inhibition while also inhibiting MPS1 (Fig. 2b-b’’). The fact that this restored division suggests that it is the SAC function of MPS1 kinase activity that is relevant to this delay. 

      (2) The authors find that when both the APC and MPS1 are inhibited, the cells eventually divide. These results are intriguing, but hard to interpret. The authors suggest that the failure to divide in MPS1-inhibited cells is because they enter anaphase, and then must back out. This is hard to understand and there are no data supporting some kind of aborted anaphase. Is the division observed with double inhibition some sort of bypass of the block caused by MPS1 inhibition alone? It is not clear why inhibition of APC causes increased cell division when MPS1 is inhibited.

      As described in the response to 1), we believe that reinstating the delay to anaphase onset by APC/C inhibition provided the time needed to establish a functional bipolar spindle even in the absence of the SAC, and that cells eventually overcome the proTAME block and proceed through mitosis, as observed in control cells in our experiments. We note that we chose concentrations of proTAME specifically for each cell line (RPE-1 and U2OS) that would result only in a temporary block, following on the work of Lara-Gonzalez and Taylor (2012), who reported similar findings for HeLa cells.

      (3) The authors characterize MTOC formation in these cells, which is also interesting. MTOCs are established after NEB in acentrosomal cells. Indeed, forming these MTOCs is probably a key mechanism for how these cells complete a division, like mouse oocytes.

      We agree that the observed intermediates of MTOCs are interesting and likely crucial to the mechanism of cell division in acentrosomal somatic cells. We are investigating further the differences and similarities between somatic cell MTOC formation in the absence of centrosomes and the naturally-occurring form of that process in oocytes.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Seleit and colleagues set out to explore the genetics of developmental timing and tissue size by mapping natural genetic variation associated with segmentation clock period and presomitic mesoderm (PSM) size in different species of Medaka fish. They first establish the extent of variation between five different Medaka species in terms of organismal size, segmentation rate, segment size and presomitic mesoderm size, among other traits. They find that these traits are species-specific but strongly correlated. In a massive undertaking, they then perform developmental QTL mapping for segmentation clock period and PSM size in a set of ~600 F2 fish resulting from the cross of Oryzias sakaizumii (Kaga) and Oryzias latipes (Cab). Correlation between segmentation period and segment size was lost among the F2s, indicating that distinct genetic modules control these traits. Although the researchers fail to identify causal variants driving these traits, they perform proof of concept perturbations by analyzing F0 Crispants in which candidate genes were knocked out. Overall, the study introduces a completely new methodology (QTL mapping) to the field of segmentation and developmental tempo, and therefore provides multiple valuable insights into the forces driving evolution of these traits.

      Major comments: - The first sentence in the abstract reads "How the timing of development is linked to organismal size is a longstanding question". It is therefore disappointing that organismal size is not reported for the F2 hybrids. Was larval length measured in the F2s? If so, it should be reported. It is critical to understand whether the correlation between larval size and segmentation clock period is preserved in F2s or not, therefore determining if they represent a single or separate developmental modules. If larval length data were not collected, the authors need to be more careful with their wording.

      The question the reviewer raises here is indeed a very relevant one, and one that we were also curious about ourselves. While it was not logistically possible to grow the 600 F2 fish to adulthood, we did measure larval length in a subset of F2 hatchlings (n=72) to ask precisely the question the reviewer raises here. Our results (new Supplementary Figure 5) show that the correlation between larval length and segmentation timing (which we report across the Oryzias species) is absent in the F2s. This indeed argues that the traits represent separate developmental modules.

      In the current version of the paper, organismal size is often incorrectly equated to tissue size (e.g. PSM size, segment size). For example, in page 3 lines 33-34, the authors state that faster segmentation occurred in embryos of smaller size (Fig. 1D). However, Fig. 1D shows correlation between segmentation rate and unsegmented PSM area. The appropriate data to show would be segmentation rate vs. larval or adult length.

      The reviewer is correct. We have now linked the text more clearly to the data we show in Supplementary Figure 1, which shows that adult length and adult mass are strongly correlated (S1A) and that adult mass is in turn strongly correlated with segmentation rate in the different Oryzias species (S1B). Additionally, main Figure 1B shows that larval length is correlated with PSM length. We have corrected the main text to reflect these relationships more clearly.

      • Is my understanding correct in that the her7-venus reporter is carried by the Cab F0 but not the Kaga F0? Presumably only F2s which carried the reporter were selected for phenotyping. I would expect the location of the reporter in the genome to be obvious in Figure 3J as a region that is only Cab or het but never Kaga. Can the authors please point to the location of the reporter?

      The reviewer is correct. Indeed the location of our her7-venus KI is on chromosome 16 and the recombination patterns on this chromosome overwhelmingly show either Hom Cab (green) or Het Cab/Kaga (Black). This is expected as we selected fish carrying the her7-venus KI for phenotyping.

      • devQTL mapping in this study seems like a wasted opportunity. The authors perform mapping only to then hand pick their targets based on GO annotations. This biases the study towards genes known to be involved in PSM development, when part of the appeal of QTL mapping is precisely its unbiased nature and the potential to discover new functionally relevant genes. The authors need to better justify their rationale for candidate prioritization from devQTL peaks. The GO analysis should be shown as supplemental data. What criteria were used to select genes based on GO annotations?

      We have now commented on these valid points and outlined our rationale in more detail in the text (page 4, lines 20-30). Our rationale now also includes selection of differentially expressed genes (n=5 genes) that fall within segmentation timing devQTL hits (for more details see below). Essentially, while we did ultimately focus the proof of principle on known genes, these genes were not previously known to play a role in either setting the timing of segmentation or controlling the size of the PSM. Hence, we do think our strategy demonstrates "the potential to discover new functionally relevant genes", even though the genes themselves had already been implicated in somitogenesis more generally. We added the GO analysis as supplemental data as requested (new Supplementary Figure 7E).

      • Analysis of the predicted functional consequence of divergent SNPs (Fig. S6B, F) is superficial. Among missense variants, which genes harbor the most deleterious mutations? Which missense variants are located in highly conserved residues? Which genes carry variants in splice donors/acceptors? Carefully assessing the predicted effect of SNPs in coding regions would provide an alternative, less biased approach to prioritize candidate genes.

      We have now included our analysis of SNPs based on the Variant Effect Predictor (VEP) tool from Ensembl. This analysis ranks the predicted severity of each SNP's effect on protein structure and function (Impact: low, moderate, high) and annotates which variants can affect splice donors/acceptors. The VEP analysis for both phenotypes is now added to the manuscript as supplemental data (new Supplementary Data S2, S5).

      • Another potential way to prioritize candidate genes within devQTL peaks would be to use the RNA seq data. The authors should perform differential expression analysis between Kaga and Cab RNA-seq datasets. Do any of the differentially expressed genes fall within the devQTL peaks?

      As suggested, we have performed this additional experiment and report the RNA-seq differential expression analysis in new Supplementary Figure 7C-D. The analysis revealed 2606 differentially expressed genes in the PSM between Kaga and Cab, five of which were candidate genes from the devQTL analysis. We have now tested all of these (5 in total: 4 new and 1 previously targeted, adgrg1) for segmentation timing by CRISPR/Cas9 KO in the her7-venus background; none showed a timing phenotype (new Supplementary Figure 7F-F'). We provide the complete set of results in new Supplementary Figure 7 and Supplementary Data file 3 (DE-genes). All data were deposited in the publicly available repository BioStudies under accession number E-MTAB-13927.
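      The underlying question, whether a differentially expressed gene falls within a devQTL peak, reduces to an interval-overlap test on genomic coordinates. The sketch below shows one minimal way to express it. The gene names, chromosome labels, and coordinates are hypothetical placeholders, and this is not the pipeline used in the study.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def genes_in_peaks(genes, peaks):
    """Return names of genes overlapping at least one peak on the same chromosome."""
    hits = []
    for name, gene in genes.items():
        for peak in peaks:
            same_chrom = gene.chrom == peak.chrom
            # Two closed intervals overlap when each starts before the other ends.
            overlaps = gene.start <= peak.end and peak.start <= gene.end
            if same_chrom and overlaps:
                hits.append(name)
                break
    return hits


# Hypothetical differentially expressed genes and one hypothetical QTL peak.
de_genes = {
    "geneA": Interval("chr3", 100, 200),
    "geneB": Interval("chr3", 500, 600),
    "geneC": Interval("chr16", 150, 250),
}
qtl_peaks = [Interval("chr3", 150, 400)]

print(genes_in_peaks(de_genes, qtl_peaks))
```

      For genome-scale gene and peak lists, a sorted or interval-tree based approach would scale better than this nested loop.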

      • The use of crispants to functionally test candidate genes is inappropriate. Crispants do not mimic the effect of divergent SNPs and therefore completely fail to prove causality. While it is completely understandable that Medaka fish are not amenable to the creation of multiple knock-in lines where divergent SNPs are interconverted between species, better justification is needed. For instance, is there enough data to suggest that the divergent alleles for the candidate genes tested are loss of function? Why was a knockout approach chosen as opposed to overexpression?

      We agree with the reviewer that we do not address the causality of SNPs with the CRISPR/Cas9 KO approach we followed. Medaka does offer the genome editing capabilities to create tailored sequence modifications, so in principle this can be done. In practice, however, we reasoned that any given SNP will contribute only partially to the observed phenotypes, and combinatorial sequence edits are simply very laborious given the current state of the art in genome editing technologies. We therefore opted for an alternative proof of principle approach that aims "to discover new functionally relevant genes", not SNPs.

      -Along the same line, now that two candidate genes have been shown to modulate the clock period in crispants (mespb and pcdh10b), the authors should at least attempt to knock in the respective divergent SNPs for one of the genes. This is of course optional because it would imply several months of work, but it would significantly increase the impact of the study.

      As above, this is in principle the correct rationale to follow, though it is very time-, cost- and labour-intensive. It is because of this latter practical consideration that we decided not to pursue this option.

      Minor Comments - It would be highly beneficial to describe the ecological differences between the two Medaka species. For example, do the northern O. sakaizumii inhabit a colder climate than the southern O. latipes? Is food more abundant or easily accessible for one species compared to the other? What, if anything, has been described about each species' ecology?

      There are indeed differences in the ecology of both species, with the northern O. sakaizumii inhabiting a colder climate than the southern O. latipes. In addition, it is known that the breeding season is shorter in the north than in the south, and northern species have been shown to have a faster juvenile growth rate than southern species. While it would be premature to link those ecological factors to the timing differences we observe, we can certainly speculate. A line to this effect has been added to the main text (Page 5, line 28-30).

      • The authors describe two different methods for quantifying segmentation clock period (mean vs. intercept). It is still unclear what is the difference between Figs. 3A (clock period), S4A (mean period) and S4B (intercept period). Is clock period just mean period? Are the data then shown twice? How do Fig. 3A and S4A differ?

      The clock period shown in all the main figures is the intercept period, which was also used for the devQTL analysis. Both measurements (mean and intercept) are indeed highly correlated and we include both in supplement for completeness.

      • devQTL as shorthand for developmental QTL should be defined in page 4 line 1 (where the term first appears), not later in line 12 of the same page.

      Noted and corrected, we thank the reviewer for spotting this error.

      • Python code for period quantification should be uploaded to Github and shared with reviewers.

      All period quantification code used in this study was obtained from the publicly available tool pyBOAT (https://www.biorxiv.org/content/10.1101/2020.04.29.067744v3). All code used in pyBOAT is available from the GitHub page of the creator of the tool (https://github.com/tensionhead/pyBOAT). Both are linked in the references and materials and methods sections.

      • RNA-seq data should be uploaded to a publicly accessible repository and the reviewer token shared with reviewers.

      We have uploaded all RNA-sequencing data to the public repository BioStudies under accession numbers E-MTAB-13927 and E-MTAB-13928. This information is now also added to the materials and methods in the manuscript text.

      Why are the maintenance (27-28C) vs. imaging (30C) temperatures different?

      Medaka fish can physiologically tolerate a wide range of temperatures, i.e. 17-33C. The temperature 30C was chosen for practical reasons, i.e. a slightly faster developmental rate enables higher sample throughput in overnight real-time imaging experiments.

      • For Crispants, control injections should have included a non-targeting sgRNA control instead of simply omitting the sgRNA.

      We agree a non-targeting sgRNA control can be included, though we chose a different approach. For clarity, we now also include a control targeting Oca2, a gene involved in the pigmentation of the eye, to probe for any injection-related effect on timing and PSM size. As expected, 3 sgRNAs + Cas9 against Oca2 had no impact on timing or PSM size. These data are now shown in new Supplementary Figure 9 F-G'.

      It is difficult to keep track of the species and strains. It would be most helpful if Fig. S1 appeared instead in main figure 1.

      We agree and included an overview of the phylogenetic relationship of all species and their geographical locales in new Figure 1 A-B.

      Significance

      • The study introduces a new way of thinking about segmentation timing and size scaling by considering natural variation in the context of selection. This new framing will have an important impact on the field.
      • Perhaps the most significant finding is that the correlation between segment timing and size in wild populations is driven not by developmental constraints but rather selection pressure, whereas segment size scaling does form a single developmental module. This finding should be of interest to a broad audience and will influence how researchers in the field approach future studies.
      • It would be helpful to add to the conclusion the author's opinion on whether segmentation timing is a quantitative trait based on the number of QTL peaks identified.
      • The authors should be careful not to assign any causality to the candidate genes that they test in crispants.
      • The data and results are generally well-presented, and the research is highly rigorous.
      • Please note I do have the expertise to evaluate the statistical/bioinformatic methods used for devQTL mapping.

      Reviewer #2

      Evidence, reproducibility and clarity

      Seleit et al. investigate the correlation between segment size, presomitic mesoderm and the rhythm of periodic oscillations in the segmentation clock of developing medaka fish. Specifically, they aim to identify the genetic determinants for said traits. To do so, they employ a common garden approach and measure such traits in separate strains (F0) and in interbreedings across two generations (F1 and F2). They find that whereas presomitic mesoderm and segment size are genetically coupled, the tempo of her7 oscillations is not. Genetic mapping of the F0 and F2 progeny allows them to identify regions associated to said traits. They go on and perturb 7 loci associated to the segmentation clock and X related to segment size. They show that 2/7 have a tempo defect, and 2/ affect size.

      Major comments: The conclusions are convincing and well supported by the data. I think the work could be published as is in its current state, and no additional experiments that I can think of are needed to support the claims in the paper.

      Minor comments: - The authors could provide a more detailed characterization of the identified SNPs associated to the clock and to PSM size. For the segmentation clock, the authors identify 46872 SNPs, most of which correspond to non-coding regions and are associated to 57 genes. They narrow down their approach to those expressed in the PSM of Cab Kaga. Was the RNA selected from F1 hybrids? I wonder if this would impact the analysis for tempo and or size in any way, as F2 are derived from these, and they show broader variability in the clock period than the F0 and F1 fishes.

      The RNA was obtained from the pure F0 strains, and we have now extended this analysis by deep bulk RNA sequencing and differential gene expression analysis. As indicated also to reviewer 1, this revealed 2606 differentially expressed genes in the unsegmented tails of Kaga and Cab embryos, some of which occurred in devQTL peaks. Based on this information we expanded our list of CRISPR/Cas9 KOs by targeting all differentially expressed genes (5 in total, 4 new and 1 previously targeted) for segmentation timing, none of which showed a timing phenotype (new Supplementary Figure 7C-D). We provide the complete set of results in new Supplementary Figure 7 and Supplementary Data file 3 (DE-genes). All data were deposited in the publicly available repository BioStudies under accession number E-MTAB-13927.

      It would be good if the authors could discuss if there were any associated categories or overall functional relationships between the SNPs/genes associated to size. And what about in the case of timing?

      In the case of PSM size there were no clear GO terms or functional relationships between the genes that passed the significance threshold on chromosome 3.

      For the 35 genes related to segmentation timing, there were a number of GO enrichment terms directly related to somitogenesis. We have included the GO analysis in the new Supplementary Figure 7E.

      • Have any of the candidate genes or regulatory loci been associated to clock defects (57) or segment size (204) previously in the literature?

      To the best of our knowledge none of the genes have been associated with clock or PSM size defects so far. It might be worthwhile using our results to probe their function in other systems enabling higher throughput functional analysis, such as newly developed organoid models.

      • When the authors narrow down the candidate list, it is not clear if the genes selected as expressed in the PSM are tissue specific. If they are, I wonder if genes with ubiquitous expression would be more informative to investigate tempo of development more broadly. It would be good if the authors could specifically discuss this point in the manuscript.

      We have not addressed the spatial expression pattern of the 35 identified PSM genes in this study, so we cannot speculate further. The reviewer nevertheless raises an important point: how the timing of individual processes, such as body axis segmentation, is linked at the organismal scale is a fundamental additional question that will be addressed in future studies. The in-vivo context we follow here would be ideal for such investigations.

      Can the authors speculate mechanistically why mespb or pcdh10b accelerates the period of her7 oscillations?

      While we do not have a mechanistic explanation yet, an additional experiment we performed, bulk RNA sequencing on WT and mespb mutant tails, provided additional insight, and we have now added these data to the manuscript. This analysis revealed 808 differentially expressed genes between WT and mespb mutants. Interestingly, many of the affected genes are known to be expressed outside of the mespb domain, i.e. in the most posterior PSM (tbxt, foxb1, msgn1, axin2 and fgf8, amongst others). This indicates that the effect of mespb downregulation is widespread and possibly occurs at an earlier developmental stage, which requires follow-up studies. These data are now shown in new Supplementary Figure 9A and Supplementary Data file S4, and we comment on this point in the revised manuscript.

      • Are there any size difference associated to the functionally validated clock mutants?

      We addressed this point directly and added this analysis as supplementary Figure 9H-H'. While pcdh10b mutants do not show any detectable difference in PSM size, we find a small, statistically significant reduction in PSM size (area but not length) in mespb mutants. All this data is now included in the revised manuscript.

      -Ref 27 shows a lack of correlation between body size and the segmentation period in various species of mammals. The work supports their findings, and it would be good to see this discussed in the text.

      We are not certain how best to compare our in-vivo results in externally developing fish embryos to in-vitro mammalian 2-D cell cultures. In our view, the correlation of embryo size, larval and adult size that we find in Oryzias might not necessarily hold in mammalian species, which would make a comparison more difficult. We do cite the work mentioned so the reader is pointed towards this interesting, complementary literature.

      Significance

      The work is quite remarkable in terms of the multigenerational genetic analysis performed. The authors have analysed >600 embryos from three separate generations to obtain quantitative data to answer their question (herculean task!). Moreover, they have associated this characterization to specific SNPs. Then, to go beyond the association, they have generated mutant lines and identified specific genes associated to the traits they set out to decipher.

      To my knowledge, this is the first project that aims to identify the genetic determinants for developmental timing. Recent work on developmental timing in mammals has focused on interspecies comparisons and does not provide genetic evidence or insight into how tempo is regulated in the genome. As for vertebrates, recent work from zebrafish has profiled temperature effects on cell proportions and developmental timing. However, the genetic approach of this work is quite elegant and neat.

      Conceptually, it is quite important and unexpected that overall size and tempo are not related. Body size, lifespan, basal metabolic rates and gestational period correlate positively and we tend to think that mechanistically they would all be connected to one another. This paper and Lazaro et al. 2023 (ref 27) are among the first in which this preconception is challenged in a very methodical and conclusive manner. I believe the work is a breakthrough for the field and would be interesting for the field of biological timing, for the segmentation clock community and more broadly for all developmental biologists.

      My field is quantitative stem cell biology and I work on developmental timing myself, so I acknowledge that I am biased in my enthusiasm for the work. It should be noted that, as an expert in the field, I have identified instances where other work hasn't been as insightful or well developed in comparison to this piece. It is also worth noting that I am not an expert in fish development, phylogenetic studies or GWAS analyses, so I am not able to assess any pitfalls in that respect.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This manuscript explores the temporal and spatial regulation of vertebrate body axis development and patterning. In the early stages of vertebrate embryo development, the axial mesoderm (presomitic mesoderm - PSM) undergoes segmentation, forming structures known as somites. The exact genetic regulation governing somite and PSM size, and their relationship to the periodicity of somite formation remains unclear.

      To address this, the authors used two evolutionarily closely related Medaka species, Oryzias sakaizumii and Oryzias latipes, which, although having distinct characteristics, can produce viable offspring. Through analysis spanning parental (generation F0) and offspring (generations F1 and F2) generations, the authors observed a correlation between PSM and somite size. However, they found that size scaling does not correlate with the timing of somitogenesis.

      Furthermore, employing developmental quantitative trait loci (devQTL) mapping, the authors identified several new candidate loci that may play a role during somitogenesis, influencing timing of segment formation or segment size. The significance of these loci was confirmed through an innovative CRISPR-Cas9 gene editing approach.

      This study highlights that the spatial and temporal aspects of vertebrate segmentation are independently controlled by distinct genetic modular mechanisms.

      Major comments:

      1) In the main text, page 3, lines 11 and 12, the authors state that the periodicity of the embryo clock of the F1 generation is intermediate between the parental F0 lineages. However, the authors look only at the periodicity of the Cab strain (Oryzias latipes) segmentation clock. The authors should have a reporter fish line for the Kaga strain (Oryzias sakaizumii) to compare the segmentation clock of both parental strains and their offspring. Since this could be time-consuming and laborious, I advise alternatively rephrasing the text of the manuscript.

      We agree a careful distinction between segment forming rate (measured based on morphology) and clock period (measured using the novel reporter we generated) is essential. We show that both measures correlate very well in Cab, in both F0 and F1 and F2 carrying the Cab allele. For Kaga F0, we indeed can only provide the rate of somite formation, which nevertheless allows comparison due to the strong correlation to the clock period we have found. We have rephrased the text accordingly.

      2) It is evident that only a few F0 and F1 animals were analyzed in comparison with the F2 generation. Could the authors kindly explain whether and how this could bias or skew the observed results?

      We provide statistical evidence through the F-test of equality that the variances between the F0, F1 and F2 samples are equal. Additionally if we sub-sample and separate the F2 data into groups of 100 embryos (instead of all 638) we get the same distribution of the F2s. We therefore believe that this is sufficient evidence against a bias or skew in the results.
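      The two checks described in this response can be sketched as follows. This is a minimal illustration only, not the authors' analysis: the measurements are synthetic placeholder values in arbitrary units, the helper names `variance_ratio` and `split_into_groups` are hypothetical, and only the F statistic (the ratio of sample variances) is computed. Obtaining a p-value would additionally require the F distribution with the appropriate degrees of freedom, which is left out here.

```python
import random
import statistics


def variance_ratio(sample_a, sample_b):
    """Return the F statistic: larger sample variance divided by the smaller."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


def split_into_groups(values, group_size, seed=0):
    """Shuffle values and cut them into non-overlapping groups of group_size.

    A trailing remainder shorter than group_size is dropped.
    """
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    n_groups = len(shuffled) // group_size
    return [
        shuffled[i * group_size:(i + 1) * group_size]
        for i in range(n_groups)
    ]


# Synthetic placeholder data (assumption): 638 values standing in for the
# F2 measurements, drawn from a standard normal distribution.
data_rng = random.Random(42)
f2_values = [data_rng.gauss(0.0, 1.0) for _ in range(638)]

# Compare each group of 100 against the full data set.
groups = split_into_groups(f2_values, 100)
for index, group in enumerate(groups, start=1):
    print(
        f"group {index}: mean={statistics.mean(group):.3f} "
        f"sd={statistics.stdev(group):.3f} "
        f"F vs all={variance_ratio(group, f2_values):.3f}"
    )
```

      If the subsampled groups resemble the full distribution, their means and standard deviations should be close to those of the complete set and the variance ratios should stay near 1.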

      3) It would be interesting to create fish lines with the validated CRISPR-Cas9 gene manipulations in different genetic contexts (Cab or Kaga) to analyze the true impact on the segmentation clock and/or PSM & somite sizes.

      We agree with the reviewer that this would in principle be of interest; please see our earlier response to reviewer 1.

      4) Please add the results of the GO analysis as supplementary material.

      We have added the GO analysis in new Supplementary Figure 7E.

      Minor comments:

      1) In the main text, page 2, line 29, Supplementary Figure 1D should be referenced.

      We have added a clearer phylogeny and the geographical location of the different species in new Figure 1 A-B, and we reference it at the requested location.

      2) In the main text, page 2, line 32, the authors refer to Figure 1B, but it should be 1C.

      We have corrected the information.

      3) Regarding the topic "Correlation of segmentation timing and size in the Oryzias genus" the authors should also give information on the total time of development of the different Oryzias species, as well as the total number of formed somites.

      We follow this recommendation and have added this information in new Supplementary Figure 5. We also now include segment number measured in F2 embryos. We view segmentation rate as a proxy for developmental rate, which, however, needs to be distinguished from total developmental time. The latter can be measured, for instance, by quantifying hatching time, which we did. These measurements show that Kaga, Cab and O. hubbsi embryos kept at a constant 28 degrees started hatching on the same day, while O. minutillus and O. mekongensis embryos started hatching one day earlier. We have not included these hatching data in the manuscript because we think a distinction should be made between rate of development and total development time.

      4) In Figures 3A and B, please add info on the F1 lines for comparison.

      The information on F1 lines is provided in Supplementary Figure 3.

      5) Supplementary Figure 2F shows that the generation F1 PSM is similar to Cab F0, and not an intermediate between Kaga F0 and Cab F0. This is interesting and should be discussed.

      We show that the F1 PSM is indeed closer to the PSM of Cab than it is to the Kaga PSM. This is indeed intriguing and we have now commented on this point directly in the text.

      6) Supplementary Figures 6C to H are not mentioned either in the main text or in the extended information. Please add/mention accordingly.

      We have added references to both in the text.

      7) The order of Supplementary Figure 8 E to H and A to D appears to be not correct and not following the flow of the text. Please update/correct accordingly.

      We have updated the text accordingly.

      8) The authors should choose between "Fig.", "Fig", "fig.", "fig" or "Figure". All 'variants' can be found in the text.

      Noted, and updated. Fig. is used for main figures and fig. is used for supplementary figures.

      9) The color scheme of several figures (graphs with colored dots) should be revised. Several appear to be difficult to discern and analyze.

      We have enhanced the colours and increased the font on the figure panels. The colour panel was chosen to be colour-blind friendly.

      10) Please address/discuss following questions: What are the known somitogenesis regulating genes in Medaka? How do they correlate with the new candidates?

      The candidates we found and tested had not previously been implicated in regulating the tempo of segmentation or PSM size. For some, a role in somite formation had been established, hence the enrichment of the GO term somitogenesis.

      Reviewer #3 (Significance (Required)):

      General assessment:

      This interesting manuscript describes a novel approach to study and find new players relevant to the regulation of vertebrate segmentation. By employing this innovative methodology, the authors could elegantly demonstrate that the segmentation clock periodicity is independent from the sizes of the PSM and forming somites. The authors were further able to find new genes that may be involved in the regulation of the segmentation clock periodicity and/or the size of the PSM & somites. A limitation of this study is the fact that the results mainly rely on differences between the two species. The integration of additional Medaka species would be beneficial and may help uncover relevant genes and genetic contexts.

      Advance:

      To the best of my knowledge this is the first time that such a methodology has been employed to study the segmentation clock and axial development. Although the topic has been extensively studied in several model organisms, such as mice, chicken, and zebrafish, none of these studies correlated the size of the embryonic tissues with the periodicity of the embryo clock. This study brings novel technological and functional advances to the study of vertebrate axial development.

      Audience:

      This work is particularly interesting to basic researchers, especially in the field of developmental biology and represents a fresh new approach to study a core developmental process. This study further opens the exciting possibility of using a similar methodology to investigate other aspects of vertebrate development. It is a timely and important manuscript which could be of interest to a wider scientific audience and readership.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: *In this paper the authors explore the function of Syndecan in Drosophila stem cells focussing primarily on the intestinal stem cells. They use RNAi knockdown to conclude that Syndecan is required for long term stem cell maintenance as its knockdown results in apoptosis. They suggest that this effect is independent of LINC complex proteins but is associated with changes to nuclear morphology and DNA damage. They go on to show that a similar impact on nuclear shape can be seen in larval neuroblasts but not in stem cells of the female germline.*

      Major Comments: *The key conclusion that underpins the paper is that reduced Syndecan causes loss of stem cells. This is based entirely on evidence from cell-type specific RNAi using 3 independent RNAi lines. Overexpression has no phenotype and there is no analysis of loss of function mutants. SdcRNAi3 gives strong phenotypes that are statistically significant and is used throughout the paper. SdcRNAi2 gives comparatively moderate phenotypes which trend in the same direction but it is not clear if these are statistically significant (Fig S1). SdcRNAi line 1 appears to have very little effect (and if anything trends in the opposite direction in S1A). In addition, the knockdown efficiency of the three lines has not been assessed. Another possible concern given the dependence on RNAi3 is that the RNAi control line used is not an ideal match for the VDRC GD RNAi lines as it is in a different genetic background. In order to robustly draw conclusions: the phenotypes with RNAi lines 1 and 2 should be tested for significance; the extent of knockdown in each should be quantified either by qPCR in whole tissue knockdown, or by staining for protein levels if possible, to assess whether the variation in phenotypes is due to different knockdown levels. The use of a loss of function mutant in clones or tissue specific CRISPR-Cas9 KO or KD would also significantly increase confidence in the findings. *

      • Our qPCR data indicate that SdcRNAi3 produces the most efficient knockdown, whilst SdcRNAi1 generates the weakest knockdown. The new manuscript version will incorporate this data in figure S1. Knockdown efficacy of SdcRNAi 3 has also been previously reported (Eveland et al., 2016).

      • We apologise for omitting to add the statistical tests on phenotypic categories in figure S1A, this will be revised. We confirm that all Sdc RNAi phenotypic distributions are significantly different to that seen for age-matched controls (p- It should also be noted that despite weaker knockdowns with SdcRNAi1 and 2, we still observed statistically significant ISC depletion after 28 days of RNAi expression - we will add this data in figure S1. Overall, we are confident about Sdc’s role in maintaining intestinal stem cells.

      *Similarly, the evidence for a lack of LINC protein role in the phenotype relies on single RNAi lines without validation of knockdowns. The authors should ideally validate these lines in this system or reference other studies that have validated the lines in this or other contexts. *

      • The klarsicht RNAi line (BDSC 36721) and klaroid RNAi line (BDSC 40924) used in this study have been validated and used in other studies. (Falo-Sanjuan & Bray, 2022; Collins et al., 2017)

      • For Msp300 RNAi knockdown we have used two independent RNAi lines which gave similar results. We will amend the text to clarify these points. In addition, the line reported in the manuscript was previously validated (Dondi et al., 2021; Frost et al., 2016).

      Minor Comments: *The figures are generally very clear but some of the IF image panels are very small and require significant on-screen enlargement to be legible. In particular in Figure 1B the cross section views make it difficult to assess expression in the different cell types (and don't show very many cells), could this be shown in wholemount or as separated channels in a supplementary figure? In addition, it would strengthen the argument to include counterstains for markers of the different cell types (particularly to distinguish ISC/EB from EE). This could include esg-lacZ to mark ISC/EBs or prospero for EEs. However, if a broader view of these panels makes it clearer that all epithelial cells are expressing Syndecan this may not be essential. *

      • We are happy to incorporate larger fields of view, and co-immunostaining with different cell type markers.

      *Syndecan is referred to throughout as a stem cell regulator. This implies that in certain contexts or in response to certain stimuli its expression may be altered to elicit a stem cell response but no examples of this are shown. Moreover, only knockdown and not overexpression gives phenotypes suggesting its role may be as a required protein than a regulator. Either examples of its expression being modulated in homeostasis or in response to a challenge could be included or the wording could be amended. *

      • We agree with the reviewer and will amend the wording.

      *Expression of Syndecan in neuroblasts is described as data not shown, it would be better to include this for completeness. *

      • We will add this data in figure 4.

      *In addition to the intestinal validation of the Syndecan RNAi lines, validation of knockdown in the germline would be valuable to support the conclusions of Fig S4 given differences of knockdown in the germline with some RNAi lines (although inclusion of Dicer in the driver line should have overcome this). *

      • Sdc expression is very low in the germline, compared to the surrounding somatic cells, therefore we are not confident that we can detect differences in expression level after knockdown. We suggest adding a panel in figure S4 to show the low expression and adding a comment in the text.

      Reviewer #1 (Significance (Required)): *The study describes a potentially very interesting, novel link between Syndecan, nuclear shape and apoptosis in cycling cells that could have broad relevance. If fully validated this could have implications for other stem cell populations, including those in mammals and disease relevance in the context of cancer. The paper is fundamentally descriptive in nature and so the level of significance hinges on the strength of evidence and how interesting the phenotype itself is. At this stage the audience will be primarily in the areas of fundamental research in biology of the nucleus and cytoskeleton. Defining the mechanistic link between Syndecan and nuclear morphology will be a critical next step and while not essential for this study would significantly increase the likely interest in the paper.*

      • We thank the reviewer for these constructive comments. We agree that discovering the mechanistic links between Syndecan and nuclear morphology in future studies, in this and other model systems, will be relevant to many areas of biological research.

      *In terms of significance in stem cell biology the distinction between a regulator and a requirement to prevent stem cell apoptosis is important and the lack of evidence for a context in which Syndecan plays a regulatory role somewhat detracts from the breadth of impact. My field of expertise is in epithelial stem cell biology. *

      • We agree and will amend our wording.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: *Stem cell (SC) maintenance and proliferation are necessary for tissue morphogenesis and homeostasis. The basement membrane (BM) has been shown to play a key role in regulating stem cell behavior. In this work, the authors unravel a new connection between the receptor for BM components Syndecan (Sdc) and SC behavior, using Drosophila as model system. They show that Sdc is required for intestine stem cell (ISC) maintenance, as Sdc depletion results in their progressive loss. At a cellular level, they also find that Sdc depletion in ISCs affects cell survival, cell and nuclear shape, nuclear lamina and DNA damage. In addition, they show that the defects in shape are not related to cell death. They also find that Sdc depletion in neural stem cells also results in nuclear envelope remodeling during cell division. This is in contrast to what happens in female germline stem cells where Sdc does not seem to be required for their survival or maintenance. In general, I believe that this work unravels a connection between Sdc and stem cell behavior. However, I think the study is still at a preliminary stage, as how Sdc regulates different facets of stem cell behavior remains unclear.*

      Major comments: 1. To clearly show that the cellular changes produced by loss of Sdc are not due to cell death, one should quantify the ISC area and shape of Sdc-depleted ISCs expressing DIAP1 and compare it to that of Sdc-depleted ISCs. As DIAP1 overexpression only partially rescues ISC loss due to Sdc depletion, one should show that the Sdc-depleted ISCs expressing DIAP1 that still show cellular changes are not dying, as overexpression of Diap1 might not be sufficient to completely rescue cell death in all Sdc-depleted ISCs. In fact, apoptosis in Sdc depleted guts and the ability of Diap1 overexpression to rescue cell death should be analyzed using markers of caspase activity, this will provide a better idea of the contribution of apoptosis to the phenotypes associated to Sdc depletion. *

      • We can, as suggested by the reviewer, quantify the area and shape of Sdc-depleted ISCs expressing DIAP1 and compare it to that of Sdc-depleted ISCs. However, our immunostainings with anti-Caspase 3 or Drice do not pick up apoptotic cells in the fly gut. This is not entirely unexpected, as apoptosis is unfortunately not easily detected in this tissue. In the absence of a positive readout of apoptosis, we will not be able to discriminate between apoptotic and non-apoptotic stem cells when quantifying area and shape and will only have global quantifications.

      • *The authors show that ISC loss is associated with reduced cell density, suggesting that this is most likely due to failure in new cell production. What do they mean by cell production? Is this related to a problem in regulating cell division, to the fact that as some ISCs are lost by apoptosis there are progressively fewer ISCs, or to a combination of both? I think that cell division, as well as cell death in ISCs, should be monitored over time.*

      • Based on esgF/O experiments (fig. 1D-F and S1C) where we can trace the production of new cells with GFP, we know that Sdc RNAi expression (i) impairs the appearance of newly differentiated cells in the tissue and (ii) results in the disappearance of progenitor cells (fig. S1C). Supporting these points, (i) we have observed PH3+ mitotic stem cells upon Sdc RNAi, so we are confident the cells are able to initiate cell division (see also fig. 2G), and (ii) we have occasionally noted in fixed samples stem cells looking like they were in the process of delaminating. Overall, the failure of cell production is likely related to problems with both completion of cell division and progressive stem cell loss. High resolution live imaging will in future give us a better insight into stem cell division dynamics/behaviour, however, the technical improvements required are beyond the scope of this project. In the meantime, we propose to clarify our statement in the text.

      • The authors report that in contrast to what happens when Sdc is eliminated from ISCs, its elimination from EEs results in an increase in the number of these cells. An explanation for this result is missing.*

      • Based on known roles of Syndecan in other Drosophila tissues (Johnson et al., 2004; Steigemann et al., 2004; Chanana et al., 2009; Schulz et al., 2011), we speculate that Syndecan may contribute to robo/slit signalling, which is an important regulator of EE activity in the Drosophila gut (Biteau & Jasper 2014; Zeng et al., 2015). We propose to amend the text to express this hypothesis.

      • The authors suggest that "Sdc function is unlikely to be fully accounted for by individual LINC complex proteins, although these proteins might act redundantly". Checking redundancy seems a straight forward experiment, which only requires the simultaneous expression of RNAis against several of these proteins. This would help to settle the implication of LINC complex proteins on Sdc function.*

      • To check redundancy, we propose to combine Klaroid RNAi with Msp300 or Klarsicht RNAis, and express two RNAis at a time in ISCs. We will then measure stem cell proportions and the proportion of ISCs with DNA damage.

      • Although quantification of DNA damage, by immunolabelling with gH2Av, reveals that knockdown of individual LINC complex components did not recapitulate the damage observed upon Sdc depletion (Fig.3G), the image shown in Fig.3F reflects much higher levels of gH2Av in Msp300 RNAi cells compared to Sdc RNAi cells. Authors should clarify this. *

      • Like the reviewer, we are intrigued by the higher levels of H2Av staining in the tissue, despite Msp300 knockdown in stem cells only (fig. 3F). It is worth noting that we observed this with two independent RNAi lines (we showed only one RNAi in the manuscript, but we will amend the text to indicate this). In fig. 3F, we will indicate with an arrow the only ISC that is H2Av positive, and mention in the text that the majority of DNA damage signal observed in the Msp300 RNAi condition is in enterocytes, not ISCs. We currently do not have an explanation for why loss of Msp300 in ISCs should cause DNA damage in neighboring cells.

      *In addition, the consequences of the simultaneous elimination of more than one component of the LINC complex on DNA damage should be analyzed. *

      • We agree, and as we check for redundancy (as in point 4), we will also immunostain the tissues for H2Av.

      • The authors claim that the fact that "DNA damage was found more frequently in Sdc-depleted ISCs with lamina invaginations compared to those without (Figure 3H), supports a model whereby the development of nuclear lamina invaginations precedes the acquisition of DNA damage". However, to me, these results show that there is a relation between these two phenotypes, but not that one precedes the other. In order to show which one is the possible cause and which the consequence, the authors should perform a time course of the appearance of each of these phenotypes.*

      • We agree with the reviewer that we should rephrase our statement to indicate a relationship between lamina invaginations and DNA damage, rather than a causality (as stated in fig. 3H).

      (In terms of performing a time course analysis, the difficulty is that after 3 days of Sdc RNAi expression, the apparent DNA damage (fig. 3G) corresponds to a very small proportion of stem cells, meaning that an exceptionally large sample size would be required to achieve robust statistical analysis.)

      • When studying the role of Sdc in neural stem cells, the authors show that elimination of Sdc in neuroblasts also affect nuclear envelope and shape. Furthermore, in this case, they also show that Sdc elimination affects cell division. To look for a more conserved role of Sdc in stem cell behavior, I believe the authors should also analyze whether Sdc elimination in neural stem cells results in an increase in DNA damage, as it is the case in ISCs.*

      • We will stain larval brains for H2Av to see if DNA damage is also observed following Sdc knockdown in neuroblasts.

      • When analyzing a possible role of Sdc in fGSCs, quantification of germline stem cells and gH2Av levels in control nosGal4 and nos>Sdc RNAi germaria should be done. In addition, it is not clear to me whether Sdc is in fact expressed in fGSCs.*


      • As mentioned in comments to reviewer 1, we will add a panel in figure S4 to show the low Sdc expression in fGSCs. We will also clarify in the text that we do not see any H2Av staining in the fGSCs (thus, there is nothing to quantify in this case).

      * The authors should show presence of Sdc in neuroblasts.*

      • Yes, we agree, as also mentioned in comments to reviewer 1.

      Reviewer #2 (Significance (Required)): *In general, although this work reveals that elimination of Sdc affects different aspects of intestinal and neural stem cell behavior, including cell survival, cell production, nuclear shape, nuclear lamina or DNA damage, their contribution to stem cell loss and the interactions between them have not been analyzed in detail. The role of the basement membrane in stem cell behavior has been extensively studied. In particular, the role of syndecan in stem cell regulation has been primarily confined to cancer, muscle, neural and hematopoietic stem cells. Thus, the study presented here could extend the role of Sdc to intestinal stem cells and could potentially reveal a conserved role for Sdc in neural stem cell behavior. However, the problems with the data mentioned above hinder the assessment of the significance of this work.*

      • We thank the reviewer for their assessment and are glad that they also find that our study provides novel connections between Syndecan and the regulation of intestinal and neural stem cell behaviors. To strengthen our conclusions, we will include additional experiments or amend the text, as indicated above.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Peer-review: The transmembrane protein Syndecan regulates stem cell nuclear properties and cell maintenance.

      In this work, the authors investigate the role of the transmembrane protein Syndecan (Sdc) in nuclear organisation and stem cell maintenance. They show that Sdc knockdown in intestinal stem cells (ISCs) results in a reduction of the ISC pool as well as of their progeny. They hypothesise that these ISCs might get eliminated via cell death; however, expression of the apoptotic inhibitor DIAP1 only rescued ISC loss by 50%. Hence, they suggest that apoptosis cannot account for the total decrease in ISCs observed upon Sdc loss. ISCs depleted of Sdc exhibited abnormal cytoplasmic and nuclear morphologies. As Sdc has previously been implicated in the abscission machinery in mammalian cultured cells, they tested if Sdc could be playing a similar role in the abscission of ISCs. However, ISCs were capable of undergoing cytokinesis. Next, they tested if Sdc depletion could be altering the linkage between the plasma membrane and the nucleus mediated by the Linker of Nucleoskeleton and Cytoskeleton (LINC) complex. However, individual knockdowns of the different components of the complex did not disrupt the nuclear morphology to the same extent as Sdc knockdown, suggesting that Sdc function may be independent of the LINC complex. Finally, they observed that Sdc-depleted ISCs exhibited DNA damage, suggesting that Sdc may play a role in DNA protection. The authors next tested if Sdc played similar roles in other stem cell types such as the female germline stem cells (fGSCs) and larval neural stem cells (NSCs). While Sdc appeared dispensable for fGSC maintenance, its depletion prolonged NSC divisions and altered the nuclear morphology of NSCs. Upon further investigation, they observed that the NSC's nuclear envelope was disrupted upon division, hence causing defects in the nuclear size ratio of NSCs and their progeny. This study provides interesting findings in the field and proves a new role for Sdc in the regulation of intestinal and neural stem cell maintenance. I would recommend this manuscript to be accepted if the authors address the following comments.

      Major comments:

      1. In Figure 2 A-B, Sdc RNAi should ideally have a UAS control transgene to match the number of UAS being expressed to that of Sdc RNAi, DIAP1. Otherwise, it is plausible that reduced RNAi expression of Sdc RNAi, DIAP1 animals is the cause of the partial rescue. Staining against cell death markers such as Dcp-1 or TUNEL might also quantify the number of cells undergoing cell death in each of the genotypes.

      • As mentioned in comments to reviewer 2 (point 1), it is difficult to label apoptotic cells in the fly gut. However, we could set up an additional control to test that the partial rescue observed upon DIAP1 expression is not a result of Gal4 dilution.

      • "These phenotypes were observed both with and without DIAP1 expression (Figure 2C), indicating that these cell shapes are not caused by apoptosis." Misleading, as DIAP1 overexpression in the Sdc knockdown background only rescued apoptosis by 50%. Hence, it is possible that those cells showing morphological defects, protrusions and blebbing might still undergo death, especially as those morphological changes are typically observed in apoptotic cells. Therefore, to rule apoptosis out, these cells should be shown to be negative for cell death markers.

      • We agree, however, it is difficult to label apoptotic cells. We think that the quantification of shape and area (as suggested by reviewer 2, point 1) will clearly show that the cell shapes resulting from Sdc depletion are not caused by apoptosis.

      • Show if Sdc is expressed in fGSCs - the lack of phenotype caused by Sdc knockdown might be due to lack of expression of Sdc.

      • As mentioned in comments to reviewers 1&2, we will add a panel in figure S4 to show the low sdc expression in fGSCs.

      • "After confirming the presence of Sdc in neuroblasts (data not shown)." Data should be shown. It would be of great interest for researchers if you showed a staining of different brain cell types (NBs, glia, neurons) and the Sdc expression patterns.

      • As mentioned in comments to reviewers 1&2, we will add a panel in figure 4 to show sdc expression in NBs and the overall expression pattern.

      • You show how Sdc-depleted NBs have disrupted nuclear morphologies. However, does Sdc KD in NB lineages affect their ability to self-renew and generate differentiated progeny? Is the number of NBs and of their progeny cells altered as it is for ISCs?

      • We propose to knock down Sdc in NBs and quantify brain size in 3rd instar larvae to test if the ability to generate progeny is affected.

      • Does protection against DNA damage in an Sdc knockdown background prevent the defects observed with the single knockdown and ISC elimination?

      • This is a good question, and we should emphasize this point in the discussion. However, because of the multiple routes of DNA damage response, and the multiple lines needed to explore this connection, we feel that investigating this question is beyond this project.

      • Any idea of the similarities between ISCs and NBs that can account for why Sdc knockdown has effects in those systems, while no effect was observed in the germ cells?

      • Besides the differences in expression level, we speculate that GSCs may have a different nuclear / lamina architecture which might reflect differences in how GSCs control the physical integrity of their nuclei. It is also possible that the differences observed between tissues reflect the way stem cells connect to their microenvironment. Notably, fGSCs rely extensively on E-Cadherin mediated adhesion with neighbouring cells, and it is possible that contact with the extracellular matrix is dispensable. We will consider these possibilities in the discussion.

      Minor comments:

      8. Lamina invaginations, for example in Figure 3 A, could be indicated with an arrow for easier detection.

      • Thanks for this suggestion, we will amend the figure.

      Specify the type and location of NB imaged during live cell experiments.

      • The NBs were imaged in the brain lobes, and we did not distinguish between type I and II NBs. We will add a sentence in the method section to clarify.

      Reviewer #3 (Significance (Required)): Expertise: Drosophila stem cells

      • Many thanks for the constructive comments.
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The data strongly suggest that iron depletion in urine leads to conditional essentiality of some genes. It would be informative to test the single gene deletions (Figure 3G) for growth in urine supplemented with iron, to determine how many of those genes support growth in urine due to iron limitation.

      We appreciate this suggestion. We have now included this suggested experiment as a new panel (Figure 5G).

      (2) Line 641. The authors raise the intriguing possibility that some mutants can "cheat" by benefitting from the surrounding cells that are phenotypically wild-type. Growing a fepA deletion strain in urine, either alone or mixed with wild-type cells, would address this question. Given that other mutants may be similarly "masked", it is important to know whether this phenomenon occurs.

      We thank the reviewer for this suggestion but believe that this would be very difficult to ascertain in K. pneumoniae as several redundant iron uptake systems exist. This would require significantly more time to construct sequential/combinatorial iron-uptake mutants to exactly determine this “cheating” and “masking” phenomenon and such work is beyond the scope of the current study.

      (3) In cases where there are disparities between studies, e.g., for genes inferred to be essential for serum resistance, it would be informative to test individual deletions for genes described as essential in only one study.

      We thank the reviewer for this suggestion, and we agree that deleting conditionally essential genes (i.e. serum resistance) could help identify discrepancies in methodology with other studies but this is beyond the scope of this study. Furthermore, we do not have these other strains readily available to us and importing these strains into Australia is challenging due to the strict import/quarantine laws.

      Reviewer #1 (Recommendations For The Authors)

      (4) Line 529. Why was 50 chosen as the read count threshold?

      This was chosen as the minimum threshold needed to exclude essential genes from the comparative analysis, as these can contribute false-positive results where a change from, for example, 2 to 5 reads between conditions is considered a >2-fold change. We have updated the manuscript text to highlight this: “were removed from downstream analysis to exclude confounding essential genes and minimize the effect of stochastic mutant loss” (line 539).
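      The reasoning behind a read-count threshold can be illustrated with a minimal Python sketch. This is not the authors' script: the function names, the example counts, and the rule "drop a gene only if it is below the threshold in both conditions" are assumptions made for illustration. Only the threshold of 50 and the 2-to-5-read example come from the response above.

```python
import math

MIN_READS = 50  # threshold quoted in the response


def log2_fold_change(control_reads, condition_reads, pseudocount=0):
    """log2 ratio of condition to control reads (pseudocount is optional)."""
    return math.log2((condition_reads + pseudocount) / (control_reads + pseudocount))


def filter_low_count_genes(counts, min_reads=MIN_READS):
    """Keep genes with at least min_reads in one or both conditions.

    counts maps gene name -> (control_reads, condition_reads).
    Assumption: a gene is dropped only if BOTH counts fall below min_reads.
    """
    return {
        gene: pair
        for gene, pair in counts.items()
        if max(pair) >= min_reads
    }


# Hypothetical counts: geneA mimics an essential gene with almost no reads.
counts = {"geneA": (2, 5), "geneB": (400, 10), "geneC": (120, 130)}
kept = filter_low_count_genes(counts)

# Without filtering, 2 -> 5 reads looks like a >2-fold change.
spurious = log2_fold_change(2, 5)
```

      Here `geneA` is removed before any fold change is computed, while `geneB`, which is well covered in the control and depleted in the condition, is retained.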

      (5) The titles for Figure 5 and Figure 6 appear to be switched.

      Thank you, we have now corrected this error.

      (6) Line 381. "Forty-six of these regions contain potential open reading frames that could encode proteins". How is a potential ORF defined?

      This was based on submitting the selected 145bp regions to BLASTx using default parameters and listing the top hit (if one was found). We have now edited the manuscript text to make this clearer. (Line 394)

      (7) Two previous TnSeq studies looking at Escherichia coli and Vibrio cholerae suggest that H-NS can prevent transposon insertion, leading to false positive essentiality calls. Is there any evidence of this phenomenon here? A/T content could be used as a proxy for H-NS occupancy.

      We thank the reviewer for this point and also agree that H-NS or other DNA-binding proteins could indeed lead to false-positive essentiality calls using TraDIS. Based on this, we have now included a sentence in the conclusion section mentioning this methodological caveat (Line 631). We believe that A/T content could potentially be used as a proxy for H-NS occupancy,

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors may wish to reformat the manuscript by decanting a number of panels and figures as supplementary material. These include the panels related to the description of TraDIS (for example Fig 1D, 1E, 1F. 1G, Fig 2A, Fig 3C, 3D, 3E, 3F, Fig 5C, Fig 6D). This is a well-established method.

      We thank the reviewer for this suggestion but believe that these panels make the methodology and resulting insertion plots easier to follow and allow other researchers, of varying expertise, to better understand this functional genetic screen technique.

      (2) The authors need to indicate how relevant the strain they have probed is. Is it a good reference strain of the KpI group?

      This is a great suggestion and we have now included a new figure illustrating the genetic context and relatedness of K. pneumoniae ECL8 within the KpI phylogroup (New Figure 3).

      (3) The authors need to provide an extensive comparison between the data obtained and those reported testing other Klebsiella strains. A Table identifying the common and different genes, as well as a figure, may suffice. I would encourage authors to compare also their data against E. coli and Salmonella. For example, igaA seems to be not essential in Klebsiella although data indicates it is in Salmonella.

      We thank the reviewer for their comment and appreciate that our data could be extended and compared to other relevant Enterobacteriaceae members. However, we believe this is beyond the scope of this study as the focus is more on K. pneumoniae.

      (4) None of the mutants tested further are complemented. Without these experiments, it cannot be rigorously claimed that these loci play any role in the phenotypes investigated.

      We agree that complementation is an important tenet for validating that mutant phenotypes map to specific gene loci. In this case, wbbY has already been complemented, and we believe that complementation for an already known molecular mechanism would be redundant. Please refer to our response to point 6.

      We complemented isolated transposon mutants hns7::Tn5 and hns18::Tn5 with a mid-copy IPTG inducible . We observed a slight increase in serum susceptibility but not full rescue of the WT phenotype (i.e. serum susceptibility). We suspect that the imperfect rescue of the serum-resistance phenotype observed could be due to the expression levels and copy number of the complement hns plasmid used. As hns is a known global regulator its possible pleiotropic role is complex as many aspects of stress response, metabolism or capsule could be affected in Klebsiella (doi.org/10.1186/1471-2180-6-72, doi.org/10.3389/fcimb.2016.00013). We have now included in the text our efforts in complementation and have included a new supplementary figure (Figure S11).

      (5) The contribution of siderophores to survival in urine is not conclusively established. Authors may wish to test the transcription of relevant genes, and to assess whether the expression is fur dependent in urine. Also, authors may wish to identify the main siderophore needed for survival in urine by probing a number of mutants; this will allow us to assess whether there is a degree of selection and redundancy.

      We thank the reviewer for their comment and agree siderophore uptake is important. We have now included an additional panel (Figure 5G) interrogating the importance of iron-uptake genes grown in urine which is iron limited. We do appreciate that further experiments looking into the Fur regulon and siderophore biosynthesis would be interesting but believe this is outside the scope of this study.

      (6) The role of wbbY is intriguing, pointing towards the importance of high molecular weight O-polysaccharide. In this mutant background, the authors need to assess whether the expression of the capsule, and ECA is affected. Authors need also to complement the mutant. Which is the mechanism conferring resistance?

      We thank the reviewer for their comment and would like to mention that wbbY has already been shown to play a role in LPS profile/biosynthesis and serum resistance (10.3389/fmicb.2014.00608). Furthermore, BLAST analysis shows that the wbbY genes of NTUH-K2044 (the strain used in the aforementioned study) and ECL8 share 100% sequence identity and the same lps operon structure. Hence, we do not find it pertinent to complement this mutant, as we believe its molecular mechanism has already been established. We have now highlighted the results of that study more prominently in the text, along with how our screen was robust enough to also identify this gene for serum resistance.

      (7) hns and gnd mutants most likely will have their capsule affected. The authors need to assess whether this is the case. Which is the mechanism conferring resistance?

      As mentioned in point 6, we believe that the serum resistance phenotype is attributable to the LPS phenotype. Previous studies have suggested that hns and gnd mutants would likely have differences in capsule, but because hns is pleiotropic and gnd is intercalated/adjacent to the LPS/O-antigen biosynthesis genes, it would be difficult to delineate exactly which cellular surface structure is involved.

      (8) The conclusion section can be shortened significantly as much of the text is a repetition of the results/discussion section.

      We thank the reviewer for their suggestion and have made edits to limit repetition in the conclusion section.

      Reviewer #3 (Public Review):

      Below I include several comments regarding potential weaknesses in the methodology used:

      • The study was done with biological duplicates. In vitro studies usually require 3 samples for performing statistical robust analysis. Thus, are two duplicates enough to reach reproducible results? This is important because many genes are analyzed which could lead to false positives. That said, I acknowledge that genes that were confirmed through targeted mutagenesis led to similar phenotypic results. However, what about all those genes with higher p and q values that were not confirmed? Will those differences be real or represent false positives? Could this explain the differences obtained between this and other studies?

      We thank the reviewer for their comment and apologize for the confusion; data were only pooled for the statistical analysis of gene essentiality. Here, two technical replicates of the input library were sequenced and the number of insertions per gene quantified (insertion index scores). These replicates had a correlation coefficient of r² = 0.955, and the insertions per gene data were pooled to give total insertion index scores to predict gene essentiality. For conditional analyses (growth in urine or serum), replicate data were not combined. As mentioned previously, differences between this and other studies could also be attributed to inherent genomic differences or to differences in experimental methodology, computational approaches, or the stringency of analysis used to categorize these genes.

      • Two approaches are performed to investigate genes required for K. pneumoniae resistance to serum. In the first approach, the resistance to complement in serum is investigated. And here a total of 356 genes were identified to be relevant. In contrast, when genes required for overall resistance to serum are studied, only 52 genes seem to be involved. In principle, one would expect to see more genes required for overall resistance to serum and within them identify the genes required for resistance to complement. So this result is unexpected. In addition, it seems unlikely that 356 genes are involved in resistance to complement. Thus, is it possible false positives account for some of the results obtained?

      We thank the reviewer for their comment and do believe false positives may account for some of the identified genes. We believe the large contrast in gene numbers is due to the methodology, as alluded to in our conclusion section. For overall resistance to serum, we used a longer time point (180 min exposure), where fewer surviving mutants are recovered and hence fewer genes overall are identified, whereas the shorter killing window (i.e. complement-mediated killing, 90 min exposure) identifies more.

      Reviewer #3 (Recommendations For The Authors):

      • In Figure 4 it is shown that genes important for growth in urine include several that are required for enterobactin uptake. Moreover, an in vitro experiment shows that the complementation of urine with iron increases K. pneumoniae growth. It would have been informative to do a competition experiment between the WT and Fep mutants in urine supplemented with iron. This could demonstrate that the genes identified are only necessary for conditions in which iron is in limiting concentrations and confirm that the defect of the mutants is not due to other characteristics of urine.

      We appreciate this suggestion. We have now included a new panel (Figure 5G) addressing the supplementation of iron in urine for these select mutants.

      • Considering the results section, the title for Figure 6 seems to be more appropriate for Figure 5.

      Thank you, this has now been corrected.

      Other points:

      • Line 44: treat instead of treating

      Thank you, this has now been corrected.

      • Line 63: found that only 3 genes played a role instead of "found only 3 genes played a role"

      Thank you, this has now been corrected.

      • Line 105: is there any reason for only using males? Since UTIs are frequent in women? Why not use urine from women volunteers?

      Due to the accessibility of willing volunteers and human ethics application processes, only male samples were available. We are currently undertaking further studies to understand how male and female urine influences growth of uropathogens.

      • Line 105: since the urine was filter-sterilized, maybe the authors can comment that another point that is missing in urine - and that it may be important to study - will be the presence of the urine microbiome and how this affects growth of K. pneumoniae.

      We again thank the reviewer for this comment and have now edited the manuscript discussing how the absence of urine microbiome could affect growth (Line 659). As an aside, future studies in our lab are interested in looking at the role of commensal/microbiome co-interactions for essentiality/pathogenesis using TraDIS.

      • Line 116: I understand that the 8 healthy volunteers combined males and females

      Thank you, we have now edited this methods line to make this clearer.

      • Line 120: incubate in serum 90 min and 180 RPM shaking: any reasons for using these conditions, any reference supporting these conditions?

      Thank you for pointing this out, we were mirroring a previous K. pneumoniae serum-resistance study (doi.org/10.1128/iai.00043-).

      • Line 156: space after the dot.

      Thank you, we have now corrected this in the manuscript.

      • Line 164: resulting reads were mapped to the K. pneumoniae: what are the parameters used for mapping (e.g. % of identity...)?

      Thank you for bringing this to our attention, we have now included in our manuscript that we used the default parameters of BWA-MEM for mapping for minimum seed length (default -k =20bp exact match)

      • Line 180: it will be good to upload to a repository the In-house scripts used or indicate the link beside the reference for those scripts.

      Our scripts are derived from the pioneering TraDIS study (doi: 10.1101/gr.097097.109). We are currently still optimizing our scripts and intend to upload these to be publicly available. However, in the meantime we are more than happy to share them with other parties upon request.

      • Line 191: why were genes classified as 12 times more likely to be situated in the left mode? Any particular reason for using this threshold?

      We opted for a more-stringent threshold for classifying essential genes, in keeping with previous and comparable studies (doi.org/10.1371/journal.pgen.1003834).

      • Line 209: do you mean Q-value of <0.05 instead of >0.05? How is this Q value calculated, and which specific tests are applied?

      Thank you for pointing out this Q value error, we have now corrected this in the manuscript. These values were generated using the biotradis tradis_comparison.R script which uses the EdgeR package. For further reading please see DOI: 10.1093/bioinformatics/btp616. The Q-values are from P values corrected for multiple testing by the Benjamini-Hochberg method.
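      For readers unfamiliar with the correction, the Benjamini-Hochberg step that turns P values into Q-values can be sketched in a few lines of Python. This is a generic illustration of the method, not the code used by the biotradis or EdgeR packages; the function name and example P values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q-values).

    Each P value is scaled by m / rank (rank 1 = smallest P value), then a
    running minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical P values for four genes.
p = [0.01, 0.04, 0.03, 0.005]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
```

      Genes are then called significant when their Q-value falls below the chosen cut-off (here 0.05, matching the corrected threshold).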

      • Line 212: again, which type of test is used? What about the urine growth analysis? The same type of tests were applied?

      Thank you for bringing this to our attention. We have now indicated in the referenced method section which package was used for which datasets (i.e. urine or serum). Line 212 refers to our use of the AlbaTraDIS package, which builds on the biotradis toolkit, to identify gene commonalities/differences in the selected growth conditions, again using multiple testing correction by the Benjamini-Hochberg method. For further reading, please refer to DOI: 10.1371/journal.pcbi.1007980

      • Line 226: do the authors mean Sanger sequencing instead of SangerSanger sequencing?

      Thank you, we have now corrected this in the manuscript.

      • Line 239: does the WT strain contain another marker for differentiating this strain from the mutant? Or is the calculation of the number of WT CFUs done by subtracting the number of CFUs in media with antibiotics from the total number of CFUs in media without antibiotics? The former will be a more accurate method.

      The calculation was based on the latter assumption, “number of WT CFUs done by subtracting the number of CFUs in media with antibiotics from the total number of CFUs in media without antibiotics”. We have now updated the methods section to make this clearer.
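      The subtraction described above can be written out as a short Python sketch. The function names and plate counts are hypothetical, and the mutant-to-WT ratio is added only as an example of how the two numbers might be compared; it is not stated in the response.

```python
def wt_cfu(total_cfu, mutant_cfu):
    """WT CFUs = CFUs without antibiotic minus CFUs with antibiotic.

    total_cfu:  colonies on media without antibiotics (WT + mutant)
    mutant_cfu: colonies on media with antibiotics (mutant only)
    """
    if mutant_cfu > total_cfu:
        # Plating noise can make this happen; flag it rather than hide it.
        raise ValueError("mutant CFUs exceed total CFUs")
    return total_cfu - mutant_cfu


def mutant_to_wt_ratio(total_cfu, mutant_cfu):
    """Ratio of mutant to WT CFUs (illustrative comparison only)."""
    wt = wt_cfu(total_cfu, mutant_cfu)
    if wt == 0:
        raise ValueError("no WT colonies inferred")
    return mutant_cfu / wt


# Hypothetical plate counts.
wt = wt_cfu(total_cfu=1000, mutant_cfu=400)
ratio = mutant_to_wt_ratio(total_cfu=1000, mutant_cfu=400)
```

      Because the WT count is inferred by difference, any counting error on either plate propagates into it, which is why the reviewer notes that an independent WT marker would be more accurate.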

      • Line 266: can you indicate approximately how many CFUs you have in this OD?

      Thank you, we have now also indicated an approximate CFU for this mentioned OD600 (OD600 1 = 7 × 10⁸ cells).
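      As a worked example of this conversion, the following Python sketch scales the quoted calibration linearly. It assumes the figure is per mL of culture and that OD600 and cell number remain proportional over the range used; both are assumptions for illustration, and the function name is hypothetical.

```python
CELLS_PER_ML_AT_OD1 = 7e8  # calibration quoted in the response (assumed per mL)


def estimate_cells_per_ml(od600):
    """Estimate cell density from OD600, assuming a linear relationship."""
    if od600 < 0:
        raise ValueError("OD600 cannot be negative")
    return od600 * CELLS_PER_ML_AT_OD1


half_od = estimate_cells_per_ml(0.5)
```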

      • Line 309: besides indicating Figure 1D please indicate here Dataset S1 (the table where one can see the list of essential and non-essential genes). This table is shown afterwards but I think it will be more appropriate to show it at the beginning of the section.

      Thank you, we have now taken on this recommendation and have now edited the manuscript to also indicate Dataset S1 earlier.

      • Table 3. regarding the comparison of essential genes between different strains. I think it will be more clear if a Venn diagram was drawn including only genes that have homologs in all the studied strains (i.e. defining the core genome essentially).

      We would like to thank the reviewer for suggesting a venn diagram and have now removed Table 3 which has been replaced with a new Figure 3.

      • Line 461: replicates were combined for downstream analyses? But are replicates combined for doing the statistical analysis? If so, how is the statistical analysis performed? How is it taken into account the potential variability in the abundance in each library? An r of 0.9 is high but not perfect.

      Technical replicates of the sequenced input library were combined following identification of a correlation coefficient of r² = 0.955, for the calculation of insertion index scores used in gene essentiality analysis. While r² = 0.955 is not perfect, discrepancies here can be attributed to higher variance in insertion index scores when sampling small genes, as these are represented by fewer insertions and the stochastic absence of a single insertion event has a greater effect on the overall IIS. Replicate data were not pooled for statistical analysis of mutant fitness (growth in urine and serum).
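      The replicate comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the insertion index is taken to be insertions per gene divided by gene length (the usual TraDIS convention), pooling is simplified to summing per-gene insertion counts, and all gene names and numbers are invented.

```python
import math


def insertion_index(insertions, gene_length_bp):
    """Insertions per gene normalised by gene length (assumed definition)."""
    return insertions / gene_length_bp


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: gene -> (length in bp, insertions rep 1, insertions rep 2)
genes = {
    "geneA": (1500, 30, 33),
    "geneB": (300, 2, 1),   # small gene: one insertion changes the IIS a lot
    "geneC": (2400, 96, 90),
    "geneD": (900, 0, 0),
}

iis_rep1 = [insertion_index(r1, length) for length, r1, _ in genes.values()]
iis_rep2 = [insertion_index(r2, length) for length, _, r2 in genes.values()]
r_squared = pearson_r(iis_rep1, iis_rep2) ** 2

# Simplified pooling: sum the insertion counts, then normalise by length.
pooled_iis = {
    gene: insertion_index(r1 + r2, length)
    for gene, (length, r1, r2) in genes.items()
}
```

      In this toy data, the small `geneB` shows the largest relative difference between replicates, which is the effect the response describes. A real pipeline would more likely count unique insertion sites across replicates than sum raw counts.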

      • Line 487: is there any control strain containing the kanamycin gene in a part of the genome that does not affect the growth of K. pneumoniae? This could be used to show that having the kanamycin gene does not provide any defect in urine growth.

      We thank the reviewer for this suggestion but argue that introduction of the kanamycin gene into each unique locus may result in various levels of gene fitness that would be incomparable to a single control strain. Instead, we culture the ECL8 mutant library in urine and ensure that its kinetics are comparable to the wildtype. As the library contains thousands of kanamycin cassettes uniquely positioned across most of the genome with no observable growth defect, we do not anticipate the presence or expression of the cassette to have an appreciable impact.

      • Line 569: in the methodology it was indicated that control cells were incubated in PBS for the same amount of time. I think this is an important control that is not cited in the results section. Please can you indicate?

      We apologise for this misunderstanding due to how the methodology was written. The experiment did not sequence the PBS-incubated samples, as these were solely used as a check for viability of the K. pneumoniae ECL8 stock solution.

      • Line 597: "Mutants in igaA are enriched in our experiments". Can you show this data?

      We have now included this as a supplementary figure (Figure S11A).

      • Line 615: when doing this calculation, I guess the authors take into account only genes that are also present in the other strains.

      That is correct, we were aiming to highlight the high conservation of “essential genes” among all the selected strains.

      • Line 627: why surprisingly? Because it is too low? Then indicate this.

      Thank you, we have now edited this sentence to indicate that.

      • Figure 4: please, for clarity, can you indicate the meaning of the colors in the figure itself besides indicating it in the figure legend?

      Thank you, we have now included a color legend in these figure panels for clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors build upon prior data implicating the secreted peptidoglycan hydrolase SagA produced by Enterococcus faecium in immunotherapy. Leveraging new strains with sagA deletion/complementation constructs, the investigators reveal that sagA is non-essential, with sagA deletion leading to a marked growth defect due to impaired cell division, and sagA being necessary for the immunogenic and anti-tumor effects of E. faecium. In aggregate, the study utilizes compelling methods to provide both fundamental new insights into E. faecium biology and host interactions and a proof-of-concept for identifying the bacterial effectors of immunotherapy response.

      We thank the Reviewers for their positive feedback on our manuscript. We also appreciate their helpful comments/critiques and have revised the manuscript as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Klupt, Fam, Zhang, Hang, and colleagues present a novel study examining the function of sagA in E. faecium, including impacts on growth, peptidoglycan cleavage, cell separation, antibiotic sensitivity, NOD2 activation, and modulation of cancer immunotherapy. This manuscript represents a substantial advance over their prior work, where they found that sagA-expressing strains (including naturally-expressing strains and versions of non-expressing strains forced to overexpress sagA) were superior in activating NOD2 and improving cancer immunotherapy. Prior to the current study, an examination of sagA mutant E. faecium was not possible and sagA was thought to be an essential gene.

      The study is overall very carefully performed with appropriate controls and experimental checks, including confirmation of similar densities of ΔsagA throughout. Results are overall interpreted cautiously and appropriately.

      I have only two comments that I think addressing would strengthen what is already an excellent manuscript.

      In the experiments depicted in Figure 3, the authors should clarify the quantification of peptidoglycans from cellular material vs supernatants. It should also be clarified whether sagA needs to be expressed endogenously within E. faecium, and whether ambient endopeptidases (perhaps expressed by other nearby bacteria or recombinant enzymes added) can enzymatically work on ΔsagA cell wall products to produce NOD2 ligands.

      We mentioned in the main text that peptidoglycan was isolated from bacterial sacculi and digested with mutanolysin for LC-MS analysis. We have now also included “mutanolysin-digested” sacculi in the Figure 3 legend as well.

      We have added the following text “We next evaluated live bacterial cultures with mammalian cells to determine their ability to activate the peptidoglycan pattern recognition receptor NOD2” and “our analysis of these bacterial strains” to indicate live cultures were evaluated for NOD2 activation.

      We have also added the following text “Our results also demonstrated that while many enzymes are required for the biosynthesis and remodeling of peptidoglycan in E. faecium, SagA is essential for generating NOD2 activating muropeptides ex vivo.”

      In the murine experiments depicted in Figure 4, because the bacterial intervention is being performed continuously in the drinking water, the investigators have not distinguished between colonization vs continuous oral dosing of the mice with peptidoglycans. While I do not think additional experimentation is required to distinguish the individual contributions of these 2 components in their therapeutic intervention, I do think the interpretation of their results should include this perspective.

      We have added the following text “We note that by continuous oral administration in the drinking water, live E. faecium and soluble muropeptides that are released into the media during bacterial growth may both contribute to NOD2 activation in vivo.” and revised the following text “Nonetheless, these results demonstrate SagA is not essential for E. faecium colonization, but required for promoting the ICI antitumor activity through NOD2 in vivo.”

      Reviewer #2 (Public Review):

      Summary:

      The gut microbiome contributes to variation in the efficacy of immune checkpoint blockade in cancer therapy; however, the mechanisms responsible remain unclear. Klupt et al. build upon prior data implicating the secreted peptidoglycan hydrolase SagA produced by Enterococcus faecium in immunotherapy, leveraging novel strains with sagA deleted and complemented. They find that sagA is non-essential, but sagA deletion leads to a marked growth defect due to impaired cell division. Furthermore, sagA is necessary for the immunogenic and anti-tumor effects of E. faecium. Together, this study utilizes compelling methods to provide fundamental new insights into E. faecium biology and host interactions, and a proof-of-concept for identifying the bacterial effectors of immunotherapy response.

      Strengths:

      Klupt et al. provide a well-written manuscript with clear and compelling main and supplemental figures. The methods used are state-of-the-art, including various imaging modalities, bacterial genetics, mass spectrometry, sequencing, flow cytometry, and mouse models of immunotherapy response. Overall, the data supports the conclusions, which are a valuable addition to the literature.

      Weaknesses:

      Only minor revision recommendations were noted.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General comments - the number/type of replicates and statistics are missing from some of the figure panels. Please be sure to add these throughout - all main figure panels should have replicates. I've also noted some specific cases below.

      Abstract - sagA is non-essential, need to edit text at "essential functions".

      This change has been made.

      "small number of mutations" - specify how many in the text.

      We revised the text. “Small number” is changed to “11”.

      "under control of its native promoter" - what was the plasmid copy number? It looks clearly overexpressed in Figure 1d despite using a native promoter, although it's a bit hard to know for sure without a loading control.

      pAM401 has a p15A origin of replication; therefore, the plasmid copy number is ~20-30 copies (Lutz R. et al Nucleic Acids Res. 1997). Total protein was visualized by Stain-Free™ imaging technology (BioRad) and serves as the protein loading control; it has been relabeled accordingly.

      "decrease levels of small muropeptides" - the asterisks are missing from Figure 3a.

      Green asterisks for peaks 2, 3, 7 and purple asterisks for peaks 13, 14 were added.

      The use of "Com 15 WT" in the figures is confusing - just replace it with "wt" and specify the strain in the text. Presumably, all of the strains are on the Com 15 background.

      “Com15 WT” was replaced with “WT” in the figures and main text.

      Change 1d to 1b so that the panels are in order (reading left to right and then top to bottom).

      Figure 1 legend is missing a number of replicates and statistics for 1a.

      The number of replicates was added.

      Figure 1b - it's unclear to me what to look at here, could add arrows indicating the feature of interest and expand the relevant text.

      Arrows pointing to cell clusters were added.

      Figure 1d - what is "stain free"? It would be preferable to show a loading control using an antibody against a constitutive protein to allow for normalization of the loading control.

      Stain-Free Imaging technology (BioRad) utilizes a trihalo compound contained in the gel to make proteins fluorescent directly in the gel with a short photoactivation, allowing the immediate visualization of proteins at any point during electrophoresis and western blotting. Stain-Free total protein measurement serves as a reliable loading control comparable to Coomassie Blue staining. This has been relabeled as “Total protein” in the figure, and Stain-Free imaging technology is noted in the legend.

      ED Figure 1 - representative of how many biological replicates?

      Legends are updated.

      ED Figure 2a - I would replace this with a table, it's not necessary to show the strip images. Also, please specify the number of replicates per group.

      Additional Extended Data Table 2 was added.

      ED Figure 2b - This data was not that convincing since the sagA KO has a marked growth defect and the time points are cut off too soon to know if growth would occur later. The MIC definition is potentially misleading. Should specify a % growth cutoff (i.e. <10% of vehicle control) and the metric used (carrying capacity or AUC). Then assign MIC to the tested concentration, not a range. The empty vector also seems to impact MIC, which is concerning and complicates the interpretation. Specify the number of replicates and add statistics. Given these various concerns, I might suggest removing this figure, as it doesn't really add much to the story.

      We appreciate this comment from the Reviewer, but believe these data are helpful for the paper and have included longer time points for the growth data. The definition of MIC for ED Fig. 2b has been included in the legend.
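
      The reviewer's suggestion (a fixed % growth cutoff applied to a stated metric, with the MIC assigned to a single tested concentration) can be illustrated with a minimal sketch. This is not the authors' definition, which is given in the ED Fig. 2b legend. The function name, the 10% cutoff taken from the reviewer's example, and all numbers below are assumptions for illustration only.

```python
from typing import Optional, Sequence


def mic_from_growth(
    concentrations: Sequence[float],
    growth: Sequence[float],
    vehicle_growth: float,
    cutoff: float = 0.10,
) -> Optional[float]:
    """Return the lowest tested concentration at and above which growth stays
    below `cutoff` x vehicle growth, or None if no such concentration exists.

    `growth` holds one value of the chosen metric (e.g. AUC or carrying
    capacity) per tested concentration; the metric itself is an assumption.
    """
    if len(concentrations) != len(growth):
        raise ValueError("concentrations and growth must have the same length")
    if vehicle_growth <= 0:
        raise ValueError("vehicle_growth must be positive")

    mic = None
    # Walk from the highest concentration downward and stop at the first
    # concentration that still permits growth above the cutoff.
    for conc, value in sorted(zip(concentrations, growth), reverse=True):
        if value / vehicle_growth < cutoff:
            mic = conc
        else:
            break
    return mic


# Hypothetical AUC values for a two-fold dilution series (arbitrary units).
example_mic = mic_from_growth(
    concentrations=[0.5, 1, 2, 4, 8],
    growth=[9.5, 8.0, 3.0, 0.6, 0.2],
    vehicle_growth=10.0,
)
```

      Reporting the cutoff and the metric alongside the resulting single concentration makes the comparison between strains with different growth rates explicit.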

      Figure 2 - specify the type of replicate. Number of cells? Number of slices? Number of independent cultures?

      For Cryo-ET experiments single bacterial cultures were prepared. Number of cells and slices for analysis are indicated in the legend. Legends are updated.

      Figure 4e - missing the water group, was it measured?

      Water (αPD-L1) group was not included in immune profiling of tumor infiltrating lymphocytes (TILs) experiment, as we have previously demonstrated limited impact on ICI anti-tumor activity and T cell activation in this setting (Griffin M et al Science 2021).

      Figure 4d - is this media specific to your strains? If not, qPCR may be a better method using strain-specific primers.

      Yes, HiCrome™ Enterococcus faecium agar plates (HIMEDIA 1580) are selective for Enterococcus species; moreover, the agar is chromogenic, allowing E. faecium to be identified as yellow colonies among other Enterococcus species.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      RC-2023-02105R: Brunetta et al.,

      IF1 is a cold-regulated switch of ATP synthase to support thermogenesis in brown fat

      We are happy to submit our revised manuscript after considering the suggestions made by reviewers. The comments were overall positive, and the changes requested were mostly editorial. We have, nevertheless, added new experiments as quality controls. These experiments did not affect the main conclusions of our work. In addition, we also included two in vivo experimental models of gain and loss-of-function, to further address the physiological relevance of IF1 in BAT thermogenesis. We believe with these additional experiments, quality controls as well as in vivo models, our study has improved considerably. We hope our efforts will be appreciated by the reviewers and we make ourselves available to answer any further questions.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In the present manuscript, the authors present data in support of their primary discovery that "IF1 controls UCP1-dependent mitochondrial bioenergetics in brown adipocytes". The opening figure convincingly demonstrates that IF1 expression is cold-exposure dependent. They then go on to show that loss of IF1 has functional consequences that would be predicted based on IF1's known role as a regulator of ATP hydrolysis by CV. They go on to make a few additional claims, succinctly detailed in the Discussion section. Specific claims include the following: 1) IF1 is downregulated in cold-adapted BAT, allowing greater hydrolytic activity of ATP synthase by operating in the reverse mode; 2) when IF1 is upregulated in brown adipocytes in vitro, mitochondria are unable to sustain the MMP upon adrenergic stimulation; 3) IF1 ablation in brown adipocytes phenocopies the metabolic adaptation of BAT to cold; and 4) IF1 overexpression blunts mitochondrial respiration without any apparent compensatory response in glycolytic activity. The claims described above are well supported by the evidence. The manuscript is very well written, figures are clear and succinct. Overall, the quality of the work is very high. Given that IF1 is implicated across many fields of study, the novel discovery of IF1 as a regulator of brown adipose mitochondrial bioenergetics will be of significance across several fields. That said, a few areas of concern were apparent. Concerns are detailed in the "Major" and "Minor" comments section below. Additional experiments do not appear to be required, assuming the authors adequately acknowledge the limitations of the study and either remove or qualify speculative claims.

      Major Comments:

      1. The authors convincingly demonstrate that IF1 expression is specifically down-regulated in BAT upon cold-exposure. These data strongly implicate a role for IF1 in BAT bioenergetics, a major claim of the authors and a novel finding herein. Additional major strengths of the paper, which provide excellent scientific rigor include the use of both loss of function and gain of function approaches for IF1. In addition, the mutant IF1 experiments are excellent, as they convincingly show that the effects of IF1 are dependent on its ability to bind CV.

      RESPONSE: We thank the reviewer for the positive feedback on our work.

      Regarding Figure 1 - Did the content of ATP synthase change? In figure 1A-B, the authors show that ATPase activity of CV is higher in cold-adapted mice. While this result could be due to a loss of IF1, it could also be due to a higher expression of CV. To control for this, the authors should consider blotting for CV, which would allow for ATPase activity to be normalized to expression.

      RESPONSE: Thank you for this suggestion. We have now determined complex V subunit A in our experimental protocol. We found that cold exposure does not impact complex V protein levels. Given the importance of this control, we have now included it in Figure 1 (Please, see the revised version) alongside the IF1/complex V ratio. In addition, we have now performed WBs in the BAT from mice exposed for 3 and 7 days to thermoneutrality (~28°C). We found that IF1 is not reduced following whitening of BAT by this approach whilst UCP1 and other mitochondrial proteins are reduced. This set of data is now included in Figure 1I,K,L.

      Regarding MMP generated specifically by ATP hydrolysis at CV, the reversal potential for ANT occurs at a more negative MMP than that of CV (PMID: 21486564). Because reverse transport of ATP (cytosol to matrix) via ANT will also generate a MMP, it is speculative to state that the MMP in the assay is driven by ATP hydrolysis at CV. It is possible and maybe even likely that the majority of the MMP is driven by ANT flux, which in turn limits the amount of ATP hydrolyzed by CV. Admittedly, it is very challenging to differentiate MMP from ANT vs that from CV, thus the authors simply need to acknowledge that the specific contribution of ATP hydrolysis to MMP remains to be fully determined. That said, the fact that ATP-dependent MMP tracks with IF1 expression does certainly implicate a role for ATP hydrolysis in the process. The authors should consider including a discussion of the ambiguity of the assay to avoid confusion. A role for ANT likely should be incorporated in the Fig. 1J cartoon.

      RESPONSE: Thank you for bringing the ANT contribution to MMP to our attention. The effects of ATP in the real-time MMP measurements were totally abolished by the addition of oligomycin in BAT-derived isolated mitochondria, thus suggesting dependency of complex V in this process. However, the assessment of MMP in intact cells is much more challenging given cytosolic vs. mitochondrial contribution to ATP pool, and ATP synthase vs. ANT reversal capacity depending on MMP. Nevertheless, we have addressed these points in the discussion section as well as added to our schematic cartoon in Figure 1m.

      Regarding the lack of effect of IF1 silencing on MMP, it is possible that IF1 total protein levels are simply lower in cultured brown fat cells relative to tissue? The authors could consider testing this by blotting for IF1 and CV in BAT and brown fat cells. The ratio of IF1/ATP5A1 in tissue versus cells may provide some amount of mechanistic evidence as to their findings.

      RESPONSE: We have now blotted for complex V and IF1 in both differentiated primary brown adipocytes and BAT homogenates derived from mice kept at room temperature (~22°C). We found the levels of complex V in primary brown adipocytes are higher than BAT homogenates. Therefore, IF1/complex V ratio is different between these two systems. This has indeed the potential to influence our gain and loss-of-function experiments. We have added these results alongside their interpretation in the revised manuscript.

      The calculation of ATP synthesis from respiration sensitive to oligomycin has many conceptual flaws. Unlike glycolysis, where ATP is produced via substrate level phosphorylation, during OXPHOS, the stoichiometry of ATP produced per 2e transfer is not known in intact brown adipose cells. This is a major limitation of this "calculated ATP synthesis" approach that is beginning to become common. Such claims are speculative and thus likely do more harm than good. In addition to ANT and CV, there are many proton-consuming reactions driven by the proton motive force (e.g., metabolite transport, Ca2+ cycling, NADPH synthesis). Although it remains unclear how much proton conductance is diverted to non-ATP synthesis dependent processes, it seems highly likely that these processes contribute to respiratory demand inside living cells. Moreover, just as occurs with UCP1 in response to adrenergic stimuli, proton conductance across the various proton-dependent processes likely changes depending on the cellular context, which is another reason why using a fixed stoichiometry to calculate how much ATP is produced from oxygen consumption is so highly flawed. Maximal P/O values that are often used for NAD/FAD linked flux are generated using experimental conditions that favor near complete flux through the ATP synthesis system (supraphysiological substrate and ADP levels). The true P/O value inside living cells is likely to be lower.

      RESPONSE: We agree with the reviewer regarding the limitations on calculating ATP production in intact cells based on respiration and proton flux. However, this was only one experiment on which we based our conclusions, as these were also supported by, e.g., ATP/ADP ratio measurements and oxygen consumption using different substrates. Therefore, we do not rely exclusively on the ATP production estimate; rather, we use this experiment to support complementary methodologies. Nevertheless, we have now better detailed our experimental protocol as well as acknowledged the limitations of the method, so the reader is aware of our procedure and its limitations. We hope the reviewer understands our motivation to perform these experiments and the contribution to our study.
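
      The reviewer's concern about a fixed stoichiometry can be made concrete with a short sketch of the commonly used calculation, in which oligomycin-sensitive respiration is multiplied by two oxygen atoms per O2 and by an assumed P/O ratio. This is an illustration under stated assumptions, not the authors' protocol: the function name, the OCR values, and both P/O values are hypothetical.

```python
def estimated_atp_rate(basal_ocr: float, oligo_ocr: float, p_o_ratio: float) -> float:
    """Estimate OXPHOS ATP production (pmol ATP/min) from oxygen consumption.

    basal_ocr and oligo_ocr are oxygen consumption rates (pmol O2/min) before
    and after oligomycin. p_o_ratio is an ASSUMED ATP-per-oxygen-atom
    stoichiometry, which is the quantity that is not known in intact cells.
    """
    atp_linked_ocr = basal_ocr - oligo_ocr
    if atp_linked_ocr < 0:
        raise ValueError("oligomycin-insensitive OCR exceeds basal OCR")
    # Each O2 molecule contributes two oxygen atoms.
    return atp_linked_ocr * 2.0 * p_o_ratio


# Same hypothetical respiration data, two different assumed P/O ratios.
rate_high_po = estimated_atp_rate(basal_ocr=100.0, oligo_ocr=40.0, p_o_ratio=2.5)
rate_low_po = estimated_atp_rate(basal_ocr=100.0, oligo_ocr=40.0, p_o_ratio=1.25)
```

      Because the estimate scales linearly with the assumed P/O ratio, halving that assumption halves the reported ATP production from identical respiration data, which is why such values are best read alongside independent measurements such as the ATP/ADP ratio.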

      Why are the results in Figure 3K expressed as a % of basal? Could the authors please normalize the OCR data to protein and/or provide a justification for why different normalization strategies were used between 3K and 3M?

      RESPONSE: We apologize for the lack of consistency. We have now updated Figure 3 to show all the data in absolute values divided by protein content. This change does not affect the overall interpretation of the findings.

      The authors claim that IF1 overexpression lowers ATP production via OXPHOS. However, given the major limitations of this assay (as discussed above), these claims should be viewed as speculation. This needs to be addressed by the authors as a major limitation. The fact that the ATP/ADP levels did not change does not support a reduction in ATP production, as claimed in the title of Figure 4.

      RESPONSE: The reduction in ATP levels and mitochondrial respiration (independent of the substrate offered) suggests a reduction in ATP production rather than an increase in ATP consumption. Moreover, the maintenance of ATP/ADP ratio suggests the existence of a compensatory mechanism to avoid cellular energy crises, which we interpreted as reduced metabolic activity of the cells. Nevertheless, we have now reworded our statements to address the limitations of the methods and our interpretation of the data.

      In the discussion, the authors state "However, considering that IF1 inhibits F1-ATP synthase in a 1:1 stoichiometric ratio, the relatively higher expression of IF1 in BAT at room temperature could represent an additional inhibitory factor for ATP synthesis in this tissue." This does not appear to be correct. Although IF1 has been suggested to partially lower maximal ATP synthesis rates, most of this evidence comes from over-expression experiments. According to the current understanding of IF1-CV interaction, the protein is expelled from the complex during rotation in favor of ATP synthesis (PMID: 37002198). It is far more likely that ATP synthesis is low in BAT mitochondria due to the low CV expression. Relative to heart and when normalized to mitochondrial content, CV expression in BAT mitochondria is about 10% that of heart (PMID: 33077793).

      RESPONSE: We agree with the reviewer and removed this sentence.

      The last sentence of the manuscript states, "Given the importance of IF1 to control brown adipocyte energy metabolism, lowering IF1 levels therapeutically might enhance approaches to enhance NST for improving cardiometabolic health in humans." This sentence seems at odds with the evidence that IF1 levels go up, not down, in human BAT upon cold exposure.

      RESPONSE: In light of our new experiments, we have now updated our conclusions.

      Minor Comments:

      The term "anaerobic glycolysis" is used throughout. All experiments were performed under normoxic conditions, thus the correct term is "aerobic glycolysis."

      RESPONSE: Thank you for this comment and we have replaced this term as suggested.

      Only male mice were used in the study, could the authors please provide a justification for this?

      RESPONSE: Given we devoted most of our efforts to the manipulation of IF1 in vitro, we have used the mouse model as a proof-of-principle on the impact of IF1 in adrenergic-induced thermogenesis. We have now included IF1 KO male and female mice to address the role of IF1 in adrenergic-induced thermogenesis. However, due to the limitation of material, we could only perform AAV in vivo gain-of-function in male mice, therefore, our results cannot be immediately transferred to both sexes, unfortunately.

      Reviewer #1 (Significance (Required)):

      Overall, the quality of the work is very high. Given that IF1 is implicated across many fields of study, the novel discovery of IF1 as a regulator of brown adipose mitochondrial bioenergetics will be of significance across several fields.

      My expertise is in mitochondrial thermodynamics; thus, I do not feel there are any parts of the paper that I do not have sufficient expertise to evaluate.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      The manuscript by Brunetta and colleagues conveys the message that the ATPase inhibitory factor 1 (IF1) protein, a physiological inhibitor of mitochondrial ATP synthase, is expressed in BAT of C57BL/6J mice. Moreover, upon cold-adaption of mice they report that the content of IF1 in BAT is downregulated to sustain the mitochondrial membrane potential (MMP) as a result of reverse functioning of the enzyme. In experiments of loss and gain of function of IF1 in cultured brown adipocytes and WT cells they further stress that IF1 silencing promotes metabolic reprogramming to an enhanced glycolysis and lipid oxidation, whereas IF1 overexpression blunts ATP production rendering a quiescent cellular state of the adipocytes.

      RESPONSE: We appreciate the time the reviewer invested in our work. Please, see our responses below in a point-by-point manner.

      Reviewer #2 (Significance (Required)):

      Claims and conclusions:

      I have been surprised by the claim that IF1 protein is expressed in BAT under basal conditions and that its expression is downregulated in the cold-adapted tissue. In a previously published work by Forner et al., (2009) Cell Metab 10, 324-335 (reference 43), using a quantitative proteomic approach, it is reported that the mitochondrial proteome of mouse BAT under basal conditions contains a low content of IF1 (at level comparable to the background of the analysis). Remarkably, in the same study they show that there is roughly a 2-fold increase in the content of IF1 protein in mitochondria of BAT at 4d and 24d of cold-adaptation of mice. In other words, just the opposite of what is being reported in the Brunetta study.

      RESPONSE: We are aware of the inconsistencies between our findings and Forner et al. (2009). We would like to point out that we have determined IF1 levels in BAT in two separate cohorts with the same findings, and in a third cohort, we observed IF1 mRNA levels to be downregulated in a much shorter timeframe. Our functional analysis is in line with this pattern of regulation. A closer look at the supplementary table provided by Forner et al. (2009) shows that the increase in IF1 content following cold exposure is not supported, and since we do not have further insight into the methods and analysis employed by the Forner et al. group, we believe a direct comparison should be avoided at the moment. Regarding the baseline levels of IF1 in BAT, the relatively high abundance of IF1 in BAT was also found by another independent group (https://doi.org/10.1101/2020.09.24.311076).

      Importantly, the last paragraph of the discussion needs to be amended when mentioning the work of Forner et al. (ref.43). The mentioned reference studied changes in the mouse mitochondrial proteome not in human mitochondria, as it is stated in the alluded paragraph.

      RESPONSE: We apologize for this oversight; we have now reworded our statement.

      More puzzling are the western blots in Figures 1E, 1H, Supp. Fig. 1C, D where IF1 (ATP5IF1) is identified by a 17kDa band. However, in other Figures (Fig. 2, Fig. 3, Fig. 4, Supp Fig. 2) IF1 is identified by its well-known 12kDa band. What is the reason for this change in labeling of the IF1 band? The reactivity of the anti-IF1 antibody used? It has been previously documented that liver of C57BL/6J and FVB mouse strains do not express IF1 to a significant level when compared to heart IF1 levels (Esparza-Molto (2019) FASEB J. 33, 1836-1851). However, in Fig. 1E they show opposite findings, much higher levels of IF1 in liver than in heart as revealed by the 17kDa band. Moreover, in Fig. 1H they show the vanishing of the 17 kDa band under cold adaptation, which is not the migration of IF1 in gels as shown in their own figures (see Fig. 2, Fig. 3, Fig. 4, Supp Fig. 2). I am certainly reluctant to accept that the 17kDa band shown in Figures 1E, 1H, Supp. Fig. 1C, D is indeed IF1. Most likely it represents a non-specific protein recognized by the antibody in the tissue extracts analyzed. Cellular overexpression experiments of IF1 in WT1 cells (Fig. 2E) and primary brown adipocytes (Fig. 4B) also support this argument. Overall, I do not support publication of this study for the reasons stated above.

      RESPONSE: We understand the concerns raised by the reviewer and apologize for the lack of details in our experimental procedures. While we used the same antibody throughout the study (Cell Sig. cat. Num. 8528, 1:500), we used two different types of gels. The difference in the apparent molecular weight of IF1 is likely due to the migration of the protein in the agarose gel. By using custom-made gels, we observe the protein at ~17kDa (Fig. 1 and 5), whereas by using commercial gels (Fig. 2, 3, and 4), we observe the protein closer to the predicted molecular weight (i.e. ~12kDa). Of note, gain and loss-of-function experiments, both in vivo and in vitro, confirm this statement and the specificity of the antibody (Fig. 2, 3, 4, 5, Fig. EV2). In addition, when we ran a custom-made gel with primary BAT cells, we again observed the ~17kDa band (see Figure for the reviewer below). These experiments, alongside the absence of other bands in the gels (see uncropped membranes in Supplementary Figure 1), lead us to conclude that the band we observe is indeed IF1. Nevertheless, we have now updated our methods section, so the reader is aware of our approaches. We hope the reviewer is satisfied with our additional experiments and edits throughout the manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript, Brunetta et al. describe the role of IF1 in brown adipose tissue activation using in vivo and in vitro experimental models. They observed that cold adaptation promotes a reduction in IF1 expression and an increase in the reverse activity of mitochondrial ATPase or Complex V. Based on these results, the authors explore the contribution of IF1 in this metabolic pathway by modeling the thermogenic process in differentiated primary brown adipocytes. They silenced and overexpressed IF1 in culture and studied their adrenergic stimulation under norepinephrine.

      Major comments:

      The experiments are well explained and the manuscript flows very well. There are several comments that should be addressed.

      RESPONSE: We thank the reviewer for the kind words regarding our work.

      1. The authors measure ATP hydrolysis in isolated mitochondria from BAT in Figure 1. They observed that IF1 is decreased upon cold exposure and that ATP hydrolysis is increased. They assess protein levels of different OXPHOS proteins, including IF1 but not other proteins of Complex V (ATP5A) as they do in Figures 3 and 4. It is important to see that cold exposure only affects IF1 levels but not other proteins from Complex V. Does IF1/Complex V ratio change?

      RESPONSE: We thank the reviewer for this suggestion which was also raised by Reviewer #1. We have now measured complex V subunit A in our experimental protocol. We found that cold exposure does not impact complex V protein levels. Given the importance of this information, we have now included it in Figure 1 (Please, see the revised version) alongside the IF1/complex V ratio. In addition, we have now performed WBs in the BAT exposed for 3 and 7 days to thermoneutrality (~28°C) where we found that IF1 is not reduced following whitening of BAT by this approach whilst UCP1 and other mitochondrial proteins are reduced.

      This set of data is now included in Figure 1I,K,L.

      In Figure 2J, the drop in MMP is lower upon adrenergic stimulation than in Figure 2E. The same observation applies to other results when the reduction in MMP after NE addition is minimal. Why do the authors remove TMRM for the measurements of membrane potential? TMRM imaging is normally done in the presence of the dye in non-quenching mode. Treatments should be done prior to the addition of the dye and then TMRM should be added and left during the imaging analysis and measured in non-quenching mode. This might explain some of the above-mentioned points regarding the MMP data. Alternatively, if the dye is removed before the measurements, they should let the cells adapt so that the dye equilibrates between mitochondria and cytosol. A more elegant method to measure membrane potential could be live-cell imaging. In addition, the authors propose that mitochondrial membrane potential upon NE stimulation is maintained by reversal of ATP synthase. If this is the case, one would expect that addition of oligomycin in NE treated adipocytes would cause depolarization. However, in FigS2A this is not the case. Authors should comment on this in addition to considering a more elegant approach to measure MMP.

      RESPONSE: We apologize for the lack of details in the methods. All treatments (i.e., transfection and norepinephrine stimulation) were performed before the addition of TMRM. Indeed, this approach does not have the resolution of safranine in isolated mitochondria (Fig. 1D), which limits our interpretation regarding the dynamic role of IF1 on MMP in brown adipocytes. We have taken care to state the limitations of our method throughout the entire paper to avoid overinterpretation of our data. Regarding the removal of the dye before the measurements, our internal controls indicate that this procedure does not change the ability of our method to detect fluctuations in MMP (i.e., oligomycin and FCCP as internal controls). Nevertheless, as suggested by the reviewer, to test the time effect of the probe equilibrium (i.e., mitochondria versus cytosol) in our method, we loaded cells with 20 nM TMRM for 30 min and measured the fluorescence right after the removal of the probe/washing steps for another 10 min. We were not able to detect time-dependent differences in the fluorescence (see below). Therefore, we conclude the removal of TMRM does not influence the fluorescence of the probe in differentiated brown adipocytes.

      [Figure for the reviewer not reproduced; panel labels: +NE, -NE]

      In addition, we performed a similar experiment using TMRM in the quenching mode (200 nM); however, after the removal of TMRM, we added FCCP (1 µM) to the cells for 10 min under constant agitation at 37°C. This approach aimed to expel all TMRM that accumulated within the mitochondria in an MMP-dependent manner, thereby excluding the dynamic Brownian movement, mentioned by the reviewer, that we could have caused by removing the dye before the measurement. By doing this, we found the same effect of IF1 overexpression in the reduction of MMP in the presence of norepinephrine.

      Protocol:

      • Transfection (24h) on day 4 of differentiation + 24h just normal media

      • 30 min norepinephrine 10 µM

      • 200 nM TMRM on top of NE

      • Washing step

      • Add FCCP 1 µM for 10 min, and read (The aim here was to release all TMRM accumulated inside of mitochondria in a MMP-dependent manner)

      In summary, the data suggest the removal of the dye from the cells does not influence the fluorescence of TMRM, therefore enabling us to draw conclusions regarding the biological effects of IF1 manipulation on the MMP of brown adipocytes. Regarding the reverse mode of ATP synthase and the absence of effects with oligomycin, given that oligomycin inhibits both directions of rotation of ATP synthase and that even uncoupled brown adipocytes respond to oligomycin (i.e. reduction in O2 consumption), the prediction of lowering MMP in the presence of oligomycin due to inhibition of the reverse mode of ATP synthase is more complicated than anticipated. Nevertheless, we have now addressed this topic in the discussion section. Lastly, we generally observe a reduction in MMP of around 10-25% in differentiated adipocytes upon NE treatment (30 minutes, 10 µM). However, due to the differentiation state of the cells, the MMP response to norepinephrine fluctuated from experiment to experiment. Therefore, we did not compare experiments performed on different days or batches, but only within the same differentiation batch to reduce variability.

      In Figure 2, in the model of siIF1, there is more baseline phosphorylation of AMPK than in the scramble control (pAMPK). However, this is not the case for p-p38MAPK. Do the authors have any explanation for those differences in baseline activation of the stress kinases when IF1 is silenced? In the same experimental group, addition of NE seems to have more effect in the scrambled than in siIF1, but the plotted data does not reflect these differences. In contrast, the increase in pAMPK upon NE is higher in IF1 overexpressing cells compared to EV (Figure 2H), but again this is not reflected in western blot quantification (Figure 2I).

      RESPONSE: Although some differences in pAMPK in the treatments were observed as gathered by the representative blots, these changes were not confirmed later in different biological replicates, therefore, the overall effect of IF1 manipulation in pAMPK does not change. Given we used this approach as quality control for our experiments to guarantee norepinephrine treatment works, we removed the pAMPK data from the study and kept p38 as a marker of adrenergic signaling activation (please see revised Fig. 2 in the main file).

      Does NE promote decrease of IF1 expression in control (siScramble and EV) adipocytes? The authors should test it and see whether it goes in the same direction as the observations derived from the experiments in cold exposed mice. This is very important point, as it could explain the lack of an additional effect of IF1 silencing on NE-induced depolarization (Figure 2E).

      RESPONSE: We thank the reviewer for this suggestion. In line with the in vivo data, acute NE treatment in differentiated brown adipocytes does not change IF1 mRNA and protein levels. We have now added this information and the corresponding interpretation to the updated manuscript.

      Does NE promote decrease of IF1 expression in the scramble and EV adipocytes? The authors should test it and see whether it goes in the same direction as the observations derived from the experiments in cold exposed mice.

      RESPONSE: As this question is the same as #4, we believe the reviewer may have erroneously pasted this here.

      For MMP data in Fig2, they should include significance between non treated and NE-treated groups. They say: "While UCP1 ablation did not cause any effect on MMP upon adrenergic stimulation...", but NE caused (probably significant) depolarization in siUCP1, which seems even stronger than depolarization in EV. This is opposite to what you would expect. They also didn't confirm UCP1 silencing with western blot.

      RESPONSE: We thank the reviewer for this suggestion. We have now included the expected statistical main effect of NE upon MMP. Although the effects of IF1 overexpression were blunted when Ucp1 was silenced, we indeed still observed the same degree of reduction in MMP in brown adipocytes. This finding has two possible explanations: first, the silencing protocol may not be fully effective, so residual Ucp1 expression may still play a role in this experiment; second, other ATP-consuming processes may lower MMP in a UCP1-independent manner. We have added this information to the updated manuscript to make the reader aware of our findings as well as the limitations of the method. Unfortunately, we were not able to detect UCP1 protein levels due to technical issues. Given that the effects of IF1 overexpression were blunted when Ucp1 was silenced, we believe this functional outcome is sufficient, alongside mRNA levels, to demonstrate the effectiveness of our silencing protocol.

      It has been established that decreased expression of IF1 promotes increase in the reverse activity of Complex V, ATP hydrolytic activity. Increase in ATP hydrolysis also affects ECAR. The authors should consider this when calculating the contribution of ATP glycolysis versus ATP OXPHOS since the ATP hydrolysis is also playing a role in the ECAR increase. The data should be reinterpreted. ATP hydrolysis should be measured in the situation where IF1 is silenced and overexpressed. These measurements can be done in cells using the seahorse.

      RESPONSE: The only differences we observed in MMP are in the presence of norepinephrine (i.e. UCP1-dependent proton conductance), which is not present during the estimation of ATP production by Seahorse analysis. Nevertheless, we have now improved the description of our experimental protocol and of its limitations for estimating ATP production to make it as clear as possible to the reader. Lastly, given the addition of in vivo gain-of-function experiments, we have now determined the ATP hydrolytic activity in this model, which offers a better understanding of how in vivo modulation of IF1 levels affects ATP synthase activity (reverse mode). We hope the reviewer understands our motivation to focus on the in vivo gain-of-function model regarding ATP synthase activity.

      The authors use GAPDH as loading control in western blots. They should use another protein since GAPDH is part of the intermediary metabolism and plays a role in glycolysis.

      RESPONSE: We understand the concern of the reviewer regarding the use of GAPDH as a loading control for the studies of metabolism. However, as can be observed by the western blot images, GAPDH levels do not change in our experimental models, therefore, we feel confident that our loading is homogeneous throughout our gels.

      The authors show that reduction of IF1 involves more lipid utilization. They should include more experiments showing the connection of the metabolic adaptation in the absence of IF1 and some lipid imaging.

      RESPONSE: We appreciate this suggestion. We have now performed Oil Red O staining in differentiated adipocytes following ablation of IF1. However, we did not observe any effect on lipid accumulation in primary brown adipocytes following IF1 knockdown. Therefore, the effects of IF1 ablation on lipid mobilization are not due to lipid content or reflected in lipid accumulation. We have now added this new information to the manuscript (please see the revised Fig. EV3).

      In the text, "Despite this adjustment of experimental conditions, we did not detect any effect of IF1 ablation on mitochondrial oxygen consumption (Supplementary Fig. 3A,B)", this is true for baseline, NE-driven and ATP-linked respiration, but what about maximal respiration? There is a huge increase in IF1 knockdown... They should explain these results.

      RESPONSE: We performed this experiment to address the question of whether the lipid mobilization induced by norepinephrine would uncouple mitochondria in a UCP1-independent manner. Given the absence of any difference in mitochondrial respiration between scrambled and IF1-ablated cells in the presence of norepinephrine and following the addition of oligomycin, we concluded that there was no lipolysis-induced UCP1-independent uncoupling. However, as observed by the reviewer and consistent with other data within the study, the interaction between lipid metabolism and IF1 knockdown seems to affect maximal electron transport chain activity, which, although interesting, was not the focus of the present study. Nevertheless, we have now acknowledged these findings and a possible explanation for them in the revised manuscript.

      In Figure 3K they present OCR as % of baseline, but in a similar experiment in Figure 4G it is OCR/protein; they should make the Y axis consistent across experiments.

      RESPONSE: We apologize for this oversight. We have now edited all the axes and labels for consistency.

      The graphical abstract is confusing. In BAT there are two populations of mitochondria, the cytosolic and the mitochondria attached to the lipid droplet, peridroplet mitochondria (PDM). Upon adrenergic stimulation, PDM leave the lipid droplet and lipolysis takes place. The authors propose that upon adrenergic stimulation, IF1 is reduced and there is lipid mobilization. The part of the scheme where it says "fully recruited" should be removed or rewritten, since adrenergic stimulation is not compatible with mitochondria recruitment around the lipid droplet.

      RESPONSE: Thank you for this input. Given the addition of new experiments and interpretation, we have now redrawn the graphical abstract and addressed this topic in the discussion section.

      The title should be rewritten to better reflect the research presented in the manuscript.

      RESPONSE: Thank you for this input. Given the addition of new experiments, we have now rewritten the title accordingly.

      Minor comments:

      Some of the Y axes should be corrected. For example, in Figure 2J, L and M they should say % of EV untreated. Similarly, in Figure 2E, it should say % of scramble untreated. In Figure 3N, the Y axis is misspelled. All the Y axes referring to percentages should have the same scale for comparison purposes.

      RESPONSE: Thank you for the proofreading. We have now edited the scales and labels to keep consistency.

      The authors should better describe the results corresponding to Figure 2. There is a lot of information, and they should improve the description of the connection between the different pieces of data relating to the different signaling pathways that are shown. For the westerns in this figure, they should provide some rationale (one to two sentences in the results section) as to why they are checking the expression of pAMPK and p38-MAPK.

      RESPONSE: We have now edited the description of our results to make them as clear as possible.

      Here are some comments referring to the methods section:

      For Complex V hydrolytic activity, the reaction buffer contains 10mM Na-azide. I guess this is to inhibit respiration, but wouldn't azide also inhibit complex V at this concentration?

      RESPONSE: We thank the reviewer for this question. To test that, we performed complex V activity in buffers containing or not 10 mM sodium azide. As demonstrated below, the presence of sodium azide in the buffer does not influence complex V activity in two different tissues with low and high complex V activity (BAT and heart, respectively).

      Table 1. ATP synthase hydrolytic activity in the presence or absence of Na-azide.

      Condition                 BAT             Heart
      +Na-azide                 100 ± 43.01     100 ± 39.36
      -Na-azide                 82.6 ± 4.33     111.3 ± 43.32
      +Na-azide + oligomycin    15.3 ± 4.32*    13.8 ± 14.01*
      -Na-azide + oligomycin    14.2 ± 3.53*    11.9 ± 2.88*

      Data presented as % of control (i.e. presence of Na-azide and absence of oligomycin) for both tissues independently. N = 2-3/condition. Statistical test: two-way ANOVA. * main effect of oligomycin (p …)

      In the mitochondrial isolation protocol, they say "mitochondria were centrifuged at 800g for 10min..." Will this speed pellet the mitochondria? I think this is a mistake in writing.

      RESPONSE: We apologize for the lack of clarity. What was centrifuged at 800 g was the whole-tissue homogenate to discard cellular debris, before pelleting mitochondria at 5000 g. We have now corrected this mistake in the methods section.
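      For readers reproducing the two-step spin described above, the following minimal Python sketch converts the stated forces (800 g to clear debris, 5000 g to pellet mitochondria) into rotor speeds using the standard relation RCF = 1.118 × 10^-5 × r(cm) × rpm^2. The rotor radius is a hypothetical placeholder, since the response does not specify the rotor used.

```python
import math


def rcf_to_rpm(rcf_g: float, radius_cm: float) -> float:
    """Convert relative centrifugal force (x g) to rotor speed (rpm).

    Uses the standard relation RCF = 1.118e-5 * radius_cm * rpm**2.
    """
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("RCF and radius must be positive")
    return math.sqrt(rcf_g / (1.118e-5 * radius_cm))


# Hypothetical rotor radius (cm); the source does not state which rotor was used.
ROTOR_RADIUS_CM = 8.0

# The two spins described in the response.
steps = [
    ("clear debris from whole-tissue homogenate", 800),
    ("pellet mitochondria", 5000),
]

for description, rcf in steps:
    rpm = rcf_to_rpm(rcf, ROTOR_RADIUS_CM)
    print(f"{description}: {rcf} x g ~ {rpm:.0f} rpm")
```

      Because the required speed scales with the square root of the force, the 5000 g spin needs only about 2.5 times the rotor speed of the 800 g spin on the same rotor.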

      For the safranin-O experiment, they don't mention mitochondrial substrate used, probably it's in the reference that they provide, but I think it should be included in the text.

      RESPONSE: We did not use any substrate because our goal was to test the contribution of ATP synthase to mitochondrial membrane potential. For that, we inhibited proton movement within the ETC with antimycin A and through UCP1 with GDP (see Methods). We have now edited our Methods description to make sure the reader is aware of our approach.

      Reviewer #3 (Significance (Required)):

      The manuscript is well written, and it flows well when reading. However, there are some additional experiments that need to be performed to reach the conclusions the authors claim.

      RESPONSE: We thank the reviewer for the positive comments regarding our work and hope to have answered the open questions with the edits and new experiments.

      The role of ATP hydrolysis in BAT thermogenesis is novel and interesting, as it can shed some light on potential approaches to promote BAT activation.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      This is an interesting investigation into the activity of IF1 in brown adipocytes. The findings are innovative and the conclusion is well-supported by the data. The conclusion is in line with previous reports on IF1 activities in other cell types, particularly in terms of its regulation of FoF1-ATPase. The authors have executed an exceptional job in designing the study, preparing the figures, and writing the manuscript. Overall, this study significantly contributes to the understanding of IF1 activity in brown adipocytes and its role in thermogenesis.

      RESPONSE: We thank the reviewer for the kind words. Please find below our point-by-point answers.

      Reviewer #4 (Significance (Required)):

      The study demonstrates the involvement of IF1 in regulating thermogenesis in brown adipocytes, which is a unique aspect not covered in the existing literature. An advantage of the study is its well-designed cellular experiments. The major weakness is the lack of in vivo proof of the conclusion. There are a few minor concerns that should be addressed to further enhance the quality of the manuscript.

      RESPONSE: We have now included two in vivo models, whole-body IF1 KO mice and BAT-injected IF1 overexpression, to test the role of IF1 in BAT biology. The whole dataset is included in the main manuscript, where we conclude that BAT IF1 overexpression partially suppresses β3-adrenergic induction of thermogenesis alongside a reduction (overall and UCP1-dependent) in mitochondrial oxygen consumption. Also, similar to our in vitro experiments, IF1 KO mice did not present any difference in adrenergic-stimulated oxygen consumption.

      1. Current discussion does not mention the regulation of IF1 protein by the cAMP/PKA pathway. This point should be included to provide a comprehensive understanding of the regulatory mechanisms of IF1 protein.

      RESPONSE: Thank you for this suggestion. We have now added this topic to the discussion.

      It has been reported that IF1 also influences the structure of mitochondrial cristae. Considering the observed changes with IF1 knockdown, it would be valuable to discuss this activity in relation to the findings of the study.

      RESPONSE: We discussed the implications of IF1 modulation in mitochondrial morphology in the revised manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2023-02218R

      Corresponding author(s): Steven McMahon

      1. General Statements [optional]

      We were pleased to receive the encouraging critiques and very much appreciate the Reviewers' specific comments and suggestions. In this revised version of our manuscript, we have made a number of substantive additions and modifications in response to these comments/suggestions. We hope you agree that the study is now improved to the point where it is suitable for publication.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      This study describes efforts to characterize differences in the roles of the two related human decapping factors Dcp1a and Dcp1b by assessing mRNA decay and protein associations in knockdown and knockout cell lines. The authors conclude that these proteins are non-redundant based on the observations that loss of Dcp1a versus Dcp1b impacts the decapping complex (interactome) and the transcriptome differentially.

      Major comments

      • While the experiments appear to be well designed and executed and the data of generally high quality, the conclusions are drawn without sufficient consideration for the fact that these two proteins form a heterotrimeric complex. The authors assume that there are distinct homotrimeric complexes rather than a single complex containing both proteins. Homotrimers may have new/different functions not normally seen when both proteins are expressed. Thus, while it is acceptable to infer that the functions of these two proteins within the decapping complex are distinct, it is not clear that they act separately, or that complexes naturally exist without one or the other. A careful evaluation of the relative ratios of Dcp1a and Dcp1b overall and in decapping complexes would be informative if the authors want to make stronger statements about the roles of these two factors.

      RESPONSE: Thank you for this valuable comment. We have substantially edited the manuscript to incorporate these points. Examples include a detailed analysis of iBAQ values for the DDX6, DCP1a, and DCP1b interactomes (which now allows us to estimate the ratios of DCP1a and DCP1b in these complexes) and cellular fractionation to interrogate complex integrity (using Superose 6).
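      As an illustration of how iBAQ values can yield the relative ratio of DCP1a to DCP1b in a pulldown, here is a minimal Python sketch. It assumes the usual iBAQ definition (summed peptide intensity divided by the number of theoretically observable peptides) and treats the resulting values as roughly proportional to molar amounts. All intensities and peptide counts are hypothetical placeholders, not data from the study, and the function name is our own.

```python
def ibaq(summed_intensity: float, n_theoretical_peptides: int) -> float:
    """iBAQ value: summed peptide intensity / number of theoretically observable peptides."""
    if n_theoretical_peptides <= 0:
        raise ValueError("need at least one theoretically observable peptide")
    return summed_intensity / n_theoretical_peptides


# Hypothetical (summed intensity, theoretical peptide count) pairs for one pulldown.
# These are placeholders for illustration only.
pulldown = {
    "DCP1a": (8.4e9, 28),
    "DCP1b": (3.1e9, 31),
}

ibaq_values = {name: ibaq(intensity, n) for name, (intensity, n) in pulldown.items()}

# Relative stoichiometry of the two paralogs within this pulldown.
ratio_a_to_b = ibaq_values["DCP1a"] / ibaq_values["DCP1b"]
print(f"DCP1a:DCP1b ~ {ratio_a_to_b:.1f}:1")
```

      Normalizing by the peptide count matters here because the two paralogs differ in length, so raw intensities alone would not be comparable.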

      • The concept of buffering is not adequately introduced, and the interpretation of the observation that RNAs with increased half-life do not show increased protein abundance (namely, that Dcp1a/b are involved in transcript buffering) is nebulous. In order to support this interpretation, the mRNA abundances (NOT protein abundances) should be assessed, and even then, there is no way to rule out indirect effects.

      RESPONSE: Thank you for this comment. In the revised version of the manuscript, we introduce the concept of transcript buffering at an earlier stage as one of the potential explanations for our findings. We were also able to use a new algorithm (grandR) to estimate half-lives and synthesis rates from our data. These new data add strength to the argument that DCP1a and DCP1b are linked to transcript buffering pathways.
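      To make the link between metabolic labeling data, half-lives and synthesis rates concrete, here is a minimal Python sketch. It is not the grandR pipeline. It assumes steady-state expression and first-order decay, so that the new-to-total RNA ratio (NTR) after a labeling time t satisfies NTR = 1 - exp(-k t). The function names and example values are hypothetical.

```python
import math


def decay_rate(ntr: float, label_time_h: float) -> float:
    """Decay rate k (per hour) from the new-to-total ratio, assuming steady state.

    Derived from NTR = 1 - exp(-k * t)  =>  k = -ln(1 - NTR) / t.
    """
    if not 0.0 < ntr < 1.0:
        raise ValueError("NTR must lie strictly between 0 and 1")
    if label_time_h <= 0:
        raise ValueError("labeling time must be positive")
    return -math.log(1.0 - ntr) / label_time_h


def half_life(ntr: float, label_time_h: float) -> float:
    """RNA half-life in hours: ln(2) / k."""
    return math.log(2.0) / decay_rate(ntr, label_time_h)


def synthesis_rate(ntr: float, label_time_h: float, abundance: float) -> float:
    """Steady-state synthesis rate: synthesis balances decay, so s = k * abundance."""
    return decay_rate(ntr, label_time_h) * abundance


# Hypothetical transcript: 50% labeled after 2 h, abundance of 100 (arbitrary units).
print(half_life(0.5, 2.0))
print(synthesis_rate(0.5, 2.0, 100.0))
```

      Under these assumptions, a transcript whose half-life increases while its abundance stays constant must have a correspondingly lower synthesis rate, which is the signature that transcript buffering predicts.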

      • It might be interesting to see what happens when both factors are depleted to get an idea of the overall importance of each one.

      RESPONSE: In our work we tried to emphasize the differences between the two paralogs. We believe that a double knockout or knockdown would mask the distinct impacts of the paralogs. In data not included in this study, we have shown that cells lacking both DCP1a and DCP1b are viable. We did check PARP cleavage in the CRISPR-generated cell pools of DCP1a KO, DCP1b KO, and the double KO. The western blot measuring PARP cleavage is shown in the supplemental material (Supplementary Material: Replicates).

      • The algorithms etc. used for data analysis should be included at the time of publication. The version number and settings used for SMART (to define protein domains) and WebGestalt should be indicated.

      RESPONSE: We apologize for this oversight. The version number and settings used for the web tools (SMART, WebGestalt) are now included. The analysis pipeline for half-life and synthesis rate estimation, as well as all the files and the code needed to generate the figures in the paper, are available on Zenodo (https://zenodo.org/records/10725429).

      • Statistical analysis is not provided for the IP experiments, the number of replicates performed is not indicated, and quantification of KD efficiency is not provided.

      RESPONSE: The number of replicates performed in each experiment is now clearly indicated and quantifications of knockdown efficiency are provided (Supplemental Figure 3A and 3B, Figure 3A, Figure 3B).

      • The possibility that the IP Antibody interferes with protein-protein interactions is not mentioned.

      RESPONSE: Thank you for this comment. The revised manuscript includes a discussion of the antibody epitope location and the potential for impact on protein-protein interactions.

      Minor comments

      • P4 - "This translational repression of mRNA associated with decapping can be reversed, providing another point at which gene expression can be regulated (21)" - implies that decapping can be reversed or that decapped RNAs are translated. I don't think this is technically true.

      RESPONSE: There have been several studies that document the reversal of decapping. These findings are summarized in the following reviews.

      Schoenberg, D. R., & Maquat, L. E. (2009). Re-capping the message. Trends in biochemical sciences, 34(9), 435-442.

      Trotman, J. B., & Schoenberg, D. R. (2019). A recap of RNA recapping. Wiley Interdisciplinary Reviews: RNA, 10(1), e1504.

      • P11 - how common is it for higher eukaryotes to have 2 DCP genes?

      RESPONSE: Metazoans have 2 DCP1 genes.

      • Fig S1 - says "mammalian tissues" in the text but the data is all human. The statement that "expression analyses revealed that DCP1a and DCP1b have concordant rather than reciprocal expression patterns across different mammalian tissues (Supplemental Figure 1)" is a bit misleading, as no evidence for correlation or anti-correlation is provided. Also, co-expression is not strong support for the idea that these genes have non-redundant functions. Both genes are just expressed in all tissues - there's no evidence provided that they are concordantly expressed. In bone marrow it may be worth noting that one is high and the other low - i.e. reciprocal.

      RESPONSE: We appreciate this comment. We have corrected the interpretation of the aforementioned dataset. We have also incorporated a more detailed discussion in the text of the paper. As the Reviewer pointed out, there is a subset of tissues where their expression appears to be reciprocal.

      • Fig 1A - it is not clear what the different colors mean. Does Sc DCP1 have 1 larger EVH or 2 distinct ones? Are the low complexity regions in Sc DCP2 the SLiMs?

      RESPONSE: Thank you for this comment. We have corrected this ambiguity to reflect that Sc DCP1 has one EVH1 domain that is interconnected by a flexible hinge. The low-complexity regions typically contain short linear motifs (SLiMs); however, not all low-complexity regions have been verified to contain them. In the figure, only low-complexity regions are shown. The text of the paper refers only to verified SLiMs.

      • P11 - why were HCT116 cells selected?

      RESPONSE: HCT116 cells are an easily transfectable human cell line and have been widely used in biochemical and molecular studies, including studies of mRNA decapping (see references below). Since decapping is impacted by viral proteins, we avoided the use of other commonly used cell models such as HEK293T or HeLa.

      https://pubmed.ncbi.nlm.nih.gov/?term=decapping+hct116&sort=date&size=200

      • Fig 1B - what are the asterisks by the RNA names? Might be worth noting that over-expression of DCP1b reduced IP of DCP1a. There's no quantification and no indication of the number of times this experiment was repeated. Data from replicates and quantification of the knockdown efficiency in each replicate would be nice to see.

      RESPONSE: Thank you for this comment. Asterisks indicate that those bands were from a second gel, as DCP1a and DCP1b run at approximately the same molecular weight. We have now included a note in our figure legend to indicate this. The knockdown efficiency is provided (Figure 3 and Supplemental Figure 3). We also noted the number of replicates for each IP in Figure 1. The replicates are provided as supplementary material (Supplementary Materials: Replicates).

      • Fig 1C/1D - why are there 3 bands in the DCP1a blot? Quantification of the IP bands is necessary to say whether there is an effect or not of over-expression/KO.

      RESPONSE: The additional bands in DCP1a blots are background. When we stained the whole blot for DCP1a, these bands still appeared in cells with complete DCP1a KO (clone A3) (Supplementary Material: Validation of the KO clones). Quantification of the bands in the overexpression experiments is now provided.

      • Fig 3 - is it possible that differences are due to epitope positions for the antibodies used for IP?

      RESPONSE: We do not believe so. The DCP1a antibody binds roughly at residues 300-400 of DCP1a, and the DCP1b antibody binds around Val202. The antibodies therefore do not bind the DCP1a or DCP1b low-complexity regions (which are largely responsible for interacting with the decapping complex interactome). The antibodies also do not bind the EVH1 domains or the trimerization domain, which are needed for their interaction with DCP2 and each other.

      • Fig 5A - the legend doesn't match the colors in the figure. It is not clear how the p …

      RESPONSE: Thank you for this comment. We have corrected this issue in the revised version of the paper. High-confidence proteins are those with p …

      RESPONSE: Thank you for this comment. We have corrected this issue in the revised version of the paper.

      • There are a few more recent studies on buffering that should be cited, and more discussion of this in the introduction is necessary if conclusions are going to be drawn about buffering.

      RESPONSE: We have included a discussion of transcript buffering in the introduction.

      • The heatmaps in figure 2 are hard to interpret.

      RESPONSE: To clarify the heatmaps, we included a more detailed description in the figure legends, have enlarged the heatmaps themselves, and have added more extensive labeling.

      Reviewer #1 (Significance (Required)):

      • Strengths: The experiments appear to be done well and the datasets should be useful for the field.

      • Limitations: The results are overinterpreted - different genes are affected by knocking down one or other of these two similar proteins, but this does not really tell us all that much about how the two proteins are functioning in a cell where both are expressed.

      • Audience: This study will appeal most to a specialized audience consisting of those interested in the basic mechanisms of mRNA decay. Others may find the dataset useful.

      • This study might complement and/or be informed by another recent study in bioRxiv - https://doi.org/10.1101/2023.09.04.556219

      • My field of expertise is mRNA decay - I am qualified to evaluate the findings within the context of this field. I do not have much experience of LC-MS-MS and therefore cannot evaluate the methods/analysis of this part of the study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors provide evidence that Dcp1a and Dcp1b - two paralogous proteins of the mRNA decapping complex - may have divergent functions in a cancer cell line. In the first part, the authors show that interaction of Dcp2 with EDC4 is diminished upon depletion of Dcp1a but not affected by depletion of Dcp1b. The results have been controlled by overexpression of Dcp1b, as it may be a limiting factor (i.e. expression levels too low to compensate for depletion of Dcp1a; depletion of Dcp1a reduced interaction with EDC3/4, while depletion of Dcp1b led to the opposite, increased interactions). They then defined the protein interactome of DDX6 in parental and Dcp1a- or Dcp1b-depleted cells. Here, the authors again show some differential association with EDC4, which is in line with the results shown in the first part. The authors further performed SLAM-seq and identified subsets of mRNAs whose decay rates are commonly, but also differentially, affected upon depletion of Dcp1a and Dcp1b. Interestingly, it seems that Dcp1a preferentially targets mRNAs for proteins regulating lymphocyte differentiation. To further test whether changes in RNA decay rates are also reflected at the protein level, they finally performed an MS analysis with Dcp1a/b-depleted cells. However, no significant overlap with mRNAs showing altered stability could be observed, and the authors suggested that the lack of congruence reflects translational repression.

      Major comments:

      1. While functional differences between Dcp1a and Dcp1b are interesting and likely true, there are overinterpretations that need correction or further evidence for support. Sentences like "DCP1a regulates RNA cap binding proteins association with the decapping complex and DCP1b controls translational initiation factors interactions (Figure 2E)" sound misleading. While differential association with proteins has been recognised with MS data, it does not necessarily imply an active process of control/regulation. To make the claim of 'control/regulation', an inducible system or the introduction of mutants would be required.

      RESPONSE: This set of comments was particularly useful in helping us refine the presentation of our findings. We have edited our manuscript to be more specific about the limits of our data.

      2. The MS analysis is not clearly described in the text and it is unclear how the authors selected high-confidence proteins. The reader needs to consult the supplemental tables to find out what controls were used. Furthermore, the authors should show correlation plots of MS data between replicates. For instance, there seems to be limited correlation among some of the replicates (e.g. Dcp1b_ko3 sample, Fig. 2c). Any explanation for this variance?

      RESPONSE: We have now included a clear description of how all high-confidence proteins were selected in the Methods and Results sections. The revised manuscript also includes a more thorough description of the controls used and the number of replicates for individual experiments. The PCA plots have now been included where appropriate. The variance in this sample is likely technical.

      3. GO analysis for the proteome analysis should consider the proteome and not the genome as the background. The authors should also indicate the corrected P-values (multiple testing) and FDRs.

      RESPONSE: WebGestalt uses a reference set of IDs to recognize the input IDs, and it does not use it for the background analysis in the classical sense. We repeated a subset of our proteome analyses using the 'genome-protein coding' as background and obtained the same result as in our original analysis. All ontology analyses now include raw p-values and/or FDRs when appropriate.

      4. Fig 2E. The figure displaying GO enrichments needs better explanation, and additional data could be added. The enrichment ratio is not explained (is this normalised?), and p-values, FDRs and the number of proteins in the respective GO category should be added.

      RESPONSE: More thorough explanations of the GO enrichments are now included. The supplemental data contains all p-values (raw and adjusted), as well as the number of proteins in each GO category. The enrichment ratio is normalized and contains information about the number of proteins that are redundant in multiple groups. GO analyses are now displayed with p-values and/or FDR values, and in this case the enrichment ratio contains information regarding the number of proteins found in our input set and the number of expected proteins in the GO group. The network analysis shows the FDR values and the number of proteins found in the groups compared.
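      The quantities discussed in this exchange (enrichment ratio, raw p-value, FDR, and the choice of background) can be illustrated with a generic over-representation sketch in Python. This is not WebGestalt's code. It assumes the common definitions: enrichment ratio = observed / expected hits, a one-sided hypergeometric p-value, and Benjamini-Hochberg adjustment. All counts are hypothetical.

```python
from math import comb


def enrichment_ratio(hits: int, input_size: int, category_size: int, background_size: int) -> float:
    """Observed hits divided by the hits expected by chance for this background."""
    expected = input_size * category_size / background_size
    return hits / expected


def hypergeom_pvalue(hits: int, input_size: int, category_size: int, background_size: int) -> float:
    """One-sided P(X >= hits) when drawing input_size proteins from the background."""
    upper = min(input_size, category_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(category_size, i) * comb(background_size - category_size, input_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues: list) -> list:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical GO category: 10 hits among 100 input proteins, category of 200 proteins.
# Compare a genome-sized background with a smaller detected-proteome background.
for label, background in [("genome", 20000), ("proteome", 5000)]:
    ratio = enrichment_ratio(10, 100, 200, background)
    pval = hypergeom_pvalue(10, 100, 200, background)
    print(f"{label}: enrichment ratio = {ratio:.1f}, p = {pval:.3g}")
```

      With these placeholder numbers the same category looks four times less enriched against the smaller background, which is why the choice of reference set raised in the review can change the apparent strength of an enrichment.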

      Minor:

      5. These studies were performed in a colorectal carcinoma cell line (HCT116). The authors should justify the choice of this specialised cell line. Furthermore, one wonders whether similar conclusions can be drawn with other cell lines or whether the findings are specific to this cancer line.

      RESPONSE: The study currently available as a preprint on bioRxiv (https://doi.org/10.1101/2023.09.04.556219) used HEK293T cells and found results similar to ours when examining the relationships between the core decapping complex members.

      6. Fig. 1B. It is unclear what DCP1b* refers to. There are bands of different sizes that are not mentioned by the authors - are those protein isoforms, or what do they refer to? A molecular weight marker should be added to each blot. Uncropped western images and markers should be provided in the Supplement.

      RESPONSE: The asterisk indicates that these images came from a second western blot gel (DCP1a and DCP1b have a similar molecular weight and cannot be probed on the same membrane). Uncropped western blot images and markers (as available) are provided in the supplement.

      7. MS data should be submitted to a public repository, with the accession no. indicated in the manuscript.

      RESPONSE: MS data is submitted as supplementary datasets to the paper. It contains the analyzed data as well as the LC-MS/MS output. We are in the process of submitting the raw LC-MS/MS data to a public repository.

      Fig 3. A Venn Diagram displaying the overlap of identified proteins should be added. GO analysis should be done considering the proteome as background (as mentioned above).

      RESPONSE: A Venn diagram showing the overlap among the proteins identified is now included in the revised version.

      Reviewer #2 (Significance (Required)):

      Overall, this is a large-scale integrative -omics study that suggest functional difference between Dcp1 paralogues. While it seems clear that both paralogous have some different functions and impact, there are overinterpretations in place and further evidence would to be provided to substantiate conclusions made in the paper. For instance, while the interactions with Dcp2/Ddx6 in the absence of Dcp1a,b with EDC4/3 may be altered (Fig. 1, 2), the functional implications of this changed associations remains unresolved and not further discussed. As such, it remains somehow disconnected with the following experiments and compromises the flow of the study. The observed differences in decay-rates for distinct functionally related sets of mRNAs is interesting; however, it remains unclear whether those are direct or rather indirect effects. This is further obscured by the absence of any correlation to changes in protein levels, which the authors interpreted as 'transcriptional buffering'. In this regard, it is puzzling how the authors can make a statement about transcriptional buffering? While this may be an interesting aspect and concept of the discussion, there is no primary data showing such a functional impact.

      As such, the study is interesting as it claims functional differences between DCP1a/b paralogs in a cancer cell line. Nevertheless, I am not sure how trustworthy the MS analysis and decay measurements are, as there is no further validation. It would be interesting if the authors could go a bit further and draw some hypotheses on how the selectivity could be achieved, i.e., interaction with RNA-binding proteins that may add some specificity towards the target RNAs for differential decay. As such, the study unfortunately remains rather descriptive without further functional insight.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Review on "Non-redundant roles for the human mRNA decapping cofactor paralogs DCP1a and DCP1b" by Steven McMahon and co-workers.

      mRNA decay is a critical step in the regulation of gene expression. In eukaryotes, mRNA turnover typically begins with the removal of the poly(A) tail, followed by either removal of the 5' cap structure or exonucleolytic 3'-5' decay catalyzed by the exosome. The decapping enzyme DCP2 forms a complex with its co-activator DCP1, which enhances decapping activity. Mammals are equipped with two DCP1 paralogs, namely DCP1a and DCP1b. Metazoan decapping complexes feature additional components, such as enhancer of decapping 4 (EDC4), which supports the interaction between DCP1 and DCP2, thereby amplifying the efficiency of decapping. This work focuses on DCP1a and DCP1b and investigates their distinct functions. Using DCP1a- and DCP1b-specific knockdowns as well as K.O. cell lines, the authors find surprising differences between the DCP1 paralogs. While DCP1a is essential for the assembly of EDC4-containing decapping complexes and interactions with mRNA cap binding proteins, DCP1b mediates interactions with the translational machinery. Furthermore, DCP1a and DCP1b target different mRNAs for degradation, indicating that they execute non-overlapping functions. The findings reported here expand our understanding of mRNA decapping in human cells, shedding light on the unique contributions of DCP1a and DCP1b to mRNA metabolism.

      The manuscript tackles an interesting subject. Historically, the emphasis has been on studying DCP1a, while DCP1b has been deemed a functionally redundant homolog of DCP1a. Therefore, it is commendable that the authors have taken on this topic and, with the help of knockout cell lines, aimed to dissect the function of DCP1a and DCP1b. Despite recognizing the significance of the subject and approach, the manuscript falls short of persuading me.
      Following a promising start in Figure 1 (which still has room for improvement), there is a distinct decline in overall quality, with only relatively standard analyses being conducted. However, I do not want to give the authors detailed advice on maximizing the potential of their data and presenting it convincingly. So, here are just a few key points for improvement:

      Figure 1C: Upon closer examination, a faint band is still visible at the size of DCP1a in the DCP1a knockout cells. Could this be leaky expression of DCP1a? The authors should provide an in-depth characterization of their cells (possibly as supplementary material), including identification of genomic changes (e.g. by sequencing of the locus) and Western blots with longer exposure, etc.

      *RESPONSE: Thank you for this comment. The in-depth characterization of our cells is now included in the Supplementary Material. DCP1a KO cells and DCP1b KO cells indicated as single cell clones have been confirmed to have no DCP1a or DCP1b expression. In Figure 1D and Figure 3, polyclonal pool cells were used as indicated (only for DCP1a KO). *

      Figure 2: It is great to see that the effects of the KOs are also visible in the DDX6 immunoprecipitation. However, I wonder if the IP clearly confirms that the KO cells indeed do not express DCP1a or DCP1b. In the heatmap in Figure 2B, it appears as if the proteins are only reduced by a log2-fold change of approximately 1.5? Additionally, Figure 2 shows a problem that persists in the subsequent figures. The visual presentation is not particularly appealing, and essential details, such as the scale of the heatmap in 2B (is it log2 fold?), are lacking.

      *RESPONSE: The in-depth characterization of our cells is included in the Supplementary Materials and confirms the presence of single-cell clones where indicated. As noted above, only Figure 1D and Figure 3 used DCP1a KO pooled cells. The heatmap in Figure 2B is scaled by row using the pheatmap function in RStudio. The actual data for the heatmap comes from protein intensities from the LC-MS/MS analysis. We have improved the visual presentation in the revised manuscript. *
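      As an illustration of why a row-scaled heatmap does not read as a log2 fold change, the following minimal sketch reproduces what row scaling conventionally does: each row is centered on its mean and divided by its sample standard deviation. The function name `scale_rows` and the demo intensities are hypothetical; this is a Python restatement of the general technique under that assumption, not the authors' R code or data.

```python
import numpy as np


def scale_rows(intensities):
    """Row-wise z-score: subtract each row's mean, divide by its sample SD.

    Hypothetical helper for illustration only. A row with zero variance
    would produce NaN here and would need special handling.
    """
    x = np.asarray(intensities, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)  # ddof=1 gives the sample SD
    return (x - mean) / sd


# Made-up intensities: two "proteins" whose abundances differ 100,000-fold
# but follow the same relative pattern across three conditions.
demo = np.array([
    [1e6, 2e6, 4e6],
    [10.0, 20.0, 40.0],
])
scaled = scale_rows(demo)
print(scaled)
```

      After scaling, both rows are identical, so the colour scale reports each protein's deviation from its own mean in units of standard deviation rather than a fold change between conditions.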

      Figure 3: I wonder why there are no primary data shown here, only processed GO analyses. Wouldn't one expect that DCP2 interacts mainly with DCP1a, but less with DCP1b? Is this visible in the data? Moreover, such analyses are rather uninformative (as reflected in the GO terms themselves, for instance, "oxoglutarate dehydrogenase complex" doesn't provide much meaningful insight). The authors should rather try to derive functional and mechanistic insights from their data.

      RESPONSE: We have now revised this Figure to include primary data as well as the IP of DCP1a in DCP1b KO cells (single cell clones) and the IP of DCP1b in DCP1a KO cells (pooled cells). We identified EDC3 in the high-confidence protein pool. The EDC3:DCP1a interaction is enhanced in DCP1b KO cells. We also found that the EDC3:DCP1b interaction is less abundant in DCP1a KO cells. This is consistent with our data in Figures 1 and 2. DCP2 was not identified in the interactomes of either DCP1a or DCP1b. This is not unusual, as DCP2 is highly flexible and the association of the DCP1 paralogs with DCP2 is transient and facilitated by other proteins.

      In Fig. 4 the potential of the approach is not fully exploited. Firstly, I would advocate for omitting the GO analyses, as, in my opinion, they offer little insight. Again, crucial information is missing to assess the results. While 75 nt reads are mentioned in the methods, the sequencing depth remains unspecified. Figure 4b should be included in the supplements. Furthermore, I strongly recommend concentrating on insights into the mechanisms of DCP1a and DCP1b-containing complexes. E.g. what characteristics distinguish DCP1a and DCP1b-dependent mRNAs? Are these targets inherently unstable? Why are they degraded? Are they known decapping substrates?

      *RESPONSE: Thank you for this comment. We have now revised this figure and have included information about sequencing depth and other pertinent information. We have been able to use a newly available algorithm (grandR) and were able to estimate half-lives and synthesis rates. This is a significant addition to the paper. We were also able to compare significantly impacted mRNAs (by DCP1a or DCP1b loss) to the established DCP2 target list. *

      In general, I suggest the authors revise the manuscript with a focus on the potential readers. Reduce Gene Ontology (GO) analyses and heatmaps, and instead, incorporate more analyses regarding the molecular processes associated with the different decapping complexes.

      *RESPONSE: We removed selected GO analyses and heatmaps from the main body of the manuscript (included as Supplementary Figures instead). For our LC-MS/MS datasets, we added iBAQ analyses of the DDX6 IP, DCP1a IP, and DCP1b IP in the control conditions. Cellular fractionation studies (using Superose 6 chromatography) were also added to the paper and allow us to interrogate decapping complex composition in more detail. The revised version of the manuscript includes a new 4SU labeling experiment (pulse-chase) as well as estimation of half-lives and synthesis rates in our conditions. Also included is relevant information about DCP1b transcriptional regulation. *

      Reviewer #3 (Significance (Required)):

      The manuscript in its current form could benefit from substantial revisions for it to be considered impactful for researchers in the field.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      I have trialled the package on my lab's data and it works as advertised. It was straightforward to use and did not require any special training. I am confident this is a tool that will be approachable even to users with limited computational experience. The use of artificial data to validate the approach - and to provide clear limits on applicability - is particularly helpful.

      The main limitation of the tool is that it requires the user to manually select regions. This somewhat limits the generalisability and is also more subjective - users can easily choose "nice" regions that better match with their hypothesis, rather than quantifying the data in an unbiased manner. However, given the inherent challenges in quantifying biological data, such problems are not easily circumventable.

      *

      * I have some comments to clarify the manuscript:

      1. A "straightforward installation" is mentioned. Given this is a Method paper, the means of installation should be clearly laid out.*

      __This sentence is now modified. In the revised manuscript we now describe how to install the toolset and we give the link to the toolset website if further information is needed.__ On this website, we provide a full video tutorial and a user manual. The user manual is provided as a supplementary material of the manuscript.

      * It would be helpful if there was an option to generate an output with the regions analysed (i.e., a JPG image with the data and the drawn line(s) on top). There are two reasons for this: i) A major problem with user-driven quantification is accidental double counting of regions (e.g., a user quantifies a part of an image and then later quantifies the same region). ii) Allows other users to independently verify measurements at a later time.*

      We agree that it is helpful to save the analyzed regions. To answer this comment and the other two reviewers' comments pointing at a similar feature, we have now included an automatic saving of the regions of interest. The user will be able to reopen saved regions of interest using a new function we included in the new version of PatternJ.

      * 3. Related to the above point, it is highlighted that each time point would need to be analysed separately (line 361-362). It seems like it should be relatively straightforward to allow a function where the analysis line can be mapped onto the next time point. The user could then adjust slightly for changes in position, but still be starting from near the previous timepoint. Given how prevalent timelapse imaging is, this seems like (or something similar) a clear benefit to add to the software.*

      We agree that the analysis of time series images can be a useful addition. We have added the analysis of time-lapse series in the new version of PatternJ. The principles behind the analysis of time-lapse series and an example of such analysis are provided in Figure 1 - figure supplement 3 and Figure 5, with accompanying text lines 140-153 and 360-372. The analysis includes a semi-automated selection of regions of interest, which will make the analysis of such sequences more straightforward than having to draw a selection on each image of the series. The user is required to draw at least two regions of interest in two different frames, and the algorithm will automatically generate regions of interest in frames in which selections were not drawn. The algorithm generates the analysis immediately after selections are drawn by the user, which includes the tracking of the reference channel.
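      The response above does not state how regions of interest are generated for frames without a drawn selection. As a minimal sketch of one way this could work, assuming simple linear interpolation of polyline vertices between the user-drawn frames (an assumption for illustration, not a description of the PatternJ implementation), with the hypothetical function `interpolate_rois`:

```python
def interpolate_rois(keyframes, n_frames):
    """Fill in a polyline ROI for every frame from a few user-drawn ones.

    keyframes: dict mapping frame index -> list of (x, y) vertices; every
               drawn ROI is assumed to have the same number of vertices.
    n_frames:  total number of frames in the time-lapse series.
    Frames before the first or after the last drawn ROI reuse the nearest one.
    """
    drawn = sorted(keyframes)
    if len(drawn) < 2:
        raise ValueError("at least two drawn regions of interest are required")

    rois = {}
    for frame in range(n_frames):
        if frame <= drawn[0]:
            rois[frame] = list(keyframes[drawn[0]])
        elif frame >= drawn[-1]:
            rois[frame] = list(keyframes[drawn[-1]])
        else:
            before = max(k for k in drawn if k <= frame)
            after = min(k for k in drawn if k >= frame)
            if before == after:  # this frame was drawn by the user
                rois[frame] = list(keyframes[frame])
                continue
            t = (frame - before) / (after - before)
            rois[frame] = [
                ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
                for (x0, y0), (x1, y1) in zip(keyframes[before], keyframes[after])
            ]
    return rois


# Hypothetical example: a two-point line drawn in frames 0 and 4.
drawn_rois = {0: [(0, 0), (10, 0)], 4: [(4, 8), (14, 8)]}
all_rois = interpolate_rois(drawn_rois, n_frames=5)
print(all_rois[2])
```

      A real tool would additionally need to refine each interpolated selection against the image, for example by tracking a reference channel as the response mentions; that step is not sketched here.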

      * Line 134-135. The level of accuracy of the searching should be clarified here. This is discussed later in the manuscript, but it would be helpful to give readers an idea at this point what level of tolerance the software has to noise and aperiodicity.

      *

      We agree with the reviewer that a clarification of this part of the algorithm will help the user better understand the manuscript. __We have modified the sentence to clarify the range of search used and the resulting limits in aperiodicity (now lines 176-181).__ Regarding the tolerance to noise, it is difficult to estimate it a priori from the choice made at the algorithm stage, so we prefer to leave it to the validation part of the manuscript. We hope this solution satisfies the reviewer and future users.

      *

      **Referees cross-commenting**

      I think the other reviewer comments are very pertinent. The authors have a fair bit to do, but they are reasonable requests. So, they should be encouraged to do the revisions fully so that the final software tool is as useful as possible.

      Reviewer #1 (Significance (Required)):

      Developing software tools for quantifying biological data that are approachable for a wide range of users remains a longstanding challenge. This challenge is due to: (1) the inherent problem of variability in biological systems; (2) the complexity of defining clearly quantifiable measurables; and (3) the broad spread of computational skills amongst likely users of such software.

      In this work, Blin et al., develop a simple plugin for ImageJ designed to quickly and easily quantify regular repeating units within biological systems - e.g., muscle fibre structure. They clearly and fairly discuss existing tools, with their pros and cons. The motivation for PatternJ is properly justified (which is sadly not always the case with such software tools).

      Overall, the paper is well written and accessible. The tool has limitations but it is clearly useful and easy to use. Therefore, this work is publishable with only minor corrections.

      *We thank the reviewer for the positive evaluation of PatternJ and for pointing out its accessibility to the users.

      *

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      # Summary

      The authors present an ImageJ Macro GUI tool set for the quantification of one-dimensional repeated patterns that are commonly occurring in microscopy images of muscles.

      # Major comments

      In our view the article and also software could be improved in terms of defining the scope of its applicability and user-ship. In many parts the article and software suggest that general biological patterns can be analysed, but then in other parts very specific muscle actin wordings are used. We are pointing this out in the "Minor comments" sections below. We feel that the authors could improve their work by making a clear choice here. One option would be to clearly limit the scope of the tool to the analysis of actin structures in muscles. In this case we would recommend to also rename the tool, e.g. MusclePatternJ. The other option would be to make the tool about the generic analysis of one-dimensional patterns, maybe calling the tool LinePatternJ. In the latter case we would recommend to remove all actin specific wordings from the macro tool set and also the article should be in parts slightly re-written.

      *

      We agree with the reviewer that our initial manuscript used a mix of general and muscle-oriented vocabulary, which could make the use of PatternJ confusing especially outside of the muscle field. To make PatternJ useful for the largest community, we corrected the manuscript and the PatternJ toolset to provide the general vocabulary needed to make it understandable for every biologist. We modified the manuscript accordingly.

      * # Minor/detailed comments

      # Software

      We recommend considering the following suggestions for improving the software.

      ## File and folder selection dialogs

      In general, clicking on many of the buttons just opens up a file-browser dialog without any further information. For novel users it is not clear what the tool expects one to select here. It would be very good if the software could be rewritten such that there are always clear instructions displayed about which file or folder one should open for the different buttons.*

      On the current version of macOS, we found that the file-browser dialog does not display any message; we suspect this is the issue raised by the reviewer. This is a known issue affecting Fiji and all other applications on Mac since 2016. We provided guidelines in the user manual and in the tutorial video to correct this issue by changing a parameter in Fiji. Given the difficulties the reviewer had accessing the material on the PatternJ website, for which we apologize, we understand the issue raised. We added an extra warning on the PatternJ website to point at this problem and its solution. Additionally, we have limited the file-browser dialog appearance to what we thought was strictly necessary. Thus, the user will experience fewer prompts, speeding up the analysis.

      *

      ## Extract button

      The tool asks one to specify things like whether selections are drawn "M-line-to-M-line"; for users that are not experts in muscle morphology this is not understandable. It would be great to find more generally applicable formulations. *

      We agree that this muscle-oriented vocabulary can make the use of PatternJ confusing. We have now corrected the user interface to provide both general and muscle-specific vocabulary ("center-to-center or edge-to-edge (M-line-to-M-line or Z-disc-to-Z-disc)").*

      ## Manual selection accuracy

      The first step of the analysis is always a user hand-drawn profile across intensity patterns in the image. However, this step can cause inaccuracy that varies with the shape and curvature of the line profile drawn. If the line is not strictly perpendicular to, for example, the M-line patterns, the distance between intensity peaks will be different. This will be more problematic when dealing with features in the image that are not straight and parallel. If the structure is bent along a curve, the line drawn over it also needs to reproduce this curve to precisely capture the intensity pattern. I found this limits the reproducibility and ease of use of the software.*

      We understand the concern of the reviewer. On curved selections this will be an issue that is difficult to solve, especially on "S" curved or more complex selections. The user will have to be very careful in these situations. On non-curved samples, the issue may be concerning at first sight, but the errors go with the inverse of cosine and are therefore rather low. For example, if the user creates a selection off by 5 degrees, which is visually obvious, lengths will be affected by an increase of only 0.38%. The point raised by the reviewer is important to discuss, and we therefore added a paragraph to comment on the choice of selection (lines 94-98) and a supplementary figure to help make it clear (Figure 1 - figure supplement 1).*
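      The inverse-cosine argument in the response above can be checked numerically. A minimal sketch, assuming a straight selection tilted by a fixed angle away from the pattern axis, so that a true spacing d is measured as d / cos(angle); the function name `relative_length_error` is hypothetical:

```python
import math


def relative_length_error(misalignment_deg):
    """Fractional overestimate of a spacing measured along a tilted line.

    A spacing d along the pattern axis is measured as d / cos(angle) when
    the selection is rotated by `angle` away from that axis.
    """
    angle = math.radians(misalignment_deg)
    return 1.0 / math.cos(angle) - 1.0


# Tabulate the overestimate for a few misalignment angles.
for angle_deg in (1, 2, 5, 10, 15):
    error_percent = 100.0 * relative_length_error(angle_deg)
    print(f"{angle_deg:>2} degrees off-axis: lengths overestimated by {error_percent:.2f} %")
```

      For a 5-degree misalignment this gives 0.38 %, matching the figure quoted in the response; the error grows roughly with the square of the angle, so it stays small for selections that look aligned by eye.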

      ### Reproducibility

      Since the line profile drawn on the image is the first step and essential to the entire process, saving it together with the analysis result should be considered, for example as ImageJ ROI or ROIset files that can be re-imported, correctly positioned, and visualized in the measured images. This would greatly improve the reproducibility of the proposed workflow. In the manuscript, only the extracted features are being saved (because the save button is also just asking for a folder containing images, so I cannot verify its functionality). *

      We agree that this is a very useful and important feature. We have added ROI automatic saving. Additionally, we now provide a simplified import function of all ROIs generated with PatternJ and the automated extraction and analysis of the list of ROIs. This can be done from ROIs generated previously in PatternJ or with ROIs generated from other ImageJ/Fiji algorithms. These new features are described in the manuscript in lines 120-121 and 130-132.

      *

      ## ? button

      It would be great if that button would open up some usage instructions.

      *

      We agree with the reviewer that the "?" button can be used in a better way. We have replaced this button with a Help menu, including a simple tutorial showing a series of images detailing the steps to follow by the user, a link to the user website, and a link to our video tutorial.

      * ## Easy improvement of workflow

      I would suggest a reasonable expansion of the current workflow, by fitting and displaying 2D lines to the band or line structure in the image, that form the "patterns" the author aims to address. Thus, it extracts geometry models from the image, and the inter-line distance, and even the curve formed by these sets of lines can be further analyzed and studied. These fitted 2D lines can be also well integrated into ImageJ as Line ROI, and thus be saved, imported back, and checked or being further modified. I think this can largely increase the usefulness and reproducibility of the software.

      *

      We hope that we understood this comment correctly. We had sent a clarification request to the editor, but unfortunately did not receive an answer within the requested 4 weeks of this revision. We understood the following: instead of using our 1D approach, in which we extract positions from a profile, the reviewer suggests extracting the positions of features not as a single point, but as a series of coordinates defining its shape. If this is the case, this is a major modification of the tool that is beyond the scope of PatternJ. We believe that keeping our tool simple makes it robust. This is the major strength of PatternJ. Local fitting would not use line averaging, for instance, which would make the tool less reliable.

      * # Manuscript

      We recommend considering the following suggestions for improving the manuscript. Abstract: The abstract suggests that general patterns can be quantified, however the actual tool quantifies specific subtypes of one-dimensional patterns. We recommend adapting the abstract accordingly.

      *

      We modified the abstract to make this point clearer.

      * Line 58: Gray-level co-occurrence matrix (GLCM) based feature extraction and analysis approach is not mentioned nor compared. At least there's a relatively recent study on Sarcomeres structure based on GLCM feature extraction: https://github.com/steinjm/SotaTool with publication: *https://doi.org/10.1002/cpz1.462

      • *

      We thank the reviewer for making us aware of this publication. We cite it now and have added it to our comparison of available approaches.

      * Line 75: "...these simple geometrical features will address most quantitative needs..." We feel that this may be an overstatement, e.g. we can imagine that there should be many relevant two-dimensional patterns in biology?!*

      We have modified this sentence to avoid potential confusion (lines 76-77).

      • *

      • Line 83: "After a straightforward installation by the user, ...". We think it would be convenient to add the installation steps at this place into the manuscript. *

      __This sentence is now modified. We now mention how to install the toolset and we provide the link to the toolset website, if further information is needed (lines 86-88).__ On the website, we provide a full video tutorial and a user manual.

      * Line 87: "Multicolor images will give a graph with one profile per color." The 'Multicolor images' here should be more precisely stated as "multi-channel" images. Multi-color images could be confused with RGB images which will be treated as 8-bit gray value (type conversion first) images by profile plot in ImageJ. *

      We agree with the reviewer that this could create some confusion. We modified "multicolor" to "multi-channel".

      * Line 92: "...such as individual bands, blocks, or sarcomeric actin...". While bands and blocks are generic pattern terms, the biological term "sarcomeric actin" does not seem to fit in this list. Could a more generic wording be found, such as "block with spike"? *

      We agree with the reviewer that "sarcomeric actin" alone will not be clear to all readers. We modified the text to "block with a central band, as often observed in the muscle field for sarcomeric actin" (lines 103-104). The toolset was modified accordingly.

      * Line 95: "the algorithm defines one pattern by having the features of highest intensity in its centre". Could this be rephrased? We did not understand what that exactly means.*

      We agree with the reviewer that this was not clear. We rewrote this paragraph (lines 101-114) and provided a supplementary figure to illustrate these definitions (Figure 1 - figure supplement 2).

      * Line 124 - 147: This part is the only description of the algorithm behind the feature extraction and analysis, but it is not clearly stated. Many details are missing or assumed known by the reader. For example, how it achieves sub-pixel resolution results is not clear. One can only assume that by fitting a Gaussian to the band, the center position (peak) can be calculated from continuous curves rather than pixels. *

      Note that the two sentences introducing this description are "Automated feature extraction is the core of the tool. The algorithm takes multiple steps to achieve this (Fig. S2):". We were hoping this statement was clear, but the reviewer may refer to something else. We agree that the description of some of the details of the steps was too quick. We have now expanded the description where needed.
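      To make the reviewer's question about sub-pixel resolution concrete, here is a minimal sketch of one standard way a peak can be located more finely than the pixel grid: a three-point Gaussian (log-parabola) interpolation around the brightest sample. This is a generic illustration under the reviewer's stated assumption of a Gaussian-like band; it is not taken from the PatternJ code, and the function name `subpixel_peak` is hypothetical.

```python
import numpy as np


def subpixel_peak(profile):
    """Estimate a peak position with sub-pixel precision.

    Takes the brightest sample and its two neighbours, and returns the
    vertex of the parabola through the logarithms of their intensities.
    The result is exact for a noise-free Gaussian band.
    """
    y = np.asarray(profile, dtype=float)
    i = int(np.argmax(y))
    # Fall back to the pixel position at the profile edges or when the
    # logarithm is undefined (non-positive intensities).
    if i == 0 or i == len(y) - 1 or np.min(y[i - 1:i + 2]) <= 0:
        return float(i)
    left, centre, right = np.log(y[i - 1:i + 2])
    curvature = left - 2.0 * centre + right
    if curvature == 0:
        return float(i)
    return i + 0.5 * (left - right) / curvature


# Hypothetical test band: a Gaussian centred between two pixels.
pixels = np.arange(21)
true_centre = 10.3
band = np.exp(-((pixels - true_centre) ** 2) / (2.0 * 2.0 ** 2))
estimate = subpixel_peak(band)
print(estimate)
```

      With noisy data the estimate degrades, which is why localization precision has to be characterized empirically as a function of signal-to-noise ratio, as is done in the validation part of the manuscript.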

      * Line 407: We think the availability of both the tool and the code could be improved. For Fiji tools it is common practice to create an Update Site and to make the code available on GitHub. In addition, downloading the example file (https://drive.google.com/file/d/1eMazyQJlisWPwmozvyb8VPVbfAgaH7Hz/view?usp=drive_link) required a Google login and access request, which is not very convenient; in fact, we asked for access but it was denied. It would be important for the download to be easier, e.g. from GitHub or Zenodo.

      *

      We are sorry for the issues encountered when downloading the tool and additional material. We thank the reviewer for pointing out these issues that limited the accessibility of our tool. We simplified the downloading procedure on the website, which no longer goes through the Google Drive interface or requires a Google account. Additionally, for the coder community, the code, user manual and examples are now available from GitHub at github.com/PierreMangeol/PatternJ, and are provided as supplementary material with the manuscript. To our knowledge, update sites work for plugins but not for macro toolsets. In our experience of sharing our codes with non-specialists, a classical website with a tutorial video is more accessible than more coder-oriented websites, which deter many users.

      * Reviewer #2 (Significance (Required)):

      The strength of this study is that a tool for the analysis of one-dimensional repeated patterns occurring in muscle fibres is made available in the accessible open-source platform ImageJ/Fiji. In the introduction to the article the authors provide an extensive review of comparable existing tools. Their new tool fills a gap in terms of providing an easy-to-use software for users without computational skills that enables the analysis of muscle sarcomere patterns. We feel that if the below mentioned limitations could be addressed the tool could indeed be valuable to life scientists interested in muscle patterning without computational skills.

      In our view there are a few limitations, including the accessibility of example data and tutorials at sites.google.com/view/patternj, which we had trouble accessing. In addition, we think that the workflow in Fiji, which currently requires pressing several buttons in the correct order, could be further simplified and streamlined by adopting some "wizard" approach, where the user is guided through the steps.

      *As answered above, the links on the PatternJ website are now corrected. Regarding the workflow, we now provide a Help menu with:

      1. __a basic set of instructions to use the tool,__
      2. a direct link to the tutorial video in the PatternJ toolset
      3. a direct link to the website on which both the tutorial video and a detailed user manual can be found. We hope this addresses the issues raised by this reviewer.

      *Another limitation is the reproducibility of the analysis; here we recommend enabling IJ Macro recording as well as saving of the drawn line ROIs. For more detailed suggestions for improvements please see the above sections of our review. *

      We agree that saving ROIs is very useful. It is now implemented in PatternJ.

      We are not sure what this reviewer means by "enabling IJ Macro recording". The ImageJ Macro Recorder is indeed very useful, but to our knowledge, it is limited to built-in functions. Our code is open and we hope this will be sufficient for advanced users to modify the code and make it fit their needs.*

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this manuscript, the authors present a new toolset for the analysis of repetitive patterns in biological images named PatternJ. One of the main advantages of this new tool over existing ones is that it is simple to install and run and does not require any coding skills whatsoever, since it runs on the ImageJ GUI. Another advantage is that it provides not only the mean length of the pattern unit but also the subpixel localization of each unit and the distributions of lengths, and that it does not require GPU processing to run, unlike other existing tools. The major disadvantage of PatternJ is that it requires heavy, although very simple, user input in both the selection of the region to be analyzed and in the analysis steps. Another limitation is that, at least in its current version, PatternJ is not suitable for time-lapse imaging. The authors clearly explain the algorithm used by the tool to find the localization of pattern features and they thoroughly test the limits of their tool in conditions of varying SNR, periodicity and band intensity. Finally, they also show the performance of PatternJ across several biological models such as different kinds of muscle cells, neurons and fish embryonic somites, as well as different imaging modalities such as brightfield, fluorescence confocal microscopy, STORM and even electron microscopy.

      This manuscript is clearly written, and both the sections and the figures are well organized and tell a cohesive story. By testing PatternJ, I can attest to its ease of installation and use. Overall, I consider that PatternJ is a useful tool for the analysis of patterned microscopy images and this article is fit for publication. However, I do have some minor suggestions and questions that I would like the authors to address, as I consider they could improve this manuscript and the tool:

      *We are grateful to this reviewer for this very positive assessment of PatternJ and of our manuscript.

      * Minor Suggestions: The methodology section is missing a more detailed description of how the plotted metric was obtained: as normalized intensity or precision in pixels. *

      We agree with the reviewer that a more detailed description of the metric plotted was missing. We added this information in the method part and added information in the Figure captions where more details could help to clarify the value displayed.

      * The validation is based mostly on the SNR and patterns. They should include a dataset of real data to validate the algorithm in three of the standard patterns tested. *

      We validated our tool using computer-generated images, in which we know with certainty the localization of patterns. This allowed us to automatically analyze 30 000 images, and with varying settings, we sometimes analyzed 10 times the same image, leading to about 150 000 selections analyzed. From these analyses, we can provide with confidence an unbiased assessment of the tool precision and the tool capacity to extract patterns. We already provided examples of various biological data images in Figures 4-6, showing all possible features that can be extracted with PatternJ. In these examples, we can claim by eye that PatternJ extracts patterns efficiently, but we cannot know how precise these extractions are because of the nature of biological data: "real" positions of features are unknown in biological data. Such validation will be limited to assessing whether a pattern was found or not, which we believe we already provided with the examples in Figures 4-6.

      * The video tutorial available in the PatternJ website is very useful, maybe it would be worth it to include it as supplemental material for this manuscript, if the journal allows it. *

      As the video tutorial may have been missed by other reviewers, we agree it is important to make it more prominent to users. We have now added a Help menu in the toolset that opens the tutorial video. Having the video as supplementary material could indeed be a useful addition if the size of the video is compatible with the journal limits.

      * An example image is provided to test the macro. However, it would be useful to provide further example images for each of the three possible standard patterns suggested: Block, actin sarcomere or individual band.*

      We agree this can help users. We now provide another multi-channel example image on the PatternJ website including blocks and a pattern made of a linear intensity gradient that can be extracted with our simpler "single pattern" algorithm, which were missing in the first example. Additionally, we provide an example to be used with our new time-lapse analysis.

      * Access to both the manual and the sample images in the PatternJ website should be made publicly available. Right now they both sit in a private Drive account. *

      As mentioned above, we apologize for access issues that occurred during the review process. These files can now be downloaded directly on the website without any sort of authentication. Additionally, these files are now also available on GitHub.

      * Some common errors are not properly handled by the macro and could be confusing for the user: When there is no selection and one tries to run a Check or Extraction: "Selection required in line 307 (called from line 14). profile=getProfile( ;". A simple "a line selection is required" message would be useful there. When "band" or "block" is selected for a channel in the "Set parameters" window, yet a 0 value is entered into the corresponding "Number of bands or blocks" section, one gets this error when trying to Extract: "Empty array in line 842 (called from line 113). if ( ( subloc . length == 1 ) & ( subloc [ 0 == 0) ) {". This error is not too rare, since the "Number of bands or blocks" section is populated with a 0 after choosing "sarcomeric actin" (after accepting the settings) and stays that way when one changes back to "blocks" or "bands".*

      We thank the reviewer for pointing out these bugs. These bugs are now corrected in the revised version.

      * The fact that every time one clicks on the most used buttons, the getDirectory window appears is not only quite annoying but also, ultimately a waste of time. Isn't it possible to choose the directory in which to store the files only once, from the "Set parameters" window?*

      We have now found a solution to avoid this step. The user is only prompted to provide the image folder when pressing the "Set parameters" button. We kept the prompt for directory only when the user selects the time-lapse analysis or the analysis of multiple ROIs. The main reason is that it is very easy for the analysis to end up in the wrong folder otherwise.

      * The authors state that the outputs of the workflow are "user friendly text files". However, some of them lack descriptive headers (like the localisations and profiles) or even file names (like colors.txt). If there is something lacking in the manuscript, it is a brief description of all the output files generated during the workflow.*

      PatternJ generates multiple files, several of which are internal to the toolset. They are needed to keep track of which analyses were done, and which colors were used in the images, amongst others. From the user's perspective, only the files obtained after the analysis, All_localizations.channel_X.txt and sarcomere_lengths.txt, are useful. To improve the user experience, we have now moved all internal files to a folder named "internal", which we think will clarify which outputs are useful for further analysis, and which ones are not. We thank the reviewer for raising this point and we now mention it in our Tutorial.

      I don't really see the point in saving the localizations from the "Extraction" step, they are even named "temp".

      We thank the reviewer for this comment, this was indeed not necessary. We modified PatternJ to delete these files after they are used.

      * In the same line, I DO see the point of saving the profiles and localizations from the "Extract & Save" step, but I think they should be deleted during the "Analysis" step, since all their information is then grouped in a single file, with descriptive headers. This deleting could be optional and set in the "Set parameters" window.*

      We understand the point raised by the reviewer. However, the analysis depends on the reference channel picked, which is asked for when starting an analysis, and can be augmented with additional selections. If a user chooses to modify the reference channel or to add a new profile to the analysis, deleting all these files would mean that the user will have to start over again, which we believe will create frustration. An optional deletion at the analysis step is simple to implement, but it could create problems for users who do not understand what it means practically.

      * Moreover, I think it would be useful to also save the linear roi used for the "Extract & Save" step, and eventually combine them during the "Analysis step" into a single roi set file so that future re-analysis could be made on the same regions. This could be an optional feature set from the "Set parameters" window. *

      We agree with the reviewer that saving ROIs is very useful. ROIs are now saved into a single file each time the user extracts and saves positions from a selection. Additionally, the user can re-use previous ROIs and analyze an image or image series in a single step.

      * In the "PatternJ workflow" section of the manuscript, the authors state that after the "Extract & Save" step "(...) steps 1, 2, 4, and 5 can be repeated on other selections (...)". However, technically, only steps 1 and 5 are really necessary (alternatively 1, 4 and 5 if the user is unsure of the quality of the patterning). If a user follows this to the letter, I think it can lead to wasted time. *

      We agree with the reviewer and have corrected the manuscript accordingly (line 119-120).

      * I believe that the "Version Information" button, although important, has potential to be more useful if used as a "Help" button for the toolset. There could be links to useful sources like the manuscript or the PatternJ website but also some tips like "whenever possible, use a higher linewidth for your line selection" *

      We agree with the reviewer as pointed out in our previous answers to the other reviewers. This button is now replaced by a Help menu, including a simple tutorial in a series of images detailing the steps to follow, a link to the user website, and a link to our video tutorial.

      * It would be interesting to mention to what extent does the orientation of the line selection in relation to the patterned structure (i.e. perfectly parallel vs more diagonal) affect pattern length variability?*

      As answered to reviewer 1, we understand this concern, which needs to be clarified for readers. The issue may be concerning at first sight, but the errors grow only with the inverse of cosine and are therefore rather low. For example, if the user creates a selection off by 3 degrees, which is visually obvious, lengths will be affected by an increase of only 0.14%. The point raised by the reviewer is important to discuss, and we therefore have added a comment on the choice of selection (lines 94-98) as well as a supplementary figure (Figure 1 - figure supplement 1).
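
      The inverse-cosine dependence quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration of that relation, not code from PatternJ; the function name relative_length_error is a hypothetical helper introduced here.

```python
import math


def relative_length_error(misalignment_deg):
    """Fractional overestimate of a pattern length when the line selection
    is tilted by `misalignment_deg` degrees away from the pattern axis.

    A length L measured along a tilted line appears as L / cos(angle),
    so the relative error is 1 / cos(angle) - 1.
    """
    return 1.0 / math.cos(math.radians(misalignment_deg)) - 1.0


if __name__ == "__main__":
    # Print the overestimate, in percent, for a few tilt angles.
    for angle in (1, 3, 5, 10):
        print(f"{angle:>2} deg: {100 * relative_length_error(angle):.2f} %")
```

      For a 3-degree tilt this gives roughly 0.14%, in agreement with the figure stated in the response.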

      * When "the algorithm uses the peak of highest intensity as a starting point and then searches for peak intensity values one spatial period away on each side of this starting point" (line 133-135), does that search have a range? If so, what is the range? *

      We agree that this information is useful to share with the reader. The range is one pattern size. We have modified the sentence to clarify the range of search used and the resulting limits in aperiodicity (now lines 176-181).
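
      As a rough illustration of the search described in this exchange, the following sketch starts from the brightest point of an intensity profile and then steps one period to each side, taking the local maximum near each expected position. It is not the PatternJ implementation: the function name find_pattern_peaks is hypothetical, and reading "the range is one pattern size" as a window of one period centred on the expected position is an assumption made here.

```python
def find_pattern_peaks(profile, period):
    """Return sorted pixel indices of peaks spaced roughly `period` apart.

    Assumption: each search window spans one period in total, centred on
    the position expected one period away from the previous peak.
    """
    if period < 1:
        raise ValueError("period must be at least 1 pixel")
    n = len(profile)
    half = period // 2
    # Starting point: the pixel of highest intensity in the profile.
    start = max(range(n), key=lambda i: profile[i])
    peaks = [start]
    for direction in (1, -1):  # search to the right, then to the left
        current = start
        while True:
            expected = current + direction * period
            if expected < 0 or expected > n - 1:
                break  # the next expected peak lies outside the profile
            lo = max(0, expected - half)
            hi = min(n - 1, expected + half)
            best = max(range(lo, hi + 1), key=lambda i: profile[i])
            peaks.append(best)
            current = best
    return sorted(peaks)
```

      Because each peak is searched relative to the previous one, moderate aperiodicity is tolerated. A real tool would also need to reject windows that contain no genuine peak, which this sketch does not attempt.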

      * Line 144 states that the parameters of the fit are saved and given to the user, yet I could not find such information in the outputs. *

      The parameters of the fits are saved for blocks. We have now clarified this point by modifying the manuscript (lines 186-198) and modifying Figure 1 - figure supplement 5. We realized we made an error in the description of how edges of "block with middle band" are extracted. This is now corrected.

      * In line 286, authors finish by saying "More complex patterns from electron microscopy images may also be used with PatternJ.". Since this statement is not backed by evidence in the manuscript, I suggest deleting it (or at the very least, providing some examples of what more complex patterns the authors refer to). *

      This sentence is now deleted.

      * In the TEM image of the fly wing muscle in fig. 4 there is a subtle but clearly visible white stripe pattern in the original image. Since that pattern consists of 'dips', rather than 'peaks' in the profile of the inverted image, they do not get analyzed. I think it is worth mentioning that if the image of interest contains both "bright" and "dark" patterns, then the analysis should be performed in both the original and the inverted images because the nature of the algorithm does not allow it to detect "dark" patterns. *

      We agree with the reviewer's comment. We now mention this point in lines 337-339.

      * In line 283, the authors mention using background correction. They should state explicitly which method of background correction they used. If they used ImageJ's "subtract background" tool, then they should specify the radius.*

      We now describe this step in the method section.

      Reviewer #3 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. Being a software paper, the advance proposed by the authors is technical in nature. The novelty and significance of this tool is that it offers quick and simple pattern analysis at the single unit level to a broad audience, since it runs on the ImageJ GUI and does not require any programming knowledge. Moreover, all the modules and steps are well described in the paper, which makes the analysis easy to follow.
      • Place the work in the context of the existing literature (provide references, where appropriate). The authors themselves provide a good and thorough comparison of their tool with other existing ones, both in terms of ease of use and on the type of information extracted by each method. While PatternJ is not necessarily superior in all aspects, it succeeds at providing precise single pattern unit measurements in a user-friendly manner.
      • State what audience might be interested in and influenced by the reported findings. Most researchers working with microscopy images of muscle cells or fibers or any other patterned sample and interested in analyzing changes in that pattern in response to perturbations, time, development, etc. could use this tool to obtain useful, and otherwise laborious, information.

      We thank the reviewer for these enthusiastic comments about how straightforward for biologists it is to use PatternJ and its broad applicability in the bio community.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The authors present an ImageJ Macro GUI tool set for the quantification of one-dimensional repeated patterns that are commonly occurring in microscopy images of muscles.

      Major comments

      In our view the article and also software could be improved in terms of defining the scope of its applicability and user-ship. In many parts the article and software suggest that general biological patterns can be analysed, but then in other parts very specific muscle actin wordings are used. We are pointing this out in the "Minor comments" sections below. We feel that the authors could improve their work by making a clear choice here. One option would be to clearly limit the scope of the tool to the analysis of actin structures in muscles. In this case we would recommend to also rename the tool, e.g. MusclePatternJ. The other option would be to make the tool about the generic analysis of one-dimensional patterns, maybe calling the tool LinePatternJ. In the latter case we would recommend to remove all actin specific wordings from the macro tool set and also the article should be in parts slightly re-written.

      Minor/detailed comments

      Software

      We recommend considering the following suggestions for improving the software.

      File and folder selection dialogs

      In general, clicking on many of the buttons just opens up a file-browser dialog without any further information. For novel users it is not clear what the tool expects one to select here. It would be very good if the software could be rewritten such that there are always clear instructions displayed about which file or folder one should open for the different buttons.

      Extract button

      The tool asks one to specify things like whether selections are drawn "M-line-to-M-line"; for users that are not experts in muscle morphology this is not understandable. It would be great to find more generally applicable formulations.

      Manual selection accuracy

      The first step of the analysis is always a user hand-drawn profile across intensity patterns in the image. However, this step can introduce inaccuracy that varies with the shape and curvature of the line profile drawn. If the line is not strictly perpendicular to, for example, the M-line patterns, the distance between intensity peaks will be different. This will be more problematic when dealing with features in the image that are not straight and parallel. If the structure is bent into a curve, the line drawn over it also needs to reproduce this curve to capture the intensity pattern precisely. I found that this limits the reproducibility and ease of use of the software.

      Reproducibility

      Since the line profile drawn on the image is the first step and essential to the entire process, saving it together with the analysis results should be considered, for example as ImageJ ROI or ROIset files that can be re-imported, correctly positioned, and visualized in the measured images. This would greatly improve the reproducibility of the proposed workflow. In the manuscript, only the extracted features are being saved (because the save button is also just asking for a folder containing images, so I cannot verify its functionality).

      ? button

      It would be great if that button would open up some usage instructions.

      Easy improvement of workflow

      I would suggest a reasonable expansion of the current workflow: fitting and displaying 2D lines to the band or line structures in the image that form the "patterns" the authors aim to address. This would extract geometric models from the image, so that the inter-line distance, and even the curve formed by these sets of lines, can be further analyzed and studied. These fitted 2D lines can also be integrated into ImageJ as Line ROIs, and thus be saved, imported back, checked, or further modified. I think this can largely increase the usefulness and reproducibility of the software.

      Manuscript

      We recommend considering the following suggestions for improving the manuscript. Abstract: The abstract suggests that general patterns can be quantified, however the actual tool quantifies specific subtypes of one-dimensional patterns. We recommend adapting the abstract accordingly.

      Line 58: The gray-level co-occurrence matrix (GLCM) based feature extraction and analysis approach is neither mentioned nor compared. There is at least one relatively recent study on sarcomere structure based on GLCM feature extraction: https://github.com/steinjm/SotaTool with publication: https://doi.org/10.1002/cpz1.462

      Line 75: "...these simple geometrical features will address most quantitative needs..." We feel that this may be an overstatement, e.g. we can imagine that there should be many relevant two-dimensional patterns in biology?!

      Line 83: "After a straightforward installation by the user, ...". We think it would be convenient to add the installation steps at this place into the manuscript.

      Line 87: "Multicolor images will give a graph with one profile per color." The 'Multicolor images' here should be more precisely stated as "multi-channel" images. Multi-color images could be confused with RGB images which will be treated as 8-bit gray value (type conversion first) images by profile plot in ImageJ.

      Line 92: "...such as individual bands, blocks, or sarcomeric actin...". While bands and blocks are generic pattern terms, the biological term "sarcomeric actin" does not seem to fit in this list. Could a more generic wording be found, such as "block with spike"?

      Line 95: "the algorithm defines one pattern by having the features of highest intensity in its centre". Could this be rephrased? We did not understand what that exactly means.

      Line 124 - 147: This part is the only description of the algorithm behind the feature extraction and analysis, but it is not clearly stated. Many details are missing or assumed to be known by the reader. For example, how the sub-pixel resolution results are achieved is not clear. One can only assume that, by fitting a Gaussian to the band, the center position (peak) can be calculated from a continuous curve rather than from pixels.
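
      The assumption voiced in this comment, that a Gaussian fit yields a peak position finer than the pixel grid, can be illustrated with a minimal sketch. This is not taken from PatternJ, whose fitting procedure is not described here; it uses a closed-form three-point Gaussian interpolation as a stand-in for a full fit, and the function name subpixel_peak is hypothetical.

```python
import math


def subpixel_peak(profile):
    """Estimate the peak position of a band with sub-pixel precision.

    The logarithm of a Gaussian is a parabola, so the three samples around
    the brightest interior pixel determine the centre in closed form.
    Assumes at least three samples and strictly positive intensities.
    """
    # Brightest pixel, excluding the two end points of the profile.
    i = max(range(1, len(profile) - 1), key=lambda k: profile[k])
    left, mid, right = (math.log(profile[k]) for k in (i - 1, i, i + 1))
    curvature = left - 2.0 * mid + right
    if curvature == 0:
        return float(i)  # flat top: fall back to the pixel position
    return i + 0.5 * (left - right) / curvature
```

      For noise-free samples of a Gaussian this recovers the centre exactly. With noisy data, a least-squares fit over more pixels would usually be more robust than three-point interpolation.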

      Line 407: We think the availability of both the tool and the code could be improved. For Fiji tools it is common practice to create an Update Site and to make the code available on GitHub. In addition, downloading the example file (https://drive.google.com/file/d/1eMazyQJlisWPwmozvyb8VPVbfAgaH7Hz/view?usp=drive_link) required a Google login and access request, which is not very convenient; in fact, we asked for access but it was denied. It would be important for the download to be easier, e.g. from GitHub or Zenodo.

      Significance

      The strength of this study is that a tool for the analysis of one-dimensional repeated patterns occurring in muscle fibres is made available in the accessible open-source platform ImageJ/Fiji. In the introduction to the article the authors provide an extensive review of comparable existing tools. Their new tool fills a gap in terms of providing an easy-to-use software for users without computational skills that enables the analysis of muscle sarcomere patterns. We feel that if the below mentioned limitations could be addressed the tool could indeed be valuable to life scientists interested in muscle patterning without computational skills.

      In our view there are a few limitations, including the accessibility of example data and tutorials at sites.google.com/view/patternj, which we had trouble accessing. In addition, we think that the workflow in Fiji, which currently requires pressing several buttons in the correct order, could be further simplified and streamlined by adopting some "wizard" approach, where the user is guided through the steps. Another limitation is the reproducibility of the analysis; here we recommend enabling IJ Macro recording as well as saving of the drawn line ROIs. For more detailed suggestions for improvements please see the above sections of our review.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Bell et al. provide an exhaustive and clear description of the diversity of a new class of predicted type IV restriction systems that the authors denote as CoCoNuTs, for their characteristic presence of coiled-coil segments and nuclease tandems. Along with a comprehensive analysis that includes phylogenetics, protein structure prediction, extensive protein domain annotations, and an in-depth investigation of encoding genomic contexts, they also provide detailed hypotheses about the biological activity and molecular functions of the members of this class of predicted systems. This work is highly relevant: it underscores the wide diversity of defence systems that are used by prokaryotes and demonstrates that there are still many systems to be discovered. The work is sound and backed up by a clear and reasonable bioinformatics approach. I do not have any major issues with the manuscript, but only some minor comments.

      Strengths:

      The analysis provided by the authors is extensive and covers the three most important aspects that can be covered computationally when analysing a new family/superfamily: phylogenetics, genomic context analysis, and protein-structure-based domain content annotation. With this, one can directly have an idea about the superfamily of the predicted system and infer their biological role. The bioinformatics approach is sound and makes use of the most current advances in the fields of protein evolution and structural bioinformatics.

      Weaknesses:

      It is not clear how coiled-coil segments were assigned, whether based only on AF2-predicted models or also backed by sequence analysis, as no description is provided in the methods. The structure prediction quality assessment is based solely on the average pLDDT of the obtained models (with a threshold of 80 or better). However, this is not enough, particularly when multimeric models are used. The PAE matrix should be used to evaluate relative orientations, particularly in the case where there is a prediction that parts from 2 proteins are interacting. In the case of multimers, interface quality scores, such as the ipTM or pDockQ, should also be considered and, at minimum, reported.

      A description of the coiled-coil predictions has been added to the Methods. For multimeric models, PAE matrices and ipTM+pTM scores have been included in Supplementary Data File S1.
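
      The two kinds of score discussed in this exchange can be summarised in a short sketch. It is illustrative only and does not reproduce the authors' pipeline: the function names are hypothetical, the 80-point pLDDT cutoff is the threshold mentioned by the reviewer, and the 0.8/0.2 weighting is the combination AlphaFold-Multimer uses to rank models, which may or may not be how the reported "ipTM+pTM" values were derived.

```python
def mean_plddt(per_residue_plddt):
    """Average per-residue pLDDT (0-100) of a predicted model."""
    return sum(per_residue_plddt) / len(per_residue_plddt)


def passes_plddt_threshold(per_residue_plddt, threshold=80.0):
    """True if the average pLDDT reaches the threshold (80 in the review)."""
    return mean_plddt(per_residue_plddt) >= threshold


def multimer_confidence(iptm, ptm):
    """Weighted ipTM/pTM score, as used by AlphaFold-Multimer for ranking.

    ipTM reflects the predicted interface quality and pTM the overall fold,
    so this score is more sensitive to inter-chain placement than pLDDT.
    """
    return 0.8 * iptm + 0.2 * ptm
```

      A model can have a high average pLDDT while its chains are poorly placed relative to each other, which is why an interface-aware score and the PAE matrix are requested in addition.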

      Reviewer #2 (Public Review):

      Summary:

      In this work, using in-depth computational analysis, Bell et al. explore the diverse repertoire of type IV McrBC modification-dependent restriction systems. The prototypical two-component McrBC system has been structurally and functionally characterised and is known to act as a defence by restricting phage and foreign DNA containing methylated cytosines. Here, the authors find previously unanticipated complexity and versatility of these systems and focus on detailed analysis and classification of a distinct branch, the so-called CoCoNuT, named after its composition of coiled-coil structures and tandem nucleases. These CoCoNuT systems are predicted to target RNA as well as DNA and to utilise defence mechanisms with some similarity to type III CRISPR-Cas systems.

      Strengths:

      This work is enriched with a plethora of ideas and a myriad of compelling hypotheses that now await experimental verification. The study comes from the group that was amongst the first to describe, characterize, and classify CRISPR-Cas systems. By analogy, the findings described here can similarly promote ingenious experimental and conceptual research that could further drive technological advances. It could also instigate vigorous scientific debates that will ultimately benefit the community.

      Weaknesses:

      The multi-component systems described here function in the context of large oligomeric complexes. Some of the single chain AF2 predictions shown in this work are not compatible, for example, with homohexameric complex formation due to incompatible orientation of domains. The recent advances in protein structure prediction, in particular AlphaFold2 (AF2) multimer, now allow us to confidently probe potential protein-protein interactions and protein complex formation. This predictive power could be exploited here to produce a better glimpse of these multimeric protein systems. It can also provide a more sound explanation for some of the observed differences amongst different McrBC types.

      Hexameric CnuB complexes with CnuC stimulatory monomers for Type I-A, I-B, I-C, II, and III-A CoCoNuT systems have been modeled with AF2 and included in Supplementary Data File S1, albeit without the domains fused to the GTPase N-terminus (with the exception of Type I-B, which lacks the long coiled-coil domain fused to the GTPase and was modeled with its entire sequence). Attempts to model the other full-length CnuB hexamers did not lead to convincing results.

      Recommendations for the authors:

      Reviewing Editor:

      The detailed recommendations by the two reviewers will help the authors to further strengthen the manuscript, but two points seem particularly worth considering: 1. The methods are barely sketched in the manuscript, but it could be useful to detail them more closely. Particularly regarding the coiled-coil segments, which are currently just statists, useful mainly for the name of the family, more detail on their prediction, structural properties, and purpose would be very helpful. 2. Due to its encyclopedic nature, the wealth of material presented in the paper makes it hard to penetrate in one go. Any effort to make it more accessible would be very welcome. Reviewer 1 in particular has made a number of suggestions regarding the figures, which would make them provide more support for the findings described in the text.

      A description of the techniques used to identify coiled-coil segments has been added to the Methods. Our predictions ranged from near certainty in the coiled-coils detected in CnuB homologs, to shorter helices at the limit of detection in other factors. We chose to report all probable coiled-coils, as the extensive coiled-coils fused to CnuB, which are often the only domain present other than the GTPase, imply involvement in mediating complex formation by interacting with coiled-coils in other factors, particularly the other CoCoNuT factors. The suggestions made by Reviewer 1 were thoughtful and we made an effort to incorporate them.

      Reviewer #1 (Recommendations For The Authors):

      I do not have any major issues with the manuscript. I have however some minor comments, as described below.

      • The last sentence of the abstract at first reads as a fact and not a hypothesis resulting from the work described in the manuscript. After the second read, I noticed the nuances in the sentence. I would suggest a rephrasing to emphasize that the activity described is a theoretical hypothesis not backed up by experiments.

      This sentence has been rephrased to make explicit the hypothetical nature of the statement.

      • In line 64, the authors rename DUF3578 as ADAM because indeed its function is not unknown. Did the authors consider reaching out to InterPro to add this designation to this DUF? A search in InterPro with DUF3578 results in "MrcB-like, N-terminal domain" and if a name is suggested, it may be worthwhile to take it to the InterPro team.

      We will suggest this nomenclature to InterPro.

      • I find Figure 1E hard to analyse and think it occupies too much space for the information it provides. The color scheme, the large amount of small slices, and the lack of numbers make its information content very small. I would suggest moving this to the supplementary and making it instead a bar plot. If removed from Figure 1, more space is made available for the other panels, particularly the structural superpositions, which in my opinion are much more important.

      We have removed Figure 1E from the paper as it adds little information beyond the abundance and phyletic distribution of sequenced prokaryotes, in which McrBC systems are plentiful.

      • In Figure 2, it is not clear due to the presence of many colorful "operon schemes" that the tree is for a single gene and not for the full operon segment. Highlighting the target gene in the operons or signalling it somehow would make the figure easy to understand even in the absence of the text and legend. The same applies to Supplementary Figure 1.

      The legend has been modified to show more clearly that this is a tree of McrB-like GTPases.

      • In line 146, the authors write "AlphaFold-predicted endonuclease fold" to say that a protein contains a region that AF2 predicts to fold like an endonuclease. This is a weird way of writing it and can be confusing to non-expert readers. I would suggest rephrasing for increased clarity.

      This sentence has been rephrased for greater clarity.

      • In line 167, there is a [47]. I believe this is probably due to a previous reference formatting.

      Indeed, this was a reference formatting error and has been fixed.

      • In most figures, the color palette and the use of very similar color palettes for taxonomy pie charts, genomic context composition schemes, and domain composition diagrams make it really hard to have a good understanding of the image at first. Legends are often close to each other, and it is not obvious at first which belong to what. I would suggest changing the layouts and maybe some color schemes to make it easier to extract the information that these figures want to convey.

      It seemed that Figure 4 was the most glaring example of these issues, and it has been rearranged for easier comprehension.

      • In the paragraph that starts at line 199, the authors mention an Ig-like domain that is often found at the N-terminus of Type I CoCoNuTs. Are they all related to each other? How conserved are these domains?

      These domains are all predicted to adopt a similar beta-sandwich fold and are found at the N-terminus of most CoCoNuT CnuC homologs, suggesting they are part of the same family, but we did not undertake a more detailed sequence-based analysis of these regions.

      We also find comparable domains in the CnuC/McrC-like partners of the abundant McrB-like NxD motif GTPases that are not part of CoCoNuT systems, and given the similarity of some of their predicted structures to Rho GDP-dissociation inhibitor 1, we suspect that they have coevolved as regulators of the non-canonical NxD motif GTPase type. Our CnuBC multimer models showing consistent proximity between these domains in CnuC and CnuB GTPase domains suggest this could indeed be the case. We plan to explore these findings further in a forthcoming publication.

      • In line 210, the authors write "suggesting a role in overcrowding-induced stress response". Why so? In all other cases, the authors justify their hypothesis, which I really appreciated, but not here.

      A supplementary note justifying this hypothesis has been added to Supplementary Data File S1.

      • At the end of the paragraph that starts in line 264, the authors mention that they constructed AF2 multimeric models to predict if 2 proteins would interact. However, no quality scores were provided, particularly the PAE matrix. This would allow for a better judgement of this prediction, and I would suggest adding the PAE matrix as another panel in the figure where the 3D model of the complex is displayed.

      The PAE matrix and ipTM+pTM scores for this and other multimer models have been added to Supplementary Data File S1. For this model in particular, the surface charge distribution of the model has been presented to support the role of the domains that have a higher PAE in RNA binding.

      • In line 306, "(supplementary data)" refers to what part of the file?

      This file has been renamed Supplementary Table S3 and referenced as such.

      • In line 464, the authors suggest that ShdA could interact with CoCoNuTs. Why not model the complex as done for other cases? What would co-folding suggest?

      As we were not able to convincingly model full-length CnuB hexamers with N-terminal coiled-coils, we did not attempt modeling of this hypothetical complex with another protein with a long coiled-coil, but it remains an interesting possibility.

      • In line 528, why and how were some genes additionally analyzed with HHPred?

      Justification for this analysis has been added to the Methods, but briefly, these genes were additionally analyzed if there were no BLAST hits or to confirm the hits that were obtained.

      • In the first section of the methods, the first and second (particularly the second) paragraphs are extremely long. I would suggest breaking them to facilitate reading.

      This change has been made.

      • In line 545, what do the authors mean by "the alignment (...) were analyzed with HHPred"?

      A more detailed description of this step has been added to the Methods.

      • The authors provide the models they produced as well as extensive supplementary tables that make their data reusable, but they do not provide the code for the automated steps, as to excise target sequence sections out of multiple sequence alignments, for example.

      The code used for these steps has been in use in our group at the NCBI for many years. It will be difficult to utilize outside of the NCBI software environment, but for full disclosure, we have included a zipped repository with the scripts and custom-code dependencies, although there are external dependencies as well such as FastTree and BLAST. In brief, it involves PSI-BLAST detection of regions with the most significant homology to one of a set of provided alignments (seals-2-master/bin/wrappers/cog_psicognitor). In this case, the reference alignments of McrB-like GTPases and DUF2357 were generated manually using HHpred to analyze alignments of clustered PSI-BLAST results. This step provided an output of coordinates defining domain footprints in each query sequence, which were then combined and/or extended using scripts based on manual analysis of many examples with HHpred (footprint_finders/get_GTPase_frags.py and footprint_finders/get_DUF2357_frags.py), then these coordinates were used to excise such regions from the query amino acid sequence with a final script (seals-2-master/bin/misc/fa2frag).
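      As an illustration only, the last two steps described above (combining domain footprint coordinates and excising the corresponding regions from a query sequence) can be sketched in a few lines of Python. This is not the NCBI code: the function names `merge_footprints` and `excise_fragments`, the `max_gap` parameter, and the example sequence are all hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from typing import List, Tuple

# A footprint is a (start, end) pair; 1-based, inclusive coordinates are an assumption.
Footprint = Tuple[int, int]


def merge_footprints(footprints: List[Footprint], max_gap: int = 0) -> List[Footprint]:
    """Combine footprints that overlap or lie within max_gap residues of each other."""
    merged: List[Footprint] = []
    for start, end in sorted(footprints):
        if merged and start <= merged[-1][1] + max_gap + 1:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def excise_fragments(sequence: str, footprints: List[Footprint]) -> List[str]:
    """Cut each footprint out of the sequence, clipping coordinates to its length."""
    fragments = []
    for start, end in footprints:
        start = max(start, 1)
        end = min(end, len(sequence))
        if start <= end:
            fragments.append(sequence[start - 1:end])
    return fragments


# Hypothetical query sequence and footprints, for illustration only.
query = "MKTAYIAKQRQISFVKSHFSRQ"
combined = merge_footprints([(3, 6), (5, 10), (15, 18)])
fragments = excise_fragments(query, combined)
```

      In this sketch, the two overlapping footprints are combined into one region before excision, so each domain is cut out once.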

      Reviewer #2 (Recommendations For The Authors):

      (1) Page 4, line 77 - 'PUA superfamily domains' could be more appropriate to use instead of "EVE superfamily".

      While this statement could perhaps be applied to PUA superfamily domains, our previous work we refer to, which strongly supports the assertion, was restricted to the EVE-like domains and we prefer to retain the original language.

      (2) Page 5. lines 128-130 - AF2 multimer prediction model could provide a more sound explanation for these differences.

      Our AF2 multimer predictions added in this revision indeed show that the NxD motif McrB-like CoCoNuT GTPases interact with their respective McrC-like partners such that an immunoglobulin-like beta-sandwich domain, fused to the N-termini of the McrC homologs and similar to Rho GDP-dissociation inhibitor 1, has the potential to physically interact with the GTPase variants. However, we did not probe this in greater detail, as it is beyond the scope of this already highly complex article, but we plan to study it in the future.

      (3) Page 8, line 252 - The surface charge distribution of CnuH OB fold domain looks very different from SmpB (pdb3iyr). In fact, the regions that are in contact with RNA in SmpB are highly acidic in CoCoNuT CnuH. Although it looks likely that this domain is involved in RNA binding, the mode of interaction should be very different.

      We did not detect a strong similarity between the CnuH SmpB-like SPB domain and PDB 3IYR, but when we compare the surface charge distribution of PDB 1WJX and the SPB domain, while there is a significant area that is positively charged in 1WJX that is negatively charged in SPB, there is much that overlaps with the same charge in both domains.

      The similarity between SmpB and the SPB domain is significant, but definitely not exact. An important question for future studies is: If the domains are indeed related due to an ancient fusion of SmpB to an ancestor of CnuH, would this degree of divergence be expected?

      In other words, can we say anything about how the function of a stand-alone tmRNA-binding protein could evolve after being fused to a complex predicted RNA helicase with other predicted RNA binding domains already present? Experimental validation will ultimately be necessary to resolve these kinds of questions, but for now, it may be safe to say that the presence of this domain, especially in conjunction with the neighboring RelE-like RTL domain and UPF1-like helicase domain, signals a likely interaction with the A-site of the ribosome, and perhaps restriction of aberrant/viral mRNA.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      The manuscript by Barba-Aliaga and colleagues describes a potential function of eIF5A in the control of TIM50 translation. The authors showed that in temperature-sensitive mutants of eIF5A several mitochondrial proteins are decreased, including OXPHOS subunits, proteins of the TCA cycle and some components of protein translocases. Some precursor proteins appear to localize to the cytosol. As a consequence of mitochondrial dysfunction, the expression of some stress components is induced. The idea is that eIF5A relieves ribosome stalling on the proline-rich Tim50 of the TIM23 complex and thereby controls mitochondrial protein set-up.

      The findings are potentially interesting. However, some control experiments are required to substantiate the findings.

      1. To support their conclusion the authors should show whether Tim50 levels are affected in the eIF5A-ts mutants used.

      Response: Tim50 protein half-life is approximately 9.6 h (Christiano et al, 2014), which makes it difficult to measure large differences in new protein synthesis upon eIF5A depletion. However, we used different approaches to show that a reduction in eIF5A provokes a reduction in Tim50 protein levels and synthesis. 1) The steady-state levels of Tim50 protein (genomic HA-tagged version) are shown by western blotting analysis in Fig. S4B and confirm a significant drop of approximately 20% in the tif51A-1 mutant at restrictive temperature. 2) A construct in which Tim50 is fused to a nanoluciferase reporter under the control of a tetO7 inducible promoter shows a significant 3-fold reduction in Tim50 protein synthesis in the tif51A-1 mutant compared to wild-type (Fig. 4C). In addition, the calculated protein synthesis time indicates that the tif51A-1 strain takes twice as long as the wild-type to synthesize Tim50 protein (Fig. 4E). 3) The expression of a FLAG-TIM50-GFP version under a GAL inducible system also shows a significant reduction in Tim50 protein synthesis in the two eIF5A temperature-sensitive strains (Fig. S4C). 4) The proteomic analysis performed at 41°C showed a 20% reduction in Tim50 protein levels in the two eIF5A temperature-sensitive strains, although this was not statistically significant (Table S1). Furthermore, TIM50 mRNA levels were determined by RT-qPCR across all the experiments mentioned to confirm that the low levels of Tim50 protein were not due to decreased transcription or increased mRNA degradation. 5) An additional polysome profiling experiment has been included in Fig. R1 (Figure for Reviewers), showing a higher TIM50 mRNA abundance in the light polysomal fractions and a lower mRNA abundance in the heavy polysomal fractions upon eIF5A depletion. This indicates that the TIM50 mRNA abundance is significantly shifted to earlier fractions and that translation of Tim50 is reduced in the tif51A-1 mutant at restrictive temperature but not at permissive temperature. Altogether, these experiments confirm a significant reduction of Tim50 protein levels upon eIF5A depletion and support our conclusions.

      How are the levels of TOM and TIM23 subunits?

      Response: Our proteomic analysis shows that the protein levels of the Tom70 and Tom20 receptor subunits of the TOM complex are significantly decreased in the two eIF5A temperature-sensitive strains (Table S1). These results agree with the polysome profiling results, which show a significant reduction of TOM70 and TOM20 mRNAs in the heavy polysomal fractions and a significant increase of these mRNAs in the light fractions of eIF5A-depleted cells (Fig. 2C and Fig. S2D). Apart from Tim50, no other proteins of the Tim23 translocase complex were detected in the proteomic analysis.

      Furthermore, how are the levels of the Tim50 variant that lack the proline residues? Is the stability or function of Tim50 affected by these mutations?

      Although we did not specifically analyse the Tim50ΔPro protein levels, the Tim50ΔPro fluorescent signal has been quantified to address this matter; the quantification is shown in Fig. R2 and mentioned in the corresponding Results section. The results indicate that the Tim50 variant lacking the proline residues has similar protein levels to the wild-type version, and it is therefore tempting to say that its stability should also be similar. However, if the Reviewers consider this to be essential for publication, additional experiments using cycloheximide could be conducted to better assess the stability and half-life of this Tim50 version.

      Additionally, the Tim50ΔPro protein is present at functional levels, as shown by the fact that wild-type cells carrying this version as the only copy of Tim50 grew well in glycerol media, where Tim50 is essential for mitochondrial function (Fig. 5A). However, we suspect that Tim50ΔPro is a slightly less efficient protein, since the double mutant tif51A-1 Tim50ΔPro grows even more poorly than the single tif51A-1 mutant (Fig. 5A). This information also responds to the comments made by Reviewer #2.

      How specific is the effect of eIF5A on Tim50? Is there any other mitochondrial substrate of eIF5A? It is not so clear to the reviewer why the authors focused on Tim50.

      Response: eIF5A has been shown to be necessary for the translation of mRNA codons encoding consecutive prolines and, consequently, lack of eIF5A causes ribosome stalling at these polyproline motifs (Gutierrez et al., 2013; Pelechano and Alepuz, 2017; Schuller et al., 2017). In our manuscript we show that: 1) using an artificial tetO7-TIM50-nanoLuc genomic construct, the synthesis of Tim50 protein (measured as the appearance of luciferase activity upon induction of the tetO promoter) is significantly reduced by 3-fold under eIF5A depletion only when Tim50 contains the stretch of 7 consecutive prolines (Fig. 4A-D); 2) genomic Tim50-HA and plasmid FLAG-TIM50-GFP protein levels are significantly reduced upon eIF5A depletion (Fig. S4B); 3) the calculated time for translation elongation of Tim50 mRNA is twice as long in cells with eIF5A depletion as in cells containing normal eIF5A levels (Fig. 4E); and 4) analysis of published ribosome profiling data shows a precipitous drop-off in ribosome density exactly where the stretch of polyprolines is located in Tim50 (540-561bp) upon eIF5A depletion but not in the control strain (Fig. 4F). This result is indicative of ribosome stalling at the Tim50 polyproline motif upon eIF5A depletion. Altogether, our results strongly support a direct and specific role of eIF5A in Tim50 protein synthesis. However, as we discuss in relation to Fig. 5 and in the Discussion section, Tim50 does not seem to be the only mitochondrial substrate of eIF5A, since recovery of Tim50 protein synthesis does not rescue the growth of eIF5A mutants under respiratory conditions. Along these lines, we have added further data pointing to ribosome stalling for other co-translationally inserted mitoproteins that are potential substrates of eIF5A (Table S6). This has also been included in the Discussion section. This information also responds to the comments made by Reviewer #4.

      We focused on Tim50 in this manuscript because we found a global downregulation of mitochondrial protein synthesis (Fig. 1 and 2) in parallel with the accumulation of mitochondrial precursor proteins in the cytoplasm and induction of the mitoCPR response (Fig. 3). All these data pointed to a mitochondrial protein import defect. Tim50 is an essential component of the Tim23 translocase complex, its protein levels are reduced in eIF5A mutants, and it contains a polyproline motif. Together, these observations pointed towards a Tim50-dependent effect on mitochondrial protein import upon eIF5A depletion, which we addressed in the manuscript.

      Figure 1A: Which tif51A strain was used?

      Response: The proteomic analysis was performed with the tif51A-1 and tif51A-3 temperature-sensitive strains (see Table S1), and Fig. 1A shows the average of the values obtained for the two mutants (proteins detected as down-regulated in these two samples and from 3 different biological replicates). This is now clarified in the Figure 1A legend. A similar approach was also followed in Pelechano and Alepuz, 2017. Additionally, the ratios between the protein level in the temperature-sensitive mutant with respect to wild-type for each protein and for each eIF5A mutant are also shown in Table S1. This information also responds to the comments made by Reviewer #2.

      Figure 1C: The authors should show the steady state levels of some OXPHOS/TCA components to confirm the findings of Figure 1A.

      Response: Proteomic findings have been confirmed for several proteins. The steady-state levels of Por1 and Hsp60 proteins were investigated by western blotting (Figs. 1C,D) and the results show a significant down-regulation in the two eIF5A temperature-sensitive strains at 41°C, which confirms the findings of Fig. 1A. Additionally, we have included the same experiment performed at 37°C (Fig. S1E), which confirms the same conclusion.

      Furthermore, the steady-state levels of Tim50 protein were also investigated by western blotting (Fig. S4B), and results also showed a significant down-regulation in the tif51A-1 mutant at restrictive temperature (37ºC), compared to wild-type. This result also confirms the findings of Fig. 1A.

      However, if the Reviewers consider additional confirmation for OXPHOS/TCA proteins to be essential for publication, further experiments could be conducted to assess the protein levels of other OXPHOS/TCA proteins.

      The manuscript contains several quantifications. However, central information, such as the number of repeats or whether a standard deviation or S.E.M. is depicted, is missing.

      Response: Clear information on the number of repeats, type of graphical representation and statistical analysis is now included for all figures in the corresponding figure legends and also detailed in the Materials and Methods section. This information also responds to the comments made by Reviewer #2.

      Figure 3: The authors propose that precursors form aggregates outside mitochondria. To assess the data, a quantification should address in how many cells protein aggregates are present.

      Response: The quantification of cytoplasmic Yta12 aggregates is now included in Fig.3E, which shows significant differences between the tif51A-1 mutant and the wild-type strain. In addition, quantification of cytosolic Tim50 aggregates was already included in Fig. 4H, which also shows significant differences between the tif51A-1 mutant and the wild-type strain. These two figures include the individual values from three biological replicates (at least 150 cells were analyzed), mean, standard deviation and statistical analysis.

      Do the observed aggregated proteins interact with Hsp104? recycled?

      Response: Yes, the cytoplasmic mitochondrial precursor aggregated proteins co-localize with Hsp104 as shown in Fig. 3I for Cyc1 and in Fig. 4J for Tim50. The quantification of Cyc1 and Tim50 co-localization with Hsp104 is shown in Fig S5D.


      Significance

      See above


      Reviewer #2

      Evidence, reproducibility and clarity

      The authors report here novel findings concerning the role of eIF5A in mediating protein import to mitochondria in the model eukaryote Saccharomyces cerevisiae. It was previously known from structural and other studies that the translation factor eIF5A binds to the E-site of stalled ribosomes to help promote peptide bond formation. It was inferred by ribosome footprinting and reporter studies assessing the impact of eIF5A depletion that eIF5A is particularly needed to translate several specific amino acid motifs including polyproline stretches. However additional target sequences are known.

      Here a proteomics approach reveals clear evidence that mitochondrially targeted proteins are impacted by temperature sensitive mutations in eIF5A that deplete the factor, including those without polyprolines. The authors then use a range of molecular and cell biology to focus on the role of mitochondrial signal sequences/mitochondrial protein import and the mitochondrial stress response, before highlighting a role for poly-prolines in Tim50, a major mitochondrial protein import factor. Consistent with the ribosome footprinting done previously it is shown that a stretch of 7 prolines limit its translation when eIF5A is depleted and studies shown here are consistent with the idea that this has wider consequences for mitochondrial protein import and hence translation/stability of other proteins. However improved Tim50 translation alone, by eliminating the poly-proline motif, is not sufficient to overcome all consequences of eIF5A depletion for mitochondrial protein import and for viability, suggesting a wider role.

      In general the text flows nicely, this could be a study that explains why a large number of mitochondrially targeted proteins are impacted by depletion of eIF5A in yeast. As the poly Pro sequence in Tim50 is not conserved in higher eukaryotes it is unclear how this observation will scale to other systems, but it provides an example of how studies in a relatively simple system can trace wide-spread impact of the loss of one component of a central pathway-here protein synthesis to altered translation of a key component of another process-mitochondrial protein import. Given that eIF5A and its hypusine modifying enzymes are mutated in rare human disorders, it is likely there will be interest in this study.

      However, while the conclusions may be justified, there are significant deficiencies in how the experiments have been analysed and presented in this version of the manuscript that impact every figure shown, coupled with deficiencies in the methods section that all need to be addressed. Thus, we have here the basis of what should be a very interesting paper here, but there is a lot of work to do to remedy perceived weaknesses. It may be that the overall conclusions are entirely sound and appropriate, but I suspect that performing the statistics in less biased ways may change some of the significant differences claimed. Some explanations concerning how data analyses were conducted and the reasons for specific analysis decisions being made would also improve the narrative. These points are expanded on below.

      All the edits suggested here are aimed at improving the rigor of reporting in this study. Depending on how they are answered some may become major issues, or they could all be minor.

      1 Figure 1 shows proteomic data for response to heat shock at 41°C. In the text it is made clear that two different temperature sensitive missense alleles the 51A-1 and 51A-3 were analysed, but the single volcano plot in Figure 1A does not say whether it is reporting one of these experiments compared to WT (which one) or some other analysis (ie have data from the 2 mutants been amalgamated somehow?). I would assume only one, but which one, and why only one plot? How different is the other experiment? Why does the Figure title say the experiment is an eIF5A deletion when it is not this?

      Response: The data shown in Figure 1A correspond to the average values obtained in the proteomic analysis for the two temperature-sensitive mutants tif51A-1 and tif51A-3 (with data for each mutant obtained from 3 different biological replicates). The proteomic results were highly reproducible and similar between the two mutants (see the MDS plot in Fig. S1A showing all replicates for each strain and condition studied in the proteomic analysis). In addition, the proteomic data showing the protein 41°C/25°C ratio for each eIF5A temperature-sensitive mutant with respect to wild-type are shown in Table S1. This is now clarified in the Figure 1A legend. A similar approach using the mean values of the two mutants was followed in the analysis of ribosome footprinting data in Pelechano and Alepuz, 2017. Additionally, the ratios between the protein level in the temperature-sensitive mutant with respect to wild-type for each protein and for each eIF5A mutant are also shown in Table S1. This information also responds to the comments made by Reviewer #1.

      Reviewer #2 is right: there was a mistake in the Fig. 1 title. It has been corrected and now reads “depletion” instead of “deletion”.

      2 Why were the experiments shown in Figure 1 done at 41°C when all other experiments are done at 37°C? This experimental difference is ignored in the text and no comparison of the impact of 37 vs 41 is made anywhere in the manuscript. For example it would be straightforward to perform a comparison of eIF5A depletion (by western blot), polyribosome profiles, strain growth/inhibition at both temperatures.

      Response: Our aim in carrying out a proteomic experiment after 4 hours of incubation of the temperature-sensitive strains at 41°C was to achieve a more profound depletion of the eIF5A protein, which is very abundant and stable under normal conditions, in order to obtain clear proteomic results. The proteomic results pointed to a reduction in the levels of many mitochondrial proteins, corroborating previous results obtained in murine embryonic fibroblasts upon depletion of active eIF5A (https://doi.org/10.1016/j.cmet.2019.05.003). From this starting point we tried to identify the molecular mechanism involved, and all the remaining experiments were done with temperature-sensitive eIF5A mutants at the restrictive temperature of 37°C, which is the condition most commonly used in yeast by us and others, and at which wild-type yeast cells still grow vigorously.

      In our previous manuscript version, the depletion of eIF5A after growing the cells at 41°C for 4 h was shown in Fig. 1C. These data have been expanded, and we now include in Fig. S1E a western blotting analysis that shows the depletion of eIF5A after incubating the cells at 37°C and 41°C for 4 h. The steady-state level of the mitochondrial Por1 protein was investigated by western blotting (Figs. 1C,D) and the results show a significant down-regulation in the two eIF5A temperature-sensitive strains at 41°C. We have now included the same experiment performed at 37°C (Fig. S1E), which confirms the same conclusion. In addition, following Reviewer #2's suggestions, growth of the wild-type and tif51A-1 strains was tested by serial drop assays conducted at 25°C, 37°C and 41°C, and the results confirm that both 37°C and 41°C impair the growth of the tif51A-1 strain but not the wild-type (Fig. S1B). The new information included in Figure S1 is now explained in the Results section. This information also responds to the comments made by Reviewer #4.

      3 Western blot quantification. In Figure 1D and E the authors present western blot quantification. However they have chosen to normalise every panel to the signal in lane 1. This means that there is no variation at all in that sample as every replicate is =1. This completely skews the statistical assumptions made (because there will be variation in that sample) and effectively invalidates all the statistics shown. An appropriate approach to use is to normalise the signal in each lane to the mean signal across all lanes in a single blot. That way if all are identical they remain at 1, but importantly variation across all samples is captured. This should be done to the loading controls as well before working out ratios or performing any statistical analyses.

      Response: Following Reviewer #2's suggestions, we have changed the normalization methodology for the western blots: we now normalize the signal in each lane to the mean signal across all lanes in each single blot, and do the same for the loading controls. We have applied this analysis to every western blotting experiment shown in the manuscript (Figs. 1D, S4B and S4C), and the statistical analyses have been performed again to capture variation across all samples. This is also described in the Materials and Methods section (“Western blotting” subsection). The results obtained are similar to the previous ones, but we agree that this new approach improves the data presentation.
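      For clarity, the normalization described above can be sketched as follows. This is a minimal Python illustration, not our actual analysis pipeline; the function names and the band intensities are hypothetical.

```python
from statistics import mean
from typing import List


def normalize_to_blot_mean(lane_signals: List[float]) -> List[float]:
    """Divide each lane by the mean signal across all lanes of the same blot."""
    blot_mean = mean(lane_signals)
    return [signal / blot_mean for signal in lane_signals]


def normalized_ratios(target: List[float], loading: List[float]) -> List[float]:
    """Normalize target and loading control separately, then take per-lane ratios."""
    target_norm = normalize_to_blot_mean(target)
    loading_norm = normalize_to_blot_mean(loading)
    return [t / l for t, l in zip(target_norm, loading_norm)]


# Hypothetical band intensities for one blot (arbitrary units).
target_band = [100.0, 50.0, 150.0]
loading_band = [200.0, 200.0, 200.0]
ratios = normalized_ratios(target_band, loading_band)
```

      Because no single lane is fixed at 1, the reference sample retains its replicate-to-replicate variation, which is the point raised by the Reviewer.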

      For this type of experiment it is more appropriate to use Anova than a T-test. This advice applies to every western data analysis figure in the whole manuscript and so all associated statistics need to be done again from the original quantification values. If T-test is justified then a correction for multiple hypothesis testing should be applied.

      Response: After reviewing a large number of publications analysing similar data, and following the recommendations of our statistical department, we have retained the statistics used in our previous version (with the new data normalisation explained above, as recommended by Reviewer #2). For each western blot figure shown, we performed experiments with two different biological samples, wild-type cells and eIF5A mutant cells, and compared results for a single variable (Por1 protein level, eIF5A protein level or Hsp60 protein level) using three or more biological replicates. In this context, we compare the mean of the protein levels obtained from the biological replicates for two groups: wild-type and eIF5A mutant. We therefore believe that the T-test is more appropriate. However, we could repeat the statistical analysis if this is ultimately considered more appropriate.

      In all bar chart figures in addition to showing the mean and SD, each replicate value should be shown (eg as done in Fig 2C). Graphpad allows individual points to be plotted easily.

      Response: All figures throughout the manuscript now include individual values from each replicate, in addition to showing the mean, SD and statistical analysis. All figure legends have been corrected accordingly.

      5 Figure 2. Polysome profiles. The impact of translation elongation stalls on global polysome profiles is complex, but a global run off is highly unlikely. Stalls later in the coding region would be anticipated to cause an increase in ribosome density as more ribosomes accumulate (like cars queueing held at a red light). However where a stall is early in a longer ORF, for example at a signal sequence, then there is less opportunity for ribosomes to join and so for those mRNAs moving to lighter points in the gradient may be observed. This may also cause knock on effects on AUG clearance and initiation which the authors appear to see as there may be an increased 60S peak in the traces shown. Are there differences in overall -low vs high polysomes, the traces shown suggest there may be? Discussion of these points is merited in the results section given the subsequent qPCR experiment.

      Response: The comments made by Reviewer #2 are very interesting and we have made changes accordingly. First, we now show in Fig. 2A,B and Fig.S2B,C the quantification of polysomal and monosomal fractions in wild-type and tif51A-1 mutants at permissive and restrictive temperatures. There is no impact on global polysomal and monosomal fractions under eIF5A depletion. This result does not support a global stall at the 3’ region of the ORF, because an increase in polysomal fractions would then be detected; nor a global stall at the 5’ region of the ORF, because a decrease in polysomal fractions would then be detected. However, with respect to individual mRNAs, our data show a significant reduction in the heavier polysomal fractions and a significant increase in the lighter polysomal fractions for mRNAs encoding mitochondrial proteins, while no significant changes were observed for mRNAs encoding cytoplasmic proteins (Fig. 2C and Fig. S2D-I). These results could be interpreted as a consequence of ribosome stalls in the 5’ ORF regions, for example at the signal sequence, in line with Reviewer #2's comments.

      We have now introduced this comment in the Results and Discussion sections.

      Figure 2 qPCR. Using qPCR to analyse RNA levels across polysome gradients is tricky for multiple reasons including that the total RNA level varies across fractions that can impact recovery efficiencies following precipitation of gradient fractions. Often investigators use a spike in control to act as a normalising factor. Here it is completely unclear what analysis was done because details are not stated anywhere. How were primers optimized, was amplification efficiency determined? Or are they assumed to be 100%, which they will not be? A detailed description or reference to a study where that is written is needed.

      Response: The RNA extraction and RT-qPCR analyses of the mRNA levels in the polysomal gradients were done as in previous studies from our lab (Romero et al. Sci Rep. 2020;10(1):233. doi: 10.1038/s41598-019-57132-0; Ramos-Alonso et al. PLoS Genet. 2018;14(6):e1007476. doi: 10.1371/journal.pgen.1007476; van Wijlick et al. PLoS Genet. 2016;12(10):e1006395. doi: 10.1371/journal.pgen.1006395; Garre et al. Mol Biol Cell. 2012;23(1):137-50. doi: 10.1091/mbc.E11-05-0419). Three independent replicates were analyzed and the results were reproducible and statistically significant, as shown in Fig. S2. Total RNA was extracted from each fraction using the SpeedTools Total RNA Extraction kit (Biotools B&M Labs). In the first replicate a spike-in RNA control (Phenylalanine) was added, and we verified that no significant differences in the results were obtained with or without the spike-in control (see Figure R3 for referees below). Relative mRNA values are always obtained from qPCR using a calibrating efficiency standard curve for each pair of oligos, after the initial set-up of the qPCR for that specific pair of oligos. Therefore, slight differences in amplification efficiency for each oligo pair are taken into account. More details about qPCR are now included in the Materials and Methods section (“Polyribosome profile analysis” subsection) and one additional reference is also included for the processing of polysomal gradient fractions.
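      To illustrate how a standard curve accounts for primer-specific amplification efficiency, a minimal Python sketch is given here. It uses the standard relationships Ct = slope × log10(quantity) + intercept and E = 10^(-1/slope) - 1; the function names and the dilution series values are hypothetical and do not come from our data.

```python
from typing import List, Tuple


def fit_standard_curve(log10_quantities: List[float], ct_values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope: float) -> float:
    """Efficiency as a fraction; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def relative_quantity(ct: float, slope: float, intercept: float) -> float:
    """Interpolate a sample Ct on the standard curve for this primer pair."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical 10-fold dilution series for one primer pair.
log10_dilutions = [0.0, -1.0, -2.0, -3.0]
cts = [20.0, 23.4, 26.8, 30.2]
slope, intercept = fit_standard_curve(log10_dilutions, cts)
efficiency = amplification_efficiency(slope)
sample_quantity = relative_quantity(23.4, slope, intercept)
```

      Interpolating each sample on the curve for its own primer pair means that an efficiency below 100% is built into the relative quantity rather than assumed away.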

      It would be helpful to state how long the CDSs are for these mRNAs, where the cut-off was made for determining what is 'short' vs 'long', and the scientific basis for selecting 2-3 vs 2-8; why 8? Were M fractions also used in qPCR? They appear to be ignored in the analysis as currently presented.

      Response: The CDS lengths of the mRNAs analyzed by polysome profiling, together with other important features, are now included in new Table S5. We classified mRNAs with a length below 600 bp as short, and mRNAs with a length above 600 bp as long. This classification was based on the specific mRNA profiles obtained by qPCR analysis. Short mRNAs behaved similarly to one another, and we selected the 2n-3n fractions since their main polysomal peak under normal conditions appeared among the 4n-5n fractions. Likewise, long mRNAs behaved similarly to one another, and we selected the 2n to 8n fractions since their main polysomal peak under normal conditions appeared right after the 8n fraction. This information is now included in the Results and Materials and Methods sections.

      Regarding the use of the monosomal fractions: yes, they were used, as can be seen in Fig. S2, which includes the distribution in monosomal (M), lighter (2n-3n/2n-8n) or heavier (n>3/n>8, P) polysomal fractions. In the polysomal profiles we can see that depletion of eIF5A causes a reduction in the amount of mitochondrial mRNAs in the heavier fractions and a corresponding increase in the amount of mRNAs in the lighter polysomal fractions, while no significant changes are found in the monosomal fractions. Therefore, the statistically significant change in the heavier/lighter polysomal fraction ratio is indicative of translation down-regulation, and these ratios are shown in Fig. 2C. As Reviewer #2 commented in point 5, the shift in mRNA distribution to lighter polysomal fractions may be indicative of ribosome stalling at the 5’ ORF region, compatible with a stall at the mitochondrial target signal (MTS), and this discussion is now included in the Results and Discussion section.
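
      The heavier/lighter ratio discussed above can be sketched as follows. The fraction labels, the per-fraction amounts and the function name are hypothetical placeholders chosen for illustration (here using the short-mRNA grouping, 2n-3n as lighter and n>3 as heavier); they are not data from the manuscript.

```python
def polysome_distribution(amounts, monosome, light, heavy):
    """Pool per-fraction mRNA amounts and return (percent per pool, heavy/light ratio).

    amounts  -- dict mapping a fraction label to the relative mRNA amount in it
    monosome -- labels pooled as M
    light    -- labels pooled as lighter polysomes (e.g. 2n-3n)
    heavy    -- labels pooled as heavier polysomes (e.g. n>3)
    """
    pools = {
        "M": sum(amounts[f] for f in monosome),
        "light": sum(amounts[f] for f in light),
        "heavy": sum(amounts[f] for f in heavy),
    }
    total = sum(pools.values())
    percent = {name: 100.0 * value / total for name, value in pools.items()}
    return percent, pools["heavy"] / pools["light"]


# Hypothetical relative mRNA amounts per gradient fraction (illustrative only).
control = {"M": 10, "2n": 10, "3n": 10, "4n": 30, "5n": 40}
depleted = {"M": 10, "2n": 25, "3n": 25, "4n": 20, "5n": 20}

groups = dict(monosome=["M"], light=["2n", "3n"], heavy=["4n", "5n"])
control_percent, control_ratio = polysome_distribution(control, **groups)
depleted_percent, depleted_ratio = polysome_distribution(depleted, **groups)
```

      In this made-up example the monosomal share is unchanged while the heavy/light ratio drops, which is the pattern the response interprets as reduced ribosome loading.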

      Which transcripts studied here encode proteins with signal sequences? As Signal sequence pauses early in translation should impact ribosome loading this is potentially important here as discussed above.

      Response: Yes, we agree with Reviewer #2 that this information may be relevant to the hypothesis of a ribosome stall at the MTS. Therefore, a score for the probability of harbouring an MTS presequence (Fukasawa et al., 2015) is now included in Table S5 for each of the mRNAs analyzed by polysome profiling. The discussion of this point has also been included in the Results and Discussion sections.

      While it has been shown that SRP recognition is able to slow and even arrest translation of ER signal recognition peptides, there is currently no known direct SRP-like correlate for mitochondrial signal sequences. We are therefore unaware of literature showing that mitochondrial signal sequences pause translation in a manner similar to ER signal sequences. We have previously found that downstream translational slowing is important for mitochondrial mRNA targeting (Tsuboi et al 2020, Arceo et al 2022), but we believe that to be distinct from what Reviewer #2 is addressing.

      Figures 3-5. Microscopy. The false green color images in Figure 3B do not show up well. They may be better shown in grayscale, with only the multiple overlays in color.

      Response: False color is widely used for fluorescence microscopy images because it helps readers visualize the results and also facilitates the interpretation of multiple overlays. The use of false color is also suggested by Reviewer #4.

      Figure 3C should show the data spread for all 150 cells and normalise differently as discussed above for westerns. I do not believe that all 150 WT cells have exactly the same GFP intensity, which is what the present plot claims.

      Response: As answered in point 3 made by this Reviewer, all figures, including Fig. 3C, are now made with GraphPad as scatter plots with all individual points plotted, in addition to showing the mean, SD and statistical analysis. Results correspond to three independent experiments and show a statistically significant difference in Pdr5-GFP intensity signal between wild-type and the tif51A-1 mutant. The figure legend has been corrected accordingly.

      For panels 3D-F image quantification should be shown so that the variation across a population is clear. Eg in violin plots, or showing every point. It should be clear what proportion of cells have GFP aggregates and what the variation in number of granules is.

      Response: The quantification of cytoplasmic Yta12 aggregates is now included in Fig.3E, which shows significant differences between the tif51A-1 mutant and the wild-type strain. Results show the individual values from three independent experiments with a minimum of 150 cells counted. We used a bar graph in which the values (% of cells with 0, 1, 2 or 3 aggregates) for each independent experiment are shown together with the mean, SD and statistical analysis. Figure legend has been corrected accordingly. This information also responds to the comments made by Reviewer #1.

      Figure 4H has no error bars.

      Response: New Fig.4H now shows the individual values of each of the three independent replicates, mean and error bars (SD). Figure legend has been corrected accordingly.

      Figure 5C normalises 2 WTs to 1 as in Figure 3C. Both would be better as violin plots.

      Response: Results in Fig. 5C are now shown using GraphPad as a scatter plot in which all individual values are plotted (wild-type is no longer normalized to 1), together with the mean, SD and statistical significance. Results correspond to three independent replicates with the fluorescence intensity measured in more than 150 cells.

      Figure 5D/E shows 37°C data only. Do tif51A-1 cells have aggregates at 25°C? There are no error bars in Figure 5E or any indication of how many cells/replicates were quantified.

      Response: Figures 5D and 5E only show data at 37ºC since there are no Tim50-GFP aggregates, nor aggregates of other mitochondrial proteins, in tif51A-1 mutants at 25ºC, as shown in Fig. S3C-F and Fig. S5C.

      New Fig. 5E shows individual values from each of the three independent experiments, mean, SD and statistical significance. Results correspond to the measurement of Tim50 protein aggregates in more than 150 cells. Figure legend has been corrected accordingly.

      There are no sizing bars on any of the micrographs.

      Response: Now, all sets of microscopy figures contain a size bar and this is indicated in the corresponding Figure legend.

      The methods states that all quantification was done using ImageJ, but there is no detail given about how this was done. There are lots of ways to use ImageJ.

      Response: A detailed description of the quantifications made using ImageJ is now included in the Materials and Methods section (“Fluorescent microscopy and analysis” subsection).

      Figure 4. Luciferase assay. It is clear that there are differences in Tim50 vs Tim50∆7pro signal over time from the primary plots. It is not clear why the quantification plots on the right are from 2 selected time points. It is more typical to calculate the rate of increase in RLU per min in the linear portion of the plot, for these examples it would be approximately 30-40 mins.

      Response: As luciferase mRNA level is also increasing with time, the total amount of luciferase protein will increase exponentially. At some point mRNA levels will reach a steady state and for a brief period there could be a linear portion of RLU increase, but that will be different for each condition and reporter as ribosome quality control can have a direct impact on mRNA half-life. We have instead chosen two time points to show that statistical differences in Tim50 protein expression upon eIF5A depletion are not dependent on the time point chosen. We have also included the full data plots for readers to view the raw data.
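
      For reference, the rate-based quantification that the Reviewer describes (RLU increase per minute within a chosen time window) could be sketched as below. This is not the analysis used in the manuscript, which compares two time points; the time course, the window and the function name are hypothetical.

```python
def rlu_rate(times_min, rlu, start, end):
    """Least-squares slope (RLU per minute) using only points with start <= t <= end."""
    points = [(t, y) for t, y in zip(times_min, rlu) if start <= t <= end]
    if len(points) < 2:
        raise ValueError("need at least two time points inside the window")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return sty / stt


# Hypothetical induction time course (illustrative values only).
times_min = [0, 10, 20, 30, 40, 50]
rlu = [0, 5, 40, 140, 240, 300]

rate = rlu_rate(times_min, rlu, start=20, end=40)
```

      The estimate depends on which window is treated as linear, which is the concern raised in the response: if that window differs between conditions and reporters, slopes taken from a fixed window are not directly comparable.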

      Figure 4F. The text on p6 states Fig 4F is evidence of RQC induction. This is an overstatement. There are no data presented relating to RQC.

      Response: Ribosome-associated quality control (RQC) is a mechanism by which elongation-stalled ribosomes are sensed in the cell, and then removed from the stall site by ribosomal subunit dissociation. This is the definition of RQC. With high levels of RQC this will cause a drop in ribosome density downstream of the stall site because of ribosome removal. While we would agree that most studies do not show actual buildup of ribosomes at ribosome stalls, and removal after the stall, we do. Our ribosome profiling analysis shows in vivo distribution of ribosome density across the TIM50 mRNA in wild-type and upon eIF5A depletion. We show that in the eIF5A depletion the ribosome density is similar to wild-type for the first ~200 bp, then there is a buildup of ribosomes for ~300 bps up to the stretch of polyproline residues, indicative of slowed ribosome movement. This slowed ribosome movement is further supported by our translation duration measurements in Fig. 4E. Then the transcript is almost completely devoid of ribosomes after the stretch of proline residues, indicating the ribosomes are removed at the proline stretch. This combination of ribosome stalling (Fig. 4E,4F) and subsequent ribosome removal (Fig 4F) is the textbook definition of RQC, so we indicate this as evidence for RQC.

      Figure 5G. It is not clear to this reviewer why the CYC1 reporter is impacted by Tim50∆pro at 25°C. Can the authors comment?

      Response: This is also not clear to us, however, no differences are seen with and without eIF5A depletion, supporting the interpretation that Cyc1 translation is not affected by eIF5A depletion when Tim50 protein levels are restored in the Tim50∆pro strain. However, in order to clarify this point, we propose, if it is considered necessary, to remake the Tim50∆pro CYC1 reporter strain.

      Does ∆pro impact Tim50 function or is there possibly some other off target impact of integrating the reporter in this strain?

      Response: As answered to Reviewer #1 in her/his point 1, the functionality of Tim50ΔPro is shown by the fact that wild-type cells carrying this Tim50 protein version as the only copy of Tim50 grew well in glycerol media, where Tim50 is essential for mitochondrial function (Fig. 5A). However, we suspect that Tim50ΔPro is a slightly less efficient protein, since the double mutant tif51A-1 Tim50ΔPro grows even more poorly than the single tif51A-1 mutant (Fig. 5A). We do not expect off-target effects in these Tim50ΔPro strains, although we cannot exclude this 100%, as in any other yeast strain obtained by transformation.

      Significance

      Strengths and Limitations:

      Strengths are that the study uses a wide range of molecular approaches to address the questions and that the results present a clear story.

      Limitations are that the poly-proline residues identified in yeast Tim50 are not conserved through to humans, so the direct relevance to higher organisms is unclear. However, there are many more poly-proline proteins in human genes than in yeast and there are rare genetic conditions affecting eIF5A and its hypusination.

      Advance. Provides a clear link between dysregulation of eIF5A, Tim50 expression and wider impact on mitochondria.

      Audience. Scientists interested in protein synthesis, mitochondrial biology and clinicians investigating rare human disorders of eIF5A and hypusination.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      eIF5A is required to mediate efficient translation elongation of some amino-acid sequences like polyproline motifs, and eIF5A depletion was reported to impair mitochondrial respiration functions, decreasing mitochondrial protein levels. In this study, Barba-Aliaga et al. showed that eIF5A is important for the translation of the Pro-repeat containing protein, Tim50, an essential subunit of the TIM23 complex, the presequence translocase in the mitochondrial inner membrane. eIF5A ts mutants caused ribosome stalling of Tim50 mRNA on the mitochondrial surface at non-permissive temperature, and the removal of the Pro-repeat from Tim50 (Tim50-delta7Pro mutant) made its translation independent of eIF5A. However, the replacement of endogenous Tim50 with Tim50-delta7Pro did not recover the cell growth defects of eIF5A ts mutant on respiration medium at semi-permissive temperature, suggesting that Tim50 is not the only reason for the global mitochondrial defects caused by defective eIF5A.

      (1) I am wondering why the authors mainly used the eIF5A ts mutant strains instead of the eIF5A degron strain since, for example, the decrease in the level of Tim50 was only marginal (Fig. EV4A).

      Response: eIF5A is a very abundant and highly stable protein (SGD data: 273594 molecules/cell in YPD and 9.1 h protein half-life). We have used the temperature-sensitive strain tif51A-1 instead of the eIF5A-degron because eIF5A is depleted much more quickly in the former than in the latter system. As can be seen in Schuller et al., Mol Cell. 2017;66(2):194-205.e5. doi: 10.1016/j.molcel.2017.03.003, with the eIF5A-degron system the addition of auxin was made in parallel to a transcriptional shut-off using the GAL promoter to express the eIF5A-degron, changing the media from galactose to glucose and incubating the cells for 10 hours. With our approach using temperature-sensitive proteins, almost full depletion (without affecting viability, see Li et al., Genetics 2014; 197(4):1191-200 doi: 10.1534/genetics.114.166926) can be achieved after 4-6 h incubation at 37ºC or 4 h incubation at 41ºC (Fig. 1C and Fig. S1E, almost no signal is detected by western blotting). Therefore, we chose to deplete eIF5A with temperature-sensitive yeast strains to achieve stronger protein depletion in shorter times and avoid secondary effects. In addition, the two eIF5A temperature-sensitive strains used in this study have been widely used by us and others (Pelechano and Alepuz, 2017; Zanelli and Valentini, 2005; Zanelli et al., 2006; Dias et al., 2008; Muñoz-Soriano et al., 2017; Rossi et al., 2014; Li et al., 2014; Xiao et al., 2024).

      (2) To show that the compromised translation of Tim50 in the absence of functional eIF5A causes defects in the mitochondrial protein import by clogging the import channels, the authors should directly observe the accumulation of the precursor forms of several matrix-targeting proteins by immunoblotting. In this sense, the results in Fig. 1C for Hsp60 do not fit the interpretation of import channel clogging.

      Response: We did not see precursor mitochondrial proteins by Western blot upon eIF5A depletion, possibly because: 1) the mature protein form is more abundant and stable; 2) the precursor mito-protein appears in cytoplasmic aggregates and may not be easily extracted during preparation of proteins for Western blot analysis. In the work by Weidberg and Amon, 2018, who described the mitoCPR response; Krämer et al., 2023, who described mitostores; and others (Wrobel et al., 2015; Boos et al., 2019), the authors use extreme over-expression of mitoproteins or mutations in proteins essential for mitochondrial biogenesis to induce clogging of translocases and accumulation of precursors in the cytosol. However, we are using and detecting proteins at their physiological levels, expressed under their native promoters, which may explain why we do not detect precursor mito-proteins. We are using what we believe to be a much more physiologically relevant system, where we use endogenous expression of mitochondrially imported proteins. Yet we see similar transcriptional induction of mitoCPR targets (CIS1, PDR5, PDR15) and mislocalization of mitochondrial proteins to Hsp104-marked aggregates (MitoStores).

      (3) The authors speculated in the Discussion section that import defects caused by compromised translation of Tim50 could cause down-regulation of translation through prolonged mitochondrial stress. However, this lacks experimental evidence.

      Response: We do see that depletion of eIF5A causes import defects through Tim50 and correlates with the down-regulation of translation of mRNAs encoding mitoproteins, as shown in Fig. 2C and Fig. S2. In these figures it can be seen that mito-mRNAs move from heavier to lighter polysomal fractions upon eIF5A depletion, indicating that fewer ribosomes are bound to these mRNAs. Importantly, synthesis of the Cyc1 and Cox5A mitochondrial proteins is recovered when the TIM50 gene is replaced by an eIF5A-translation-independent TIM50ΔPro gene, arguing in favor of a translation defect caused by eIF5A depletion through the collapse of import systems produced by the ribosome stalling in TIM50 mRNA.

      As discussed by Reviewer #2 and in our answers to his/her points 5 and 6, the reduction in the number of ribosomes bound to mito-mRNAs upon eIF5A depletion may be a consequence of the stall of ribosomes after the mRNA 5’ coding region encoding the MTS. This discussion has now been introduced in the Discussion section. This information also responds to the comments made by Reviewer #2.

      (4) The authors stated that human Tim50 does not have Pro-repeat motif, but how about other organisms (like other fungi species)? Is the present observation specific only to S. cerevisiae?

      Response: We have now included a sequence alignment of the Tim50 protein sequences of different yeast species (Saccharomyces cerevisiae, Candida albicans, Candida glabrata, Candida lipolytica, Schizosaccharomyces pombe, Schizosaccharomyces japonicus), mouse and human (Fig. S4A). The resulting alignment shows that S. cerevisiae is the only organism presenting the seven consecutive proline residues. Still, C. albicans and C. glabrata conserve five consecutive prolines, while C. lipolytica conserves five non-consecutive prolines. Furthermore, S. pombe and S. japonicus, and mouse and human, conserve three and four non-consecutive prolines, respectively. This means that the observations presented in this manuscript could be extended to other fungal species as well, since most of the proline residues are conserved and are predicted to behave as eIF5A-dependent motifs for translation. Moreover, the described eIF5A-dependent tripeptide motif PDP is found in humans, mice and S. pombe at the Tim50 region where we found the PPP motif inducing ribosome stalling in S. cerevisiae (Fig S4A). This may confer an eIF5A-dependent ribosome stall since, as we showed in our previous ribosome footprinting (Pelechano et al., 2017), this PDP motif causes a ribosome stall as strong as that of the PPP motif. This discussion has now been introduced in the Results and Discussion sections.

      (5) Two references in the text are marked with "?", which should be corrected.

      Response: We thank Reviewer #3 for noticing this; references have been corrected in the text.

      Reviewer #3 (Significance (Required)):

      The essence of this work, the role of eIF5A in the efficient translation of Pro-repeat containing Tim50 (Figs. 4 and 5), is important and worth publication. However, the results of the effects of defective eIF5A on the levels and localization of mitochondrial proteins (Figs.1-3) can be even deleted to make clear the point of this work.

      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      The manuscript submitted by Barba-Aliaga et al. aims to understand on the molecular level how eIF5A influences mitochondrial function. eIF5A promotes translation elongation at stretches prone to translational stalling like e.g. polyproline sequence. The finding that eIF5a influences mitochondrial function has been previously reported by the same group and by others. In this context, it was suggested that eIF5a promotes translation of N-terminal mitochondrial targeting signals. Here, the authors propose an alternative mechanism and suggest that "eIF5a directly controls mitochondrial protein import through alleviation of ribosome stalling along TIM50 mRNA." Using luciferase reporter assay, the authors indeed convincingly show that the speed of Tim50 translation is dependent on the presence of functional TIF51A, the major eIF5a in yeast, and that this dependence comes from the presence of the polyproline stretch in Tim50. The rest of the manuscript is unfortunately less clear and it is very hard, if not impossible, to sort out direct from secondary effects and compensations. The authors use proteomics, biochemical methods, RNAseq and fluorescence microscopy to analyze the temperature sensitive tif51A mutant but the conditions used in the manuscript are non-consistent between various experiments presented, in respect to the medium, temperature, preculture condition and the length of treatment used.

      Response: We do not agree with Reviewer #4's assessment. We used different molecular approaches to investigate different questions. Indeed, this is one of the strengths highlighted by Reviewer #2, as stated above: “Strengths are that the study uses a wide range of molecular approaches to address the questions and that the results present a clear story.” All the experiments presented in the manuscript, apart from the proteomics analysis (Fig. 1), have been performed under the same conditions with respect to the medium (SGal), temperature (25ºC/37ºC), preculture condition (SGal, 25ºC) and length of treatment (4 h of depletion at 37ºC). This is already clearly specified in every Figure legend throughout the manuscript and also in the Materials and Methods section. In addition, individual values from each replicate, mean, standard deviation and statistical tests are shown for every Figure in the manuscript. Therefore, we do believe that conditions are consistent between experiments and that conclusions are based on different experiments and different scientific approaches.

      We agree with Reviewer #4 that depletion of eIF5A protein in the temperature-sensitive tif51A-1 mutant was done at 41°C for 4 h in the proteomic analysis, whereas in the rest of the experiments depletion was done at 37°C for 4 h. As answered to Reviewer #2 (see answer to point 2), stronger depletion conditions were used to get clear proteomic results, and in order to compare both temperatures we have now added some controls showing eIF5A depletion and growth of the tif51A-1 mutant at 41°C and 37°C; importantly, we also show the reduction in mito-protein levels upon eIF5A depletion at 37°C (Fig. S1B and E).

      In some cases, the genetic background of the yeast strains and plasmids used are also unclear (e.g. pYES2-pGAL-FLAG-TIM50-GFP-URA3 - based on the provided description, TIM50 was inserted between FLAG and GFP tags; if so, mitochondrial targeting signal of Tim50 would be masked making its import into mitochondria impossible).

      Response: We do not agree with this assessment. The genetic background of the yeast strains is the same throughout the manuscript (BY4741 background) and is clearly specified in Table S2. Likewise, all the information regarding the plasmids used can be found in Table S3, and plasmid construction is extensively detailed in the Materials and Methods section (“Yeast strains, plasmids, and growth conditions” subsection).

      Regarding the pYES2-pGAL-FLAG-TIM50-GFP-URA3 plasmid and as already mentioned in the text, we only used this plasmid to analyze by western blotting the protein synthesis of Tim50 independently of its subcellular localization. Our results (Fig. S4C) confirm that the synthesis defect of this Tim50 version upon eIF5A depletion is only due to the presence of the polyproline region. Importantly, we did not make any conclusion regarding import defects or protein localization based on these results.

      I have no doubt that upon exposure of tif51A cells to 41°C for 4h cells initiate a number of cellular responses including mitoCPR and formation of MitoStores, however, I don't think that the authors convincingly show that these are initiated by reduced levels of Tim50 - on the contrary, the authors show that levels of Tim50 are actually not significantly changed. This can hardly be reconciled with the model proposed. In addition, should the effect of Tif51A on mitochondria primarily be due to its effect on Tim50, then Tim50deltaPro should rescue the phenotype of tif51a mutant but it didn't; if anything, it made it worse (see Fig 5A - the double mutant grows worse than the single ones). Furthermore, expression of Cyc1 luciferase reporter is reduced in Tim50deltaPro strain even at permissive temperature, Figure 5G. Since cytochrome c is not a substrate of the presequence pathway this again points to the secondary effects that are being observed.

      Response: We believe that our main results, summarized next and all obtained at 37°C, do show that translation defects in TIM50 mRNA are the cause of the mitoCPR induction and formation of MitoStores. First, Tim50 protein levels are significantly reduced upon eIF5A depletion, as shown in Fig. S4A and S4B. Although statistically significant, we agree that the reduction in Tim50 protein level is quantitatively small. This can be explained by the high stability of Tim50 protein, with a half-life of approximately 9.6 h (Christiano et al, 2014), which makes it more difficult to measure large differences in new protein synthesis. This is why we additionally used an accurate and quantitative test to show the eIF5A dependency of TIM50 mRNA translation: the fusion of the TIM50 DNA sequence to a TetO7-inducible nLuc reporter, which allows us to monitor the appearance of new Tim50 protein and to estimate the translation elongation rate (Fig.4C-E). The ribosome stalling on TIM50 mRNA provoked by eIF5A depletion, where this mRNA is located at the mitochondrial surface to promote the import of nascent Tim50 protein during translation (Fig. S5B), may by itself cause the clogging of the protein import system even though it yields only a slight reduction in total cellular Tim50 protein. Second, as Reviewer #4 pointed out, under our model Tim50deltaPro should rescue the phenotype of the tif51A-1 mutant, and it does: no mitoCPR induction and no mito-protein cytoplasmic aggregation are observed (Fig. 5D-F). Moreover, no differences in Cyc1- and Cox5a-nanoLuc synthesis are observed in the tif51A-1 Tim50ΔPro strain between depletion and non-depletion conditions (Fig. 5E). These results strongly suggest that the mitochondrial protein import defects (and consequently the mitoCPR induction and mito-protein cytoplasmic aggregation) caused by eIF5A depletion are a consequence of ribosome stalling during TIM50 mRNA translation.

      However, Reviewer #4 is right that mitochondrial respiration and growth in glycerol are not restored in the tif51A-1 Tim50ΔPro strain, even though Tim50 protein levels have been restored under eIF5A depletion conditions. As we discuss in the manuscript, we expect that there are additional mitochondrial proteins that are targets of eIF5A, such as Yta12 and/or others. We have added further data pointing to ribosome stalling and RQC for other cotranslationally inserted mitochondrial proteins (Table S6). Accordingly, this has also been included in the Discussion section. However, the identification and study of these other mitochondrial targets goes beyond the aim of our current study.
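
      The half-life argument made earlier in this response can be checked with a short calculation. The sketch below assumes simple first-order decay, a complete block of new Tim50 synthesis for the whole 4 h treatment, and no dilution by cell growth; these are simplifying assumptions made here for illustration, not measurements or a model taken from the manuscript.

```python
def fraction_remaining(hours_without_synthesis, half_life_hours):
    """Fraction of a pre-existing protein pool left after first-order decay."""
    return 0.5 ** (hours_without_synthesis / half_life_hours)


# Half-life of about 9.6 h (Christiano et al., 2014) and the 4 h depletion used here.
remaining = fraction_remaining(4.0, 9.6)  # about 0.75
```

      Under these assumptions, even a total stop in Tim50 translation would leave roughly three quarters of the pre-existing protein after 4 h, so a modest decrease in steady-state Tim50 level is what one would expect from a stable protein.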

      Minor comments

      Page 1, mitochondrial proteins do not cross the intermembrane space through Tom40 but rather the outer membrane

      Response: We think Reviewer #4 misunderstood the sentence, because we are saying exactly what he/she states: mitoproteins cross the outer membrane to the intermembrane space through Tom40. Thus our sentence is:

      “Usually, mitoproteins contact the central receptor Tom20 and cross to the intermembrane space (IMS) through Tom40, the β-barrel pore-forming subunit.”

      Therefore, we kept the sentence.

      Page 4, ATP1 is present in the matrix and not the inner membrane

      Response: This has been corrected. We thank the Reviewer for pointing this out.

      The citations are missing at several places - they are left as "?"

      Response: References have been corrected in the text.

      It would be nice if microscopy images were colored in magenta and cyan, rather than red and green, to make them accessible to a wider audience.

      Response: Green and red colors for fluorescent microscopy images are widely used in high-impact journals, especially when showing mitochondrial proteins and mitochondrial marker Su9-mCherry (Hughes et al., 2016, eLife, doi: 10.7554/eLife.13943; Kakimoto et al., 2018, Scientific Reports, doi: 10.1038/s41598-018-24466-0; Kreimendahl et al, 2020, BMC Biology, doi: 10.1186/s12915-020-00888-z). However, if the Reviewers think this is essential for publication, microscopy images can be colored in magenta and cyan instead.

      Formally speaking, Tim50 is not per se a translocase, it is either a component of the translocase or, more precisely, a receptor of the translocase. Similarly, Tom20 and Tom70 are not membrane transporters but rather receptors of the TOM complex.

      Response: We have changed the title and text to be more precise in the description of the components of the mitochondrial import systems as suggested by Reviewer #4.

      Reviewer #4 (Significance (Required)):


      This is a potentially interesting story, however, the conditions used for the analysis of the temperature sensitive mutants were either too harsh or these mutants are in general impossible to control, making the manuscript, in my opinion, unfortunately too premature for publication.

      Response: We do not agree with Reviewer #4's opinion. All experiments were done at 37ºC except the proteomic analysis, whose results are further confirmed for the Tim50 and Por1 proteins at 37ºC. We want to stress that all experiments are shown with at least three biological replicates, that individual values for each measurement are now included in the graphs as recommended by Reviewer #2, and that the mean, SD and statistical tests are included. Throughout the manuscript, we draw conclusions based on statistically significant differences. The temperature-sensitive yeast mutants used give reproducible results, behave as expected under the controlled conditions used, and have been widely used in our lab and others (Pelechano and Alepuz, 2017; Zanelli and Valentini, 2005; Zanelli et al., 2006; Dias et al., 2008; Muñoz-Soriano et al., 2017; Rossi et al., 2014; Li et al., 2014; Xiao et al., 2024).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work provides a valuable contribution and assessment of what it means to replicate a null study finding, and what are the appropriate methods for doing so (apart from a rote p-value assessment). Through a convincing re-analysis of results from the Reproducibility Project: Cancer Biology using frequentist equivalence testing and Bayes factors, the authors demonstrate that even when reducing 'replicability success' to a single criterion, how precisely replication is measured may yield differing results. Less focus is directed to appropriate replication of non-null findings.

      Reviewer #1 (Public Review):

      Summary:

      The goal of Pawel et al. is to provide a more rigorous and quantitative approach for judging whether or not an initial null finding (conventionally with p ≥ 0.05) has been replicated by a second similarly null finding. They discuss important objections to relying on the qualitative significant/non-significant dichotomy to make this judgment. They present two complementary methods (one frequentist and the other Bayesian) which provide a superior quantitative framework for assessing the replicability of null findings.

      Strengths:

      Clear presentation; illuminating examples drawn from the well-known Reproducibility Project: Cancer Biology data set; R-code that implements suggested analyses. Using both methods as suggested provides a superior procedure for judging the replicability of null findings.

      Weaknesses:

      The proposed frequentist and the Bayesian methods both rely on binary assessments of an original finding and its replication. I'm not sure if this is a weakness or is inherent to making binary decisions based on continuous data.

      For the frequentist method, a null finding is considered replicated if the original and replication 90% confidence intervals for the effects both fall within the equivalence range. According to this approach, a null finding would be considered replicated if p-values of both equivalence tests (original and replication) were, say, 0.049, whereas it would not be considered replicated if, for example, the equivalence test of the original study had a p-value of 0.051 and the replication had a p-value of 0.001. Intuitively, the evidence for replication would seem to be stronger in the second instance. The recommended Bayesian approach similarly relies on a dichotomy (e.g., Bayes factor > 1).

      Thanks for the suggestions, we now emphasize more strongly in the “Methods for assessing replicability of null results” and “Conclusions” sections that both TOST p-values and Bayes factors are quantitative measures of evidence that do not require dichotomization into “success” or “failure”.
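
      The frequentist criterion discussed in this exchange can be illustrated with a minimal sketch. It assumes a normally distributed effect estimate with known standard error and a symmetric equivalence range [-margin, +margin]; the function names `tost_p_value` and `replicated_null` are hypothetical and are not taken from the authors' R code. The second function reproduces the dichotomy the reviewer describes, while the p-values themselves remain quantitative measures of evidence.

```python
from statistics import NormalDist


def tost_p_value(estimate, se, margin):
    """TOST p-value for H0: |theta| >= margin (normal approximation).

    Assumption: estimate ~ N(theta, se^2) with known standard error.
    """
    z = NormalDist()
    # One-sided test of H0: theta <= -margin against theta > -margin
    p_lower = 1 - z.cdf((estimate + margin) / se)
    # One-sided test of H0: theta >= +margin against theta < +margin
    p_upper = z.cdf((estimate - margin) / se)
    # Equivalence is declared only if both one-sided tests reject
    return max(p_lower, p_upper)


def replicated_null(p_orig, p_rep, alpha=0.05):
    """Dichotomous rule: both TOST p-values must fall below alpha."""
    return p_orig < alpha and p_rep < alpha


# The reviewer's two scenarios (p-values given directly):
both_borderline = replicated_null(0.049, 0.049)
strong_replication = replicated_null(0.051, 0.001)
```

      Under the dichotomous rule, the first scenario counts as a replicated null result and the second does not, although the second replication p-value is far smaller. Reporting the two p-values themselves avoids this loss of information.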

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates how inconclusive replications of studies initially with p > 0.05 can be and employs equivalence tests and Bayesian factor approaches to illustrate this concept. Interestingly, the study reveals that achieving a success rate of 11 out of 15, or 73%, as was accomplished with the non-significance criterion from the RPCB (Reproducibility Project: Cancer Biology), requires unrealistic margins of Δ > 2 for equivalence testing.

      Strengths:

      The study uses reliable and shareable/open data to demonstrate its findings, sharing as well the code for statistical analysis. The study provides sensitivity analysis for different scenarios of equivalence margin and alpha level, as well as for different scenarios of standard deviations for the prior of Bayes factors and different thresholds to consider. All analyses and code of the work are open and can be replicated. As well, the study demonstrates on a case-by-case basis how the different criteria can diverge, regarding one sample of a field of science: preclinical cancer biology. It also explains clearly what Bayes factors and equivalence tests are.

      Weaknesses:

      It would be interesting to investigate whether using Bayes factors and equivalence tests in addition to p-values results in a clearer scenario when applied to replication data from other fields. As mentioned by the authors, the Reproducibility Project: Experimental Philosophy (RPEP) and the Reproducibility Project: Psychology (RPP) have data attempting to replicate some original studies with null results. While the RPCB analysis yielded a similar picture when using both criteria, it is worth exploring whether this holds true for RPP and RPEP. Considerations for further research in this direction are suggested. Even if the original null results were excluded in the calculation of an overall replicability rate based on significance, sensitivity analyses considering them could have been conducted. The present authors can demonstrate replication success using the significance criteria in these two projects with initially p < 0.05 studies, both positive and non-positive.

      Other comments:

      • Introduction: The study demonstrates how inconclusive replications of studies initially with p > 0.05 can be and employs equivalence tests and Bayesian factor approaches to illustrate this concept. Interestingly, the study reveals that achieving a success rate of 11 out of 15, or 73%, as was accomplished with the non-significance criterion from the RPCB (Reproducibility Project: Cancer Biology), requires unrealistic margins of Δ > 2 for equivalence testing.

      • Overall picture vs. case-by-case scenario: An interesting finding is that the authors observe that in most cases, there is no substantial evidence for either the absence or the presence of an effect, as evidenced by the equivalence tests. Thus, using both suggested criteria results in a picture similar to the one initially raised by the paper itself. The work done by the authors highlights additional criteria that can be used to further analyze replication success on a case-by-case basis, and I believe that this is where the paper's main contributions lie. Despite not changing the overall picture much, I agree that the p-value criterion by itself does not distinguish between (1) a situation where the original study had low statistical power, resulting in a highly inconclusive non-significant result that does not provide evidence for the absence of an effect and (2) a scenario where the original study was adequately powered, and a non-significant result may indeed provide some evidence for the absence of an effect when analyzed with appropriate methods. Equivalence testing and Bayesian factor approaches are valuable tools in both cases.

      Regarding the 0.05 threshold, the choice of the prior distribution for the SMD under the alternative H1 is debatable, and this also applies to the equivalence margin. Sensitivity analyses, as highlighted by the authors, are helpful in these scenarios.

      Thank you for the thorough review and constructive feedback. We have added an additional “Appendix B: Null results from the RPP and EPRP” that shows equivalence testing and Bayes factor analyses for the RPP and EPRP null results.

      Reviewer #3 (Public Review):

      Summary:

      The paper points out that non-significance in both the original study and a replication does not ensure that the studies provide evidence for the absence of an effect. Also, it can not be considered a "replication success". The main point of the paper is rather obvious. It may be that both studies are underpowered, in which case their non-significance does not prove anything. The absence of evidence is not evidence of absence! On the other hand, statistical significance is a confusing concept for many, so some extra clarification is always welcome.

      One might wonder if the problem that the paper addresses is really a big issue. The authors point to the "Reproducibility Project: Cancer Biology" (RPCB, Errington et al., 2021). They criticize Errington et al. because they "explicitly defined null results in both the original and the replication study as a criterion for replication success." This is true in a literal sense, but it is also a little bit uncharitable. Errington et al. assessed replication success of "null results" with respect to 5 criteria, just one of which was statistical (non-)significance.

      It is very hard to decide if a replication was "successful" or not. After all, the original significant result could have been a false positive, and the original null-result a false negative. In light of these difficulties, I found the paper of Errington et al. quite balanced and thoughtful. Replication has been called "the cornerstone of science" but it turns out that it's actually very difficult to define "replication success". I find the paper of Pawel, Heyard, Micheloud, and Held to be a useful addition to the discussion.

      Strengths:

      This is a clearly written paper that is a useful addition to the important discussion of what constitutes a successful replication.

      Weaknesses:

      To me, it seems rather obvious that non-significance in both the original study and a replication does not ensure that the studies provide evidence for the absence of an effect. I'm not sure how often this mistake is made.

      Thanks for the feedback. We do not have systematic data on how often the mistake of confusing absence of evidence with evidence of absence has been made in the replication context, but we do know that it has been made in at least three prominent large-scale replication projects (the RPP, RPEP, RPCB). We therefore believe that there is a need for our article.

      Moreover, we agree that the RPCB provided a nuanced assessment of replication success using five different criteria for the original null results. We emphasize this now more in the “Introduction” section. However, we do not consider our article as “a little bit uncharitable” to the RPCB, as we discuss all other criteria used in the RPCB and note that our intent is not to diminish the important contributions of the RPCB, but rather to build on their work and provide constructive recommendations for future researchers. Furthermore, in response to comments made by Reviewer #2, we have added an additional “Appendix B: Null results from the RPP and EPRP” that shows equivalence testing and Bayes factor analyses for null results from two other replication projects, where the same issue arises.

      Reviewer #1 (Recommendations For The Authors):

      The authors may wish to address the dichotomy issue I raise above, either in the analysis or in the discussion.

      Thank you, we now emphasize that Bayes factors and TOST p-values do not need to be dichotomized but can be interpreted as quantitative measures of evidence, both in the “Methods for assessing replicability of null results” and the “Conclusions” sections.

      Reviewer #2 (Recommendations For The Authors):

      Given that, here follow additional suggestions that the authors should consider in light of the manuscript's word count limit, to avoid confusing the paper's main idea:

      2) Referencing: Could you reference the three interesting cases among the 15 RPCB null results (specifically, the three effects from the original paper #48) where the Bayes factor differs qualitatively from the equivalence test?

      We now explicitly cite the original and replication study from paper #48.

      3) Equivalence testing: As the authors state, only 4 out of the 15 study pairs are able to establish replication success at the 5% level, in the sense that both the original and the replication 90% confidence intervals fall within the equivalence range. Among these 4, two (Paper #48, Exp #2, Effect #5 and Paper #48, Exp #2, Effect #6) were initially positive with very low p-values, one (Paper #48, Exp #2, Effect #4) had an initial p of 0.06 and was very precisely estimated, and the only one in which equivalence testing provides a clearer picture of replication success is Paper #41, Exp #2, Effect #1, which had an initial p-value of 0.54 and a replication p-value of 0.05. In this latter case (or in all these ones), one might question whether the "liberal" equivalence range of Δ = 0.74 is the most appropriate. As the authors state, "The post-hoc specification of equivalence margins is controversial."

      We agree that the post hoc choice of equivalence ranges is a controversial issue. The margins define an equivalence region where effect sizes are considered practically negligible, and we agree that in many contexts SMD = 0.74 is a large effect size that is not practically negligible. We therefore present sensitivity analyses for a wide range of margins. However, we do not think that the choice of this margin is more controversial for the mentioned studies with low p-values than for other studies with greater p-values, since the question of whether a margin plausibly encodes practically negligible effect sizes is not related to the observed p-value of a study. Nevertheless, for the new analyses of the RPP and EPRP data in Appendix B, we have added additional sensitivity analyses showing how the individual TOST p-values and Bayes factors vary as a function of the margin and the prior standard deviation. We think that these analyses provide readers with an even more transparent picture regarding the implications of the choice of these parameters than the “project-wise” sensitivity analyses in Appendix A.
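
      One way to make the dependence on the margin concrete is to compute, for each study pair, the smallest margin at which equivalence would be declared. The sketch below assumes the same normal approximation as a standard TOST procedure; `minimal_margin` and `pair_margin` are hypothetical helper names, and the numbers in the usage lines are invented for illustration, not taken from the RPCB data.

```python
from statistics import NormalDist


def minimal_margin(estimate, se, alpha=0.05):
    """Smallest symmetric margin for which the TOST p-value drops below alpha.

    Under a normal approximation this equals the largest absolute bound
    of the (1 - 2*alpha) confidence interval, e.g. the 90% CI for alpha = 0.05.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return abs(estimate) + z_crit * se


def pair_margin(original, replication, alpha=0.05):
    """Margin needed for both studies of a pair; each is (estimate, se)."""
    return max(minimal_margin(*original, alpha=alpha),
               minimal_margin(*replication, alpha=alpha))


# Hypothetical study pair: (SMD estimate, standard error)
needed = pair_margin((0.2, 0.5), (-0.1, 0.3))
```

      A pair whose required margin exceeds what can plausibly be called a negligible effect cannot support a claim of equivalence, regardless of its ordinary p-values.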

      4) Bayes factor suggestions: For the Bayes factor approach, it would be interesting to discuss examples where the BF differs slightly. This is likely to occur in scenarios where sample sizes differ significantly between the original study and replication. For example, in Paper #48, Exp #2 and Effect #4, the initial p is 0.06, but the BF is 8.1. In the replication, the BF dramatically drops to < 1/1000, as does the p-value. The initial evidence of 8.1 indicates some evidence for the absence of an effect, but not strong evidence ("strong evidence for H0"), whereas a p-value of 0.06 does not lead to such a conclusion; instead, it favors H1. It would be interesting if the authors discussed other similar cases in the paper. It's worth noting that in Paper #5, Exp #1, Effect #3, the replication p-value is 0.99, while the BF01 is 2.4, almost indicating "moderate" evidence for H0, even though the p-value is inconclusive.

      We agree that some of the examples nicely illustrate conceptual differences between p-values and Bayes factors, e.g., how they take into account sample size and effect size. As methodologists, we find these aspects interesting ourselves, but we think that emphasizing them is beyond the scope of the paper and would distract eLife readers from the main messages.

      Concerning the conceptual differences between Bayes factors and TOST p-values, we already discuss a case where there are qualitative differences in more detail (original paper #48). We added another discussion of this phenomenon in Appendix B, as it also occurs for the replication of Ranganath and Nosek (2008) that was part of the RPP.

      5) p-values, magnitude and precision: It's noteworthy to emphasize, if the authors decide to discuss this, that the p-value is influenced by both the effect's magnitude and its precision, so in Paper #9, Exp #2, Effect #6, BF01 = 4.1 has a higher p-value than a BF01 = 2.3 in its replication. However, there are cases where both p-values and BF agree. For example, in Paper #15, Exp #2, Effect #2, both the original and replication studies have similar sample sizes, and as the p-value decreases from p = 0.95 to p = 0.23, BF01 decreases from 5.1 ("moderate evidence for H0") to 1.3 (region of "Absence of evidence"), moving away from H0 in both cases. This also occurs in Paper #24, Exp #3, Effect #6.

      We appreciate the suggestions but, as explained before, think that the message of our paper is better understood without additional discussion of more general differences between p-values and Bayes factors.

      6) The grey zone: Given the above topic, it is important to highlight that in the "Absence of evidence grey zone" for the null hypothesis, for example, in Paper #5, Exp #1, Effect #3 with a p = 0.99 and a BF01 = 2.4 in the replication, BF and p-values reach similar conclusions. It's interesting to note, as the authors emphasize, that Dawson et al. (2011), Exp #2, Effect #2 is an interesting example, as the p-value decreases, favoring H1, likely due to the effect's magnitude, even with a small sample size (n = 3 in both original and replications). Bayes factors are very close to one due to the small sample sizes, as discussed by the authors.

      We appreciate the constructive comments. We think that the two examples from Dawson et al. (2011) and Goetz et al. (2011) already nicely illustrate absence of evidence and evidence of absence, respectively, and therefore decided not to discuss additional examples in detail, to avoid redundancy.

      7) Using meta-analytical results (?): For papers from RPCB, comparing the initial study with the meta-analytical results using Bayes factor and equivalence testing approaches (thus, increasing the sample size of the analysis, but creating dependency of results since the initial study would affect the meta-analytical one) could change the conclusions. This would be interesting to explore in initial studies that are replicated by much larger ones, such as: Paper #9, Exp #2, Effect #6; Goetz et al. (2011), Exp #1, Effect #1; Paper #28, Exp #3, Effect #3; Paper #41, Exp #2, Effect #1; and Paper #47, Exp #1, Effect #5.

      Thank you for the suggestion. We considered adding meta-analytic TOST p-values and Bayes factors before, but decided that Figure 3 and the results section are already quite technical, so adding more analyses may confuse more than help. Nevertheless, these meta-analytic approaches are discussed in the “Conclusions” section.

      8) Other samples of fields of science: It would be interesting to investigate whether using Bayes factors and equivalence tests in addition to p-values results in a clearer scenario when applied to replication data from other fields. As mentioned by the authors, the Reproducibility Project: Experimental Philosophy (RPEP) and the Reproducibility Project: Psychology (RPP) have data attempting to replicate some original studies with null results. While the RPCB analysis yielded a similar picture when using both criteria, it is worth exploring whether this holds true for RPP and RPEP. Considerations for further research in this direction are suggested. Even if the original null results were excluded in the calculation of an overall replicability rate based on significance, sensitivity analyses considering them could have been conducted. The present authors can demonstrate replication success using the significance criteria in these two projects with initially p < 0.05 studies, both positive and non-positive.

      Thank you for the excellent suggestion. We added an Appendix B where the null results from the RPP and EPRP are analyzed with our proposed approaches. The results are also discussed in the “Results” and “Conclusions” sections.

      9) Other approaches: I am curious about the potential impact of using an approach based on equivalence testing (as described in https://arxiv.org/abs/2308.09112). It would be valuable if the authors could run such analyses or reference the mentioned work.

      Thank you. We were unaware of this preprint. It seems related to the framework proposed by Stahel W. A. (2021) New relevance and significance measures to replace p-values. PLoS ONE 16(6): e0252991. https://doi.org/10.1371/journal.pone.0252991

      We now cite both papers in the discussion.

      10) Additional evidence: There is another study in which replications of initially p > 0.05 studies with p > 0.05 replications were also considered as replication successes. You can find it here: https://www.medrxiv.org/content/10.1101/2022.05.31.22275810v2. Although it involves a small sample of initially p > 0.05 studies with already large sample sizes, the work is currently under consideration for publication in PLOS ONE, and all data and materials can be accessed through OSF (links provided in the work).

      Thank you for sharing this interesting study with us. We feel that it is beyond the scope of the paper to include further analyses as there are already analyses of the RPCB, RPP, and EPRP null results. However, we will keep this study in mind for future analysis, especially since all data are openly available.

      11) Additional evidence 02: Ongoing replication projects, such as the Brazilian Reproducibility Initiative (BRI) and The Sports Replication Centre (https://ssreplicationcentre.com/), continue to generate valuable data. BRI is nearing completion of its results, and it promises interesting data for analyzing replication success using p-values, equivalence regions, and Bayes factor approaches.

      We now cite these two initiatives as examples of ongoing replication projects in the introduction. As with your previous point, we think that it is beyond the scope of the paper to include further analyses as there are already analyses of the RPCB, RPP, and EPRP null results.

      Reviewer #3 (Recommendations For The Authors):

      I have no specific recommendations for the authors.

      Thank you for the constructive review.

      Reviewing Editor (Recommendations For the Authors):

      I recognize that it was suggested to the authors by the previous Reviewing Editor to reduce the amount of statistical material to be made more suitable for a non-statistical audience, and so what I am about to say contradicts advice you were given before. But, with this revised version, I actually found it difficult to understand the particulars of the construction of the Bayes Factors and would have appreciated a few more sentences on the underlying models that fed into the calculations. In my opinion, the provided citations (e.g., Dienes Z. 2014. Using Bayes to get the most out of non-significant results) did not provide sufficient background to warrant a lack of more technical presentation here.

      Thank you for the feedback. We added a new “Appendix C: Technical details on Bayes factors” that provides technical details on the models, priors, and calculations underlying the Bayes factors.
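
      As a rough illustration of the kind of calculation involved, the sketch below computes a Bayes factor BF01 for a point null against a normal alternative. It is an assumption-laden example, not the authors' model: it takes the effect estimate to be normally distributed with known standard error and places a zero-centered normal prior with standard deviation `prior_sd` on the effect under H1. The function name `bf01` is hypothetical.

```python
from math import exp, sqrt


def bf01(estimate, se, prior_sd):
    """Bayes factor for H0: theta = 0 versus H1: theta ~ N(0, prior_sd^2).

    Assumption: estimate ~ N(theta, se^2). The marginal distribution of the
    estimate is then N(0, se^2) under H0 and N(0, se^2 + prior_sd^2) under H1,
    and BF01 is the ratio of these two densities at the observed estimate.
    """
    var_h0 = se ** 2
    var_h1 = se ** 2 + prior_sd ** 2
    return sqrt(var_h1 / var_h0) * exp(
        -0.5 * estimate ** 2 * (1 / var_h0 - 1 / var_h1)
    )


# Hypothetical inputs: an estimate near zero favors H0 (BF01 > 1),
# an estimate three standard errors from zero favors H1 (BF01 < 1).
near_zero = bf01(0.0, 1.0, 2.0)
far_from_zero = bf01(3.0, 1.0, 2.0)
```

      Values of BF01 above one indicate evidence for the absence of an effect and values below one indicate evidence for its presence. Values close to one, which arise when the standard error is large relative to the prior standard deviation, indicate absence of evidence either way.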

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Bendzunas, Byrne et al. explore two highly topical areas of protein kinase regulation in this manuscript. Firstly, the idea that Cys modification could regulate kinase activity. The senior authors have published some standout papers exploring this idea of late, and the current work adds to the picture of how active site Cys might have been favoured in evolution to serve critical regulatory functions. Second, BRSK1/2 are understudied kinases listed as part of the "dark kinome" so any knowledge of their underlying regulation is of critical importance to advancing the field.

      Strengths:

      In this study, the authors pinpoint highly-conserved, but BRSK-specific, Cys residues as key players in kinase regulation. There is a delicate balance between equating what happens in vitro with recombinant proteins relative to what the functional consequence of Cys mutation might be in cells or organisms, but the authors are very clear with the caveats relating to these connections in their descriptions and discussion. Accordingly, by extension, they present a very sound biochemical case for how Cys modification might influence kinase activity in cellular environs.

      Weaknesses:

      I have very few critiques for this study, and my major points are barely major.

      Major points

      (1) My sense is that the influence of Cys mutation on dimerization is going to be one of the first queries readers consider as they read the work. It would be, in my opinion, useful to bring forward the dimer section in the manuscript.

      We agree that the influence of Cys on BRSK dimerization is a topic of significant interest. Our primary focus was to explore oxidative regulation of the understudied BRSK kinases as they contain a conserved T-loop Cys, and we have previously demonstrated that equivalent residues at this position in related kinases were critical drivers of oxidative modulation of catalytic activity. We have demonstrated here that BRSK1 & 2 are similarly regulated by redox and this is due to oxidative modification of the T+2 Cys, in addition to Cys residues that are conserved amongst related ARKs as well as BRSK-specific Cys. Although we also provide evidence for limited redox-sensitive higher order BRSK species (dimers) in our in vitro analysis, these represent a small population of the total BRSK protein pool (this was validated by SEC-MALS analysis). As such, we do not have strong evidence to suggest that these limited dimers significantly contribute to the pronounced inhibition of BRSK1 & 2 in the presence of oxidizing agents, and instead believe that other biochemical mechanisms likely drive this response. This may result from oxidized Cys altering the conformation of the activation loop. Indeed, the formation of an intramolecular disulfide within the T-loop of BRSK1 & 2, which we detected by MS, is one such regulatory modification. It is noteworthy that intramolecular disulfide bonds within the T-loop of AKT and MELK have already been shown to induce an inactive state in the kinase, and we posit a similar mechanism for BRSKs.

      While we recognize the potential importance of dimerization in this context, our current data from in vitro and cell-based assays do not provide substantial evidence to assert dimerization as a primary regulatory mechanism. Hence, we maintained a more conservative stance in our manuscript, discussing dimerization in later sections where it naturally followed from the initial findings. That being said, we acknowledge the potential significance of dimerization in the regulation of the BRSK T-loop cysteine. We believe this aspect merits further investigation and could indeed be the focus of a follow-up study.

      (2) Relatedly, the effect of Cys mutation on the dimerization properties of preparations of recombinant protein is not very clear as it stands. Some SEC traces would be helpful; these could be included in the supplement.

      In order to determine whether our recombinant BRSK proteins (and T-loop mutants) existed as monomers or dimers, we performed SDS-PAGE under reducing and non-reducing conditions (Fig 7). This unambiguously revealed that a monomer was the prominent species, with little evidence of dimers under these experimental conditions (even in the presence of oxidizing agents). Although we cannot discount a regulatory role for BRSK dimers in other physiological contexts, we could not produce sufficient evidence to suggest that multimerization played a substantial role in modifying BRSK kinase activity in our assays. We note that our in vitro analysis was performed using truncated forms of the protein, and as such it is entirely possible that regions of the protein that flank the kinase domain may serve additional regulatory functions that may include higher order BRSK conformations. In this regard, although we have not included SEC traces of our recombinant proteins, we have included analytical SEC-MALS of the truncated proteins (Supplementary Figure 6) which we believe to be more informative. We have also now included additional SEC-MALS data for BRSK2 C176A and C183A (Supplementary Figure 6d and e), which supports our findings in Fig 7, demonstrating the presence of limited dimer species under non-reducing conditions.

      (3) Is there any knowledge of Cys mutants in disease for BRSK1/2?

      We have conducted an extensive search across several databases: COSMIC (Catalogue of Somatic Mutations in Cancer), ProKinO (Protein Kinase Ontology), and TCGA (The Cancer Genome Atlas). These databases are well-regarded for their comprehensive and detailed records of mutations related to cancer and protein kinases. Our analysis using the COSMIC and TCGA databases focused on identifying any reported instances of Cys mutations in BRSK1/2 that are implicated in cancer. Additionally, we utilized the ProKinO database to explore the broader landscape of protein kinase mutations, including any potential disease associations of Cys mutations in BRSK1/2. However, we found no evidence to indicate the presence of Cys mutations in BRSK1/2 that are associated with cancer or disease. This lack of association in the current literature and database records suggests that, as of our latest search, Cys mutations in BRSK1/2 have not been reported as significant contributors to pathogenesis.

      (4) In bar charts, I'd recommend plotting data points. Plus, it is crucial to report in the legend what error measure is shown, the number of replicates, and the statistical method used in any tests.

      We have added the data points to the bar charts and included statistical methods in figure legends.

      (5) In Figure 5b, the GAPDH loading control doesn't look quite right.

      The blot has been repeated and updated.

      (6) In Figure 7 there is no indication of what mode of detection was used for these gels.

      We have updated the figure legend to confirm that the detection method was western blot.

      (7) Recombinant proteins - more detail should be included on how they were prepared. Was there a reducing agent present during purification? Where did they elute off SEC... consistent with a monomer or higher order species?

      We have added ‘produced in the absence of reducing agents unless stated otherwise’ in the methods section to improve clarity. Although we have not added additional sentences to describe the elution profile of the BRSK proteins by SEC during purification, we believe that the inclusion of analytical SEC-MALS data is sufficient evidence that the proteins are largely monomeric under non-reducing conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this study by Bendzunas et al, the authors show that the formation of intra-molecular disulfide bonds involving a pair of Cys residues near the catalytic HRD motif and a highly conserved T-Loop Cys with a BRSK-specific Cys at an unusual CPE motif at the end of the activation segment function as repressive regulatory mechanisms in BRSK1 and 2. They observed that mutation of the CPE-Cys only, contrary to the double mutation of the pair, increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells. Molecular modeling and molecular dynamics simulations indicate that oxidation of the CPE-Cys destabilizes a conserved salt bridge network critical for allosteric activation. The occurrence of spatially proximal Cys amino acids in diverse Ser/Thr protein kinase families suggests that disulfide-mediated control of catalytic activity may be a prevalent mechanism for regulation within the broader AMPK family. Understanding the molecular mechanisms underlying kinase regulation by redox-active Cys residues is fundamental as it appears to be widespread in signaling proteins and provides new opportunities to develop specific covalent compounds for the targeted modulation of protein kinases.

      The authors demonstrate that intramolecular cysteine disulfide bonding between conserved cysteines can function as a repressing mechanism as indicated by the effect of DTT and the consequent increase in activity by BRSK1 and 2 (WT). The cause-effect relationship of why mutation of the CPE-Cys only increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells is not clear to me. The explanation given by the authors based on molecular modeling and molecular dynamics simulations is that oxidation of the CPE-Cys (that will favor disulfide bonding) destabilizes a conserved salt bridge network critical for allosteric activation. However, no functional evidence of the impact of the salt-bridge network is provided. If you mutated the two main Cys-pairs (aE-CHRD and A-loop T+2-CPE) you lose the effect of DTT, as the disulfide pairs cannot be formed, hence no repression mechanisms take place, however when looking at individual residues I do not understand why mutating the CPE only results in the opposite effect unless it is independent of its connection with the T+2 residue on the A-loop.

      Strengths:

      This is an important and interesting study providing new knowledge in the protein kinase field with important therapeutic implications for the rational design and development of next-generation inhibitors.

      Weaknesses:

      There are several issues with the figures that this reviewer considers should be addressed.

      Reviewer #1 (Recommendations for The Authors):

      Major points

      Page 26 - the discussion could be more concise. There's an element of recapping the results, which should be avoided.

      Regarding the conciseness of the discussion section, we have thoroughly revised it to ensure a more succinct presentation, deliberately avoiding the recapitulation of results. The revised discussion now focuses on interpreting the findings and their implications, steering clear of redundancy with the results section.

      Figure 1b seems to be mislabeled/annotated. I recommend checking whether the figure legends match more broadly. Figure 1 appears to be incorrectly cited throughout the results.

      Thank you for pointing out the discrepancies in the labeling and citation of Figure 1b. We have carefully reviewed and corrected these issues to ensure that all figure labels, legends, and citations accurately reflect the corresponding data and illustrations. We appreciate your attention to detail and the opportunity to improve the clarity and accuracy of our presentation.

      Figure 6 - please include a color-coding key in the figure. Further support for these simulations could be provided by supplementary movies or plots of the interaction. Figure 4 colour palette should be adjusted for the spheres in the Richardson diagrams to have greater distinction.

      As suggested, we have amended the colour palette in Figure 4 to improve conformity throughout the figure.

      Minor points

      Figure 2 - it'd be helpful to know what the percentage coverage of peptides is.

      We have updated the figure legend to include peptide coverage for both proteins.

      Some typos - Supp 2 legend "Domians".

      Fixed

      Figure 6 legend - analyzed by needs a space;

      Fixed

      Fig 8 legend schematic misspelled.

      Fixed

      Broadly, if you Google T-loop you get a pot pourri of enzyme answers. Why not just use Activation loop?

      The choice of "T-loop" over "Activation loop" in our manuscript was made to maintain consistency with other literature in the field, and in particular our previous paper “Aurora A regulation by reversible cysteine oxidation reveals evolutionarily conserved redox control of Ser/Thr protein kinase activity” where we refer to the activation loop cysteine as T-loop + 2. We acknowledge the varied enzyme contexts in which "T-loop" is used and agree on the importance of clarity. To address this, we made an explicit note in the manuscript that the "T-loop" is also referred to as the "Activation loop", ensuring readers are aware of the interchangeable use of these terms. Additionally, this nomenclature facilitates a more straightforward designation of cysteine residues within the loop (T+2 Cysteine). We believe this approach balances adherence to established conventions with the need for clarity and precision in our descriptions.

      Methods - what is LR cloning. Requires some definition. Some manufacturer detail is missing in methods, and referring to prior work is not sufficient to empower readers to replicate.

      We agree, and have added the following to the methods section:

      “BRSK1 and 2 were sub-cloned into pDest vectors (to encode the expression of N-terminal Flag or HA tagged proteins) using the Gateway LR Clonase II system (Invitrogen) according to the manufacturer’s instructions. pENTR BRSK1/2 clones were obtained in the form of Gateway-compatible donor vectors from Dr Ben Major (Washington University in St. Louis). The Gateway LR Clonase II enzyme mix mediates recombination between the attL sites on the Entry clone and the attR sites on the destination vector. All cloned BRSK1/2 genes were fully sequenced prior to use.”

      Page 7 - optimal settings should be reported. How were pTau signals quantified and normalised?

      We have added the following to the methods section:

      “A two-color Western blot detection method employing infrared fluorescence was used to measure the ratio of Tau phospho serine 262 to total Tau. Total GFP Tau was detected using a mouse anti GFP antibody and visualized at 680 nm using goat anti mouse IRDye 680, while phospho-Tau was detected using a Tau phospho serine 262 specific antibody and visualized at 800 nm using goat anti rabbit IRDye 800. Imaging was performed using a LI-COR Odyssey CLx with scan control settings set to 169 μm, medium quality, and 0.0 mm distance. Quantification was performed using LI-COR Image Studio on the raw image files. The phospho-Tau to total Tau ratio was determined as the ratio of the fluorescence intensity measured at 800 nm (pTau) to that at 680 nm (total Tau).”

      In the Figure 6g-j legend, the salt bridge is incorrectly annotated as E185-R248 rather than 258.

      Fixed

      Lines 393-395 provides a repeat statement on BRSKs phosphorylating Tau (from 388-389).

      We have removed the repetition and reworded the opening lines of the results section to improve the overall flow of the manuscript.

      Supp. Figure 1 is difficult to view - would it be possible to increase the size of the phylogenetic analysis?

      We thank the reviewer for this observation. We have rotated (90°) and expanded the figure so that it can be more clearly viewed.

      Supp. Figure 2 - BRSK1/2 incorrectly spelled.

      Fixed

      Please check the alignment of labels in Supp. Figure 3e.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1, current panel b is not mentioned/described in the figure legend and as a consequence, the rest of the panels in the legends do not fit the content of the figure.

      Reviewer 1 also noted this error, and we have amended the manuscript accordingly.

      What is the rationale for using the HEK293T cells as the main experimental/cellular system? Are there cell lines that express both proteins endogenously so that the authors can recapitulate the results obtained from ectopic overexpression?

      The selection of HEK-293T cells was driven by their well-established utility in overexpression studies, which make them ideal for the investigation of protein interactions and redox regulation. This cell line's robust transfection efficiency and well-characterized biology provide a reliable platform for dissecting the molecular mechanisms underlying the redox regulation of proteins. Furthermore, the use of HEK-293T cells aligns with the broader scientific practice, serving as a common ground for comparability with existing literature in the field of BRSK1/2 signaling, protein regulation and interaction studies.

      The application of HEK-293T cells as a model system in our study serves as a foundational step towards eventually elucidating the functions of BRSK1/2 in neuronal cells, where these kinases are predominantly expressed and play critical roles. Given that BRSKs are classed as ‘understudied’ kinases, the choice of a HEK-293T co-overexpression system allowed us to analyze the direct effects of BRSK kinase activity (using phosphorylation of Tau as a readout) in a cellular context and in a more controlled manner. This approach not only aids in the establishment of a baseline understanding of the redox regulation of BRSK1/2, but also sets the stage for subsequent investigations in more physiologically relevant neuronal models.

      In current panel d, could the authors recapitulate the same experimental conditions as in current panel c?

      Figure 1 panel c shows that both BRSK1 and 2 are reversibly inhibited by oxidizing agents such as H2O2, whilst panels d and e show the concentration dependent activation and inhibition of the BRSKs with increasing concentrations of DTT and H2O2 respectively. The experimental conditions were identical, other than changing amounts of reducing and oxidizing agents, and used the same peptide coupled assays. Data for all experiments were originally collected in ‘real time’ as depicted in Fig 1c (increase in substrate phosphorylation over time). However, to aid interpretation of the data, we elected to present the latter two panels as dose response curves by calculating the change in the rate of enzyme activity (shown as pmol phosphate incorporated into the peptide substrate per min) for each condition. To aid the reader, we now include an additional supplementary figure (new supplementary figure 2) depicting BRSK1 and 2 dependent phosphorylation of the peptide substrate in the presence of different concentrations of DTT and H2O2 in a real time (kinetic) assay. The new data shown is a subset of the unprocessed data that was used to calculate the rates of BRSK activity in Fig 1d & e.
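
      The conversion from a real-time trace to a single rate value described above can be sketched as a least-squares slope of product formed against time. This is only an illustration under stated assumptions: the function name `initial_rate` and all numeric readings are hypothetical and are not taken from the study, and the sketch assumes the trace is approximately linear over the fitted window.

```python
def initial_rate(times_min, product_pmol):
    """Least-squares slope of product (pmol) against time (min).

    Assumes the readings lie in the linear phase of the reaction.
    """
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_p = sum(product_pmol) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(times_min, product_pmol))
    return sxy / sxx


# Hypothetical kinetic traces (pmol phosphate incorporated) at several
# hypothetical DTT concentrations (mM); values are invented for illustration.
times = [0, 5, 10, 15, 20]
traces = {
    0.0: [0.0, 0.6, 1.1, 1.4, 2.0],
    1.0: [0.0, 4.8, 10.3, 15.1, 19.7],
    10.0: [0.0, 9.9, 20.4, 29.8, 40.2],
}

# One rate (pmol/min) per condition gives the points of a dose-response curve.
dose_response = {dose: initial_rate(times, trace)
                 for dose, trace in traces.items()}
for dose, rate in sorted(dose_response.items()):
    print(f"{dose:5.1f} mM -> {rate:.2f} pmol/min")
```

      Plotting each slope against its reagent concentration yields a dose-response curve of the kind shown in the latter two panels.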

      Why did the authors use full-length constructs in these experiments and did not in e.g. Figure 2 where they used KD constructs instead?

      In the initial experiments, illustrated in Figure 1, we employed full-length protein constructs to establish a proof of concept, demonstrating the overall behavior and interactions of the proteins in their full-length form. This confirmed that BRSK1 & 2, which both contain a conserved T + 2 Cys residue that is frequently prognostic for redox sensitivity in related kinases, displayed a near-obligate requirement for reducing agents to promote kinase activity.  

      Subsequently, in Figure 2, our focus shifted towards delineating the specific regions within the proteins that are critical for redox regulation. By using constructs that encompass only the kinase domain, we aimed to demonstrate that the redox-sensitive regulation of these proteins is predominantly mediated by specific cysteine residues located within the kinase domain itself. This strategic use of the kinase domain of the protein allowed for a more targeted investigation. Furthermore, in our hands these truncated forms of the protein were more stable at higher concentrations, enabling more detailed characterization of the proteins by DSF and SEC-MALS. We predict that the flanking disordered regions of the full-length protein (as predicted by AlphaFold) contribute to this effect.

      (2) In Figure 2, Did the authors try to do LC/MS-MS in the same experimental conditions as in Figure 1 (e.g. buffer minus/plus DTT, H2O2, H2O2 + DTT)?

      We would like to clarify that the mass spectrometry experiments were conducted exclusively on proteins purified under native (non-reducing) conditions. We did not extend the LC/MS-MS analyses to include proteins treated with various buffer conditions such as minus/plus DTT, H2O2, or H2O2 + DTT as used in the experiments depicted in Figure 1. Given that we could readily detect disulfides in the absence of oxidizing agents, we did not see the benefit of additional treatment conditions as peroxide treatment of protein samples can frequently complicate interpretation of MS data. However, it should be noted that prior to MS analysis, tryptic peptides were subjected to a 50:50 split, with one half alkylated in the presence of DTT (as described in the methods section) to eliminate disulfides and other transiently oxidized Cys forms. Comparative analysis between reduced and non-reduced tryptic peptides improved our confidence when assigning disulfide bonds (which were eliminated in identical peptides in the presence of DTT).

      On panel b, why did the authors show alphafold predictions and not empiric structural information (e.g. X-ray, EM,..)?

      The AlphaFold models were primarily utilized to map the general locations of redox-sensitive cysteine pairs within the proteins of interest. Although we have access to the crystal structure of mouse BRSK2, it does not fully capture the active conformation seen in the AlphaFold model of the human version. The use of AlphaFold models for human proteins in this study aids in consistently tracking residue numbering across the manuscript, offering a useful framework for understanding the spatial arrangement of these critical cysteine pairs in their potentially active-like states. This approach facilitates our analysis and discussion by providing a reference for the structural context of these residues in the human proteins.

      What was the rationale for using the KD construct and not the FL as in Figure 1?

      The rationale to use the kinase domain was primarily based on the significantly lower confidence in the structural predictions for regions outside the kinase domain (KD). Our experimental focus was to investigate the role of conserved cysteine residues within the kinase domain, which are critical for the protein's function and regulation. This targeted approach allowed us to concentrate our analyses on the most functionally relevant and structurally defined portion of the protein, thereby enhancing the precision and relevance of our findings. As is frequently the case, truncated forms of the protein, consisting only of the kinase domain, are much more stable than their full length counterparts and are therefore more amenable to in vitro biochemical analysis. In our hands this was true for both BRSK1 and 2, and as such much of the data collected here was generated using kinase-domain (KD) constructs. Simulations using the KD structures are therefore much more representative of our original experimental setup.

      The BRSK1 KD construct appears to be rather inactive and not responsive to DTT treatment. Could the authors comment on the differences observed with the FL construct of Figure 1?

      It is important to note that BRSK1, in general, exhibits lower intrinsic activity compared to BRSK2. This reduced activity could be attributed to a range of factors, including the need for activation by upstream kinases such as LKB1, as well as potential post-translational modifications (PTMs) that may be absent in the bacterially expressed KD construct. The full-length forms of the protein were purified from Sf21 cells, and as such may have additional modifications that are lacking in the bacterially derived KD counterparts. We also cannot discount additional regulatory roles of the regions that flank the KD, and these may contribute in part to the modest discrepancy observed between constructs.  Despite these differences, it is crucial to emphasize that both the KD and FL constructs of BRSK1 are regulated by DTT, indicating a conserved redox-dependent activation for both of the related BRSK proteins.  

      (3) In Figure 4, on panel A, wouldn't the authors expect that mutating one residue of a pair, e.g. C198A in BRSK1, would have the same effect as mutating C191 at the T+2 site? Did they try mutating individual sites of the aE/CHRD pair? The same applies to BRSK2.

      We appreciate the insightful comment. It's important to clarify that the redox regulation of these proteins is influenced not solely by the formation of disulfide bonds but also by the oxidation state of individual cysteine residues, particularly the T+2 Cys. This nuanced mechanism of regulation allows for a diverse range of functional outcomes based on the specific cysteine involved and its state of oxidation. This aspect forms a key finding of our paper, highlighting the complexity of redox regulation beyond mere disulfide bond formation. For example, Aurora A kinase activity is regulated by oxidation of a single T+2 Cys (Cys290, equivalent to Cys191 and Cys176 of BRSK1 and 2 respectively), but this regulation can be supplemented through artificial incorporation of a secondary Cys at the DFG+2 position (Byrne et al., 2020). This targeted genetic modification of Aurora A mirrors equivalent regulatory disulfide-forming Cys pairs that naturally occur in kinases such as AKT and MELK, and which provide an extra layer of regulatory fine tuning (and a possible protective role to prevent deleterious over-oxidation) to the T+2 Cys. We surmise that the CPE Cys is likewise an accessory regulatory element in BRSK1 and 2 that acts by locking the T+2 Cys in an inactive disulfide configuration. The T+2 Cys itself remains the dominant driver of BRSK redox sensitivity, as judged by the fact that CPE Cys mutants are still potently regulated by redox (Fig 4).

      In our preliminary analysis of BRSK1, we observed that mutations of individual sites within the aE/CHRD pair were similarly detrimental to kinase activity as a tandem mutation (see Author response image 1). As discussed in the manuscript, we think that these Cys may serve important structural regulatory functions and opted to focus on co-mutations of the aE/CHRD pair for the remainder of our investigation.

      Author response image 1.

      In vitro kinase assays showing rates of in vitro peptide phosphorylation by WT and Cys-to-Ala (aE/CHRD residues) variants of BRSK1 after activation by LKB1.

      In panels C and D, the same experimental conditions should have been measured as in A and B.

      Panels A and B were designed to demonstrate the enzymatic activity and the response to DTT treatment to establish the baseline redox regulation of the kinase and a panel of Cys-to-Ala mutant variants. In contrast, panels C and D were specifically focused on rescue experiments with mutants that showed a significant effect under the conditions tested in A and B. These panels were intended to further explore the role of redox regulation in modulating the activity of these mutants, particularly those that retained some level of activity or exhibited a notable response to redox changes.

      The rationale for this experimental design was to prioritize the investigation of mutants, such as those at the T+2 and CPE cysteine sites, which provided the most insight into the redox-dependent modulation of kinase activity. Other mutants, which resulted in inactivation, were deprioritized in this context as they offered limited additional information regarding the redox regulation mechanism. This focused approach allowed us to delve deeper into understanding how specific cysteine residues contribute to the redox-sensitive control of kinase function, aligning with the overall objective of elucidating the nuanced roles of redox regulation in kinase activity.

      (4) In figure 5: Why did the authors use reduced Glutathione instead of DTT? The authors should have recapitulated the same experimental conditions as in Figure 4 and not focused only on the T+2 or the CPE single mutants but using the double and the aE/CHRD mutants as well, as internal controls and validation of the enzymatic assays using the modified peptide

      Regarding the use of reduced glutathione (GSH) instead of DTT in Figure 5, we chose GSH for its well characterized biological relevance as an antioxidant in cellular responses to oxidative stress. Furthermore, while DTT has been widely used in experimental setups, it is also potentially cytotoxic at high concentrations.

      Addressing the point on experimental consistency with Figure 4, we appreciate the suggestion and indeed had already conducted such experiments (Previously Supp Fig 3, now changed to current Supp Fig 4). These experiments include analyses of BRSK mutant activity in a HEK-293T model. However, we chose not to focus on inactivating mutants (such as the aE/CHRD mutants which had depleted expression levels possibly as a consequence of compromised structural integrity) or pursue the generation of double mutant CMV plasmids, as these were deemed unlikely to add significant insights into the core narrative of our study. Our focus remained on the mutants that yielded the most informative results regarding the redox regulation mechanisms in the in vitro setting, ensuring a clear and impactful presentation of our findings.

      A time course evaluation of the reducing or oxidizing reagents should have been performed. Would we expect that in WT samples, in the presence of GSH, and also in the case of the CPE mutant, there is an increase in the levels of Tau phosphorylation as a readout of BRSK1/2 activity?

      We acknowledge the importance of such analyses in understanding the dynamic nature of redox regulation on kinase activity and have included a time course (Supp Fig 2 e-g). These results confirm a depletion of Tau phosphorylation over time in response to peroxide generated by the enzyme glucose oxidase.

      (5) In Figure 6, did the authors look at the functional impact of the residues with which interact the T+2 and the CPE motifs e.g. T174 and the E185-R258 tether?

      Our primary focus was on the salt bridges, as this is a key regulatory structural feature that is conserved across many kinases. Regarding the additional interactions mentioned, we have thoroughly evaluated their roles and dynamics through molecular dynamics (MD) simulations but did not find any results of significant relevance to warrant inclusion.

      (6) In Figure 7: Did the authors look at the oligomerization state of the BRSK1/2 multimers under non-reducing conditions? Were they also observed in the case of the FL constructs? What was the stoichiometry?

      Our current work indicates that the kinase domain of BRSK1-2 primarily exists in a monomeric state, with some evidence of dimerization or multimer formation under specific conditions. Our SEC-MALS (Supp Fig 6) and SDS-PAGE analyses (Figure 7) clearly demonstrate that monomers are overwhelmingly the dominant species under non-reducing conditions (>90%). We also conclude that these limited oligomeric species can be removed by inclusion of reducing agents such as DTT (Figure 7), which may suggest a role for one or more Cys residues. Notably, removal of the T+2 Cys was insufficient to prevent multimerization.

      We were unable to obtain reliable SEC-MALS data for the full-length forms of the protein, likely due to the presence of disordered regions that flank the kinase domain which results in a highly heterodispersed and unstable preparation (at the concentrations required for SEC-MALS). Although we are therefore unable to comment on the stoichiometry of FL BRSK dimers, we can detect BRSK1 and 2 hetero- and homo-complexes in HEK-293T cells by IP, which supports the existence of limited BRSK1 & 2 dimers (Supp Fig 6a). However, we were unable to detect intermolecular disulfide bonds by MS, although this does not necessarily preclude their existence. The physiological role of BRSK multimerization (if any) and establishing specifically which Cys residues drive this phenomenon is of significant interest to our future investigations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1:

      Thank you for the careful reading and the positive evaluation of our manuscript. As you mentioned, the present study tried to address the question of how lost genomic functions could be compensated by evolutionary adaptation, indicating a potential mechanism of "constructive" rather than "destructive" evolution. Thank you for the instructive comments that helped us to improve the manuscript. We sincerely hope the revised manuscript and the following point-by-point response address your concerns.

      • Line 80 "Growth Fitness" is this growth rate?

      Yes. The sentence was revised as follows.

      (L87-88) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper).”

      • Line 94 a more nuanced understanding of r/K selection theory, allows for trade-ups between R and K, as well as trade-offs. This may explain why you did not see a trade-off between growth and carrying capacity in this study. See this paper https://doi.org/10.1038/s41396-023-01543-5. Overall, your evos lineages evolved higher growth rates and lower carrying capacity (Figures 1B, C, E). If selection was driving the evolution of higher growth rates, it may have been that there was no selective pressure to maintain high carrying capacity. This means that the evolutionary change you observed in carrying capacity may have been neutral "drift" of the carrying capacity trait, during selection for growth rate, not because of a trade-off between R and K. This is especially likely since carrying capacity declined during evolution. Unless the authors have convincing evidence for a tradeoff, I suggest they remove this claim.

      • Line 96 the authors introduce a previous result where they use colony size to measure growth rate, this finding needs to be properly introduced and explained so that we can understand the context of the conclusion.

      • Line 97 This sentence "the collapse of the trade-off law likely resulted from genome reduction." I am not sure how the authors can draw this conclusion, what is the evidence supporting that the genome size reduction causes the breakdown of the tradeoff between R and K (if there was a tradeoff)?

      Thank you for the reference information and the thoughtful comments. The recommended paper was newly cited, and the description of the trade-off collapse was deleted. Accordingly, the corresponding paragraph was rewritten as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somewhat consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome-reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection 30,31, was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32-34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The decline in carrying capacity might therefore have been neutral "drift" rather than a trade-off with the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      • Line 103 Genome mutations. The authors claim that there are no mutations in parallel but I see that there is a 1199 base pair deletion in eight of the nine evo strains (Table S3). I would like the author to mention this and I'm actually curious about why the authors don't consider this parallel evolution.

      Thank you for your careful reading. According to your comment, we added a brief description of the 1199-bp deletion detected in the Evos as follows.

      (L119-122) “The number of mutations varied largely among the nine Evos, from two to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which agreed well with its function as a transposable sequence.”

      • Line 297 Please describe the media in full here - this is an important detail for the evolution experiment. Very frustrating to go to reference 13 and find another reference, but no details of the method. Looked online for the M63 growth media and the carbon source is not specified. This is critical for working out what selection pressures might have driven the genetic and transcriptional changes that you have measured. For example, the parallel genetic change in 8/9 populations is a deletion of insH and tdcD (according to Table S3). This is acetate kinase, essential for the final step in the overflow metabolism of glucose into acetate. If you have a very low glucose concentration, then it could be that there was selection to avoid fermentation and devote all the pyruvate that results from glycolysis into the TCA cycle (which is more efficient than fermentation in terms of ATP produced per pyruvate).

      Sorry for the missing information on the medium composition, which was additionally described in the Materials and Methods. The glucose concentration in M63 was 22 mM, which was supposed to be enough for bacterial growth. Thank you for your intriguing thinking about linking the medium component to the genome mutation-mediated metabolic changes. As there was no experimental result regarding the biological function of gene mutation in the present study, please allow us to address this issue in our future work.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM Iron (II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      • Line 115. I do not understand this argument "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Is this a significant enrichment compared to the expectation, i.e. the number of essential genes in the genome? This enrichment needs to be tested with a Hypergeometric test or something similar.

      • Also, "As the essential genes were known to be more conserved than nonessential ones, the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." I do not think that there is enough evidence to support this claim, and it should be removed.

      Sorry for the unclear description. Yes, the mutations were significantly enriched in the essential genes (11 out of 45 genes) compared to the essential genes in the whole genome (286 out of 3290 genes). The improper description linking the mutation in essential genes to the fitness increase was removed, and an additional explanation on the ratio of essential genes was newly supplied as follows.

      (L139-143) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable.”
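
      As a cross-check in the spirit of the reviewer's suggestion, the enrichment can also be examined with a one-sided hypergeometric test. The sketch below is not the analysis reported in the manuscript (which used a Chi-square test); it simply plugs in the counts quoted in this response (11 essential genes among 45 mutated genes, 286 essential genes among 3290 genes in total). The function name `hypergeom_upper_tail` is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    X counts how many of `draws` items sampled without replacement
    belong to the `successes` category.
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(comb(successes, k) * comb(population - successes, draws - k)
               for k in range(observed, upper + 1))
    return hits / total


# Counts quoted in the response: 286 essential genes out of 3290 genes,
# and 11 essential genes among 45 mutated genes.
p_value = hypergeom_upper_tail(population=3290, successes=286,
                               draws=45, observed=11)
print(f"P(at least 11 essential genes among 45) = {p_value:.2e}")
```

      A small tail probability here would point in the same direction as the Chi-square result, namely that essential genes are over-represented among the mutated genes.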

      • Line 124 Regarding the mutation simulations, I do not understand how the observed data were compared to the simulated data, and how conclusions were drawn. Can the authors please explain the motivation for carrying out this analysis, and clearly explain the conclusions?

      Random simulation was additionally explained in the Materials and Methods and the conclusion of the random simulation was revised in the Results, as follows.

      (L392-401) “The mutation simulation was performed with Python in the following steps. A total of 65 mutations were randomly generated on the reduced genome, and the distances from the mutated genomic locations to the nearest genomic scars caused by genome reduction were calculated. Subsequently, Welch's t-test was performed to evaluate whether the distances calculated from the random mutations were significantly longer or shorter than those calculated from the mutations that occurred in Evos. The random simulation, distance calculation, and statistical test were performed 1,000 times, which resulted in 1,000 p values. Finally, the mean of p values (μp) was calculated, and a 95% reliable region was applied. It was used to evaluate whether the 65 mutations in the Evos were significantly close to the genomic scars, i.e., the locational bias.”

      (L148-157) “Random simulation was performed to verify whether there was any bias or hotspot in the genomic location for mutation accumulation due to the genome reduction. A total of 65 mutations were randomly generated on the reduced genome (Fig. 2B), and the genomic distances from the mutations to the nearest genome reduction-mediated scars were calculated. Welch's t-test was performed to evaluate whether the genomic distances calculated from random mutations significantly differed from those from the mutations accumulated in the Evos. As the mean of p values (1,000 times of random simulations) was insignificant (Fig. 2C, μp > 0.05), the mutations fixed on the reduced genome were neither significantly closer to nor farther from the genomic scars than random mutations, indicating there was no locational bias for mutation accumulation caused by genome reduction.”
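
      The simulation steps described above can be sketched roughly as follows. This is a minimal illustration and not the script used in the study: all function names are hypothetical, the genome is assumed to be circular for the distance calculation, random positions are drawn uniformly, and the p value of Welch's t-test is obtained by numerically integrating the Student's t density so that only the Python standard library is needed.

```python
import math
import random
import statistics


def nearest_scar_distance(pos, scars, genome_len):
    """Shortest distance from pos to any scar on a circular genome (assumption)."""
    return min(min(abs(pos - s), genome_len - abs(pos - s)) for s in scars)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value of Student's t via Simpson integration of its density."""
    a = abs(t)
    if a == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = a / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density over [0, |t|]
    return max(0.0, 1.0 - 2.0 * area)


def welch_p(x, y):
    """Welch's unequal-variance t-test, two-sided p value."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_two_sided_p(t, df)


def mean_simulated_p(observed_positions, scars, genome_len,
                     n_rounds=1000, seed=0):
    """Mean Welch p value over repeated random placements of the mutations."""
    rng = random.Random(seed)
    observed = [nearest_scar_distance(p, scars, genome_len)
                for p in observed_positions]
    p_values = []
    for _ in range(n_rounds):
        simulated = [nearest_scar_distance(rng.randrange(genome_len),
                                           scars, genome_len)
                     for _ in observed]
        p_values.append(welch_p(simulated, observed))
    return statistics.mean(p_values)
```

      Calling `mean_simulated_p` with the 65 observed mutation positions, the scar coordinates, and the length of the reduced genome would return the mean p value (μp); a value above 0.05 would be read, as in the text, as no detectable locational bias.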

      • Line 140 The authors should give some background here - explain the idea underlying chromosomal periodicity of the transcriptome, to help the reader understand this analysis.

      • Line 142 Here and elsewhere, when referring to a method, do not just give the citation, but also refer to the methods section or relevant supplementary material.

      The analytical process (references and methods) was described in the Materials and Methods, and the reason we performed the chromosomal periodicity was added in the Results as follows.

      (L165-172) “As the E. coli chromosome was structured, whether the genome reduction caused the changes in its architecture, which led to the differentiated transcriptome reorganization in the Evos, was investigated. The chromosomal periodicity of gene expression was analyzed to determine the structural feature of genome-wide pattern, as previously described 28,38. The analytical results showed that the transcriptomes of all Evos presented a common six-period with statistical significance, equivalent to those of the wild-type and ancestral reduced genomes (Fig. 3A, Table S4).”

      • Line 151 "The expression levels of the mutated genes were higher than those of the remaining genes (Figure 3B)"- did this depend on the type of mutation? There were quite a few early stops in genes, were these also more likely to be expressed? And how about the transcriptional regulators, can you see evidence of their downstream impact?

      Sorry, we didn't investigate the detailed regulatory mechanisms of the 49 mutated genes, which was beyond the scope of the present study. Fig. 3B was the statistical comparison between 3225 and 49 genes. It didn't mean that all mutated genes were expressed at higher levels than the others. The following sentences were added to address your concern.

      (L181-185) “As the regulatory mechanisms or the gene functions were supposed to be disturbed by the mutations, the expression levels of individual genes might have been either up- or down-regulated. Nevertheless, the overall expression levels of all mutated genes tended to be increased. One of the reasons was assumed to be the mutation essentiality, which remained to be experimentally verified.”

      • Line 199 onward. The authors used WGCNA to analyze the gene expression data of evolved organisms. They identified distinct gene modules in the reduced genome, and through further analysis, they found that specific modules were strongly associated with key biological traits like growth fitness, gene expression changes, and mutation rates. Did the authors expect that there was variation in mutation rate across their populations? Is variation from 3-16 mutations that they observed beyond the expectation for the wt mutation rate? The genetic causes of mutation rate variation are well understood, but I could not see any dinB, mutT,Y, rad, or pol genes among the discovered mutations. I would like the authors to justify the claim that there was mutation rate variation in the evolved populations.

      Thank you for the intriguing thinking. We don't think the mutation rates were significantly varied across the nine populations, as no mutation occurred in the MMR genes, as you noticed. Our previous study showed that the spontaneous mutation rate of the reduced genome was higher than that of the wild-type genome (Nishimura et al., 2017, mBio). As nonsynonymous mutations were not detected in all nine Evos, the spontaneous mutation rate couldn't be calculated (because it should be evaluated according to the ratio of nonsynonymous and synonymous single-nucleotide substitutions in molecular evolution). Therefore, the mutation rate could not be discussed in the present study. The following sentence was added for a better understanding of the gene modules.

      (L242-245) “These modules M2, M10 and M16 might be considered as the hotspots for the genes responsible for growth fitness, transcriptional reorganization, and mutation accumulation of the reduced genome in evolution, respectively.”

      • Line 254 I get the idea of all roads leading to Rome, which is very fitting. However, describing the various evolutionary strategies and homeostatic and variable consequence does not sound correct - although I am not sure exactly what is meant here. Looking at Figure 7, I will call strategy I "parallel evolution", that is following the same or similar genetic pathways to adaptation and strategy ii I would call divergent evolution. I am not sure what strategy iii is. I don't want the authors to use the terms parallel and divergent if that's not what they mean. My request here would be that the authors clearly describe these strategies, but then show how their results fit in with the results, and if possible, fit with the naming conventions, of evolutionary biology.

      Thank you for your kind consideration and excellent suggestion. It's our pleasure to adopt your idea in our study. The evolutionary strategies were renamed according to your recommendation. Both the main text and Fig. 7 were revised as follows.

      (L285-293) “Common mutations22,44 or identical genetic functions45 were reported in the experimental evolution with different reduced genomes, commonly known as parallel evolution (Fig. 7, i). In addition, as not all mutations contribute to the evolved fitness 22,45, another strategy for varied phenotypes was known as divergent evolution (Fig. 7, ii). The present study accentuated the variety of mutations fixed during evolution. Considering the high essentiality of the mutated genes (Table S3), most or all mutations were assumed to benefit the fitness increase, partially demonstrated previously 20. Nevertheless, the evolved transcriptomes presented a homeostatic architecture, revealing the divergent to convergent evolutionary strategy (Fig. 7, iii).”

      Author response image 1.

      • Line 327 Growth rates/fitness. I don't think this should be called growth fitness- a rate is being calculated. I would like the authors to explain how the times were chosen - do the three points have to be during the log phase? Can you also explain what you mean by choosing three ri that have the largest mean and minor variance?

      Sorry for the confusing term usage. The fitness assay was changed to the growth assay. Choosing three ri that have the largest mean and minor variance was to avoid the occasional large values (blue circle), as shown in the following figure. In addition, the details of the growth analysis can be found at https://doi.org/10.3791/56197 (ref. 59), where the video of experimental manipulation, protocol, and data analysis is deposited. The following sentence was added in accordance.

      Author response image 2.

      (L369-371) “The growth rate was determined as the average of three consecutive ri, showing the largest mean and minor variance to avoid the unreliable calculation caused by the occasionally occurring values. The details of the experimental and analytical processes can be found at https://doi.org/10.3791/56197.”
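
      The selection of three consecutive ri described above can be sketched as follows. This is only an illustration under assumptions that are not stated in the response: ri is taken as ln(OD(i+1)/OD(i)) divided by the time interval, "minor variance" is implemented as a hypothetical cutoff on the coefficient of variation (max_cv), and the function names are invented for this example. The published protocol (https://doi.org/10.3791/56197) remains the authoritative description.

```python
import math
from statistics import mean, pvariance


def interval_rates(times_h, od600):
    """Assumed definition: r_i = ln(OD_{i+1} / OD_i) / (t_{i+1} - t_i)."""
    return [
        math.log(od600[i + 1] / od600[i]) / (times_h[i + 1] - times_h[i])
        for i in range(len(od600) - 1)
    ]


def growth_rate(rates, max_cv=0.2):
    """Largest mean of three consecutive rates among low-variance windows.

    max_cv is a hypothetical cutoff on the coefficient of variation; it is
    what rejects windows that contain an occasional outlying value.
    Returns None if no window passes the cutoff.
    """
    best = None
    for i in range(len(rates) - 2):
        window = rates[i:i + 3]
        m = mean(window)
        if m <= 0:
            continue
        cv = math.sqrt(pvariance(window)) / m
        if cv <= max_cv and (best is None or m > best):
            best = m
    return best
```

      With this rule, a window containing a single unusually large ri has a large mean but is rejected for its variance, which is the behaviour the added sentence describes.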

      • Line 403 Chromosomal periodicity analysis. The windows chosen for smoothing (100kb) seem big. Large windows make sense for some things - for example looking at how transcription relates to DNA replication timing, which is a whole-genome scale trend. However, here the authors are looking for the differences after evolution, which will be local trends dependent on specific genes and transcription factors. 100kb of the genome would carry on the order of one hundred genes and might be too coarse-grained to see differences between evos lineages.

      Thank you for the advice. We agree that the present analysis focused on the global trend of gene expression. Varying the sizes may lead to different patterns. Additional analysis was performed according to your comment. The results showed that changes in window size (1, 10, 50, 100, and 200 kb) didn't alter the periodicity of the reduced genome, which agreed with the previous study on a different reduced genome MDS42 of a conserved periodicity (Ying et al., 2013, BMC Genomics). The following sentence was added in the Materials and Methods.

      (L460-461) “Note that altering the moving average did not change the max peak.”
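
      The robustness check described above (changing the smoothing window and confirming that the dominant period stays the same) can be illustrated on synthetic data. The sketch below is not the analysis pipeline of the study: it assumes a circular chromosome divided into equal bins, uses a plain discrete Fourier transform to find the period count with the largest power, and generates a hypothetical six-period signal with noise; window sizes are given in bins, not kb.

```python
import cmath
import math
import random


def circular_moving_average(values, window):
    """Smooth a circular series with a centred window of `window` bins (odd)."""
    n, half = len(values), window // 2
    return [
        sum(values[(i + k) % n] for k in range(-half, half + 1)) / window
        for i in range(n)
    ]


def dominant_period_count(values, max_count=20):
    """Number of periods per chromosome with the largest DFT power."""
    n = len(values)
    mu = sum(values) / n

    def power(k):
        coeff = sum(
            (v - mu) * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(values)
        )
        return abs(coeff) ** 2

    return max(range(1, max_count + 1), key=power)


# Synthetic example: six periods around the chromosome plus Gaussian noise.
rng = random.Random(0)
N_BINS = 360
expression = [
    math.sin(2 * math.pi * 6 * i / N_BINS) + rng.gauss(0, 0.5)
    for i in range(N_BINS)
]
for window in (1, 5, 11, 21):
    smoothed = circular_moving_average(expression, window)
    print(window, dominant_period_count(smoothed))
```

      A moving average damps short-wavelength fluctuations much more than a long-wavelength component, so a genome-scale period is expected to survive a wide range of window sizes, in line with the statement that the max peak did not change.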

      • Figures - the figures look great. Figure 7 needs a legend.

      Thank you. The following legend was added.

      (L774-777) “Three evolutionary strategies are proposed. Pink and blue arrowed lines indicate experimental evolution and genome reduction, respectively. The size of the open circles represents the genome size. Black and grey indicate the ancestor and evolved genomes, respectively.”

      Response to Reviewer #2:

      Thank you for reviewing our manuscript and for your fruitful comments. We agree that our study leaned towards elaborating observed findings rather than explaining the detailed biological mechanisms. We focused on the genome-wide biological features rather than the specific biological functions. The underlying mechanisms indeed remain unknown, leaving open the questions you raised. We didn't perform the fitness assay on reconstituted (single and combinatorial) mutants because the research purpose was not to clarify the regulatory or metabolic mechanisms. This is why the RNA-Seq analysis focused on genome-wide patterns and a chromosomal view, which we consider biologically valuable. We understand your concern that the conclusions lack biological meaning, as ALE studies commonly favor a story built on a specific gene regulation or an improved pathway, which was not the approach of the present study.

      For this reason, our revision may not address all these concerns. Considering your comments, we tried our best to revise the manuscript. The changes made were highlighted. We sincerely hope the revision and the following point-to-point response are acceptable.

      Major remarks:

      (1) The authors outlined the significance of ALE in genome-reduced organisms and important findings from published literature throughout the Introduction section. The description in L65-69, which I believe pertains to the motivation of this study, seems vague and insufficient to convey the novelty or necessity of this study i.e. it is difficult to grasp what aspects of genome-reduced biology that this manuscript intends to focus/find/address.

      Sorry for the unclear writing. The sentences were rewritten for clarity as follows.

      (L64-70) “Although the reduced growth rate caused by genome reduction could be recovered by experimental evolution, it remains unclear whether such an evolutionary improvement in growth fitness was a general feature of the reduced genome and how the genome-wide changes occurred to match the growth fitness increase. In the present study, we performed the experimental evolution with a reduced genome in multiple lineages and analyzed the evolutionary changes of the genome and transcriptome.”

      (2) What is the rationale behind the lineage selection described in Figure S1 legend "Only one of the four overnight cultures in the exponential growth phase (OD600 = 0.01~0.1) was chosen for the following serial transfer, highlighted in red."?

      The four wells (cultures of different initial cell concentrations) were measured every day, and only the well that showed OD600=0.01~0.1 (red) was transferred with four different dilution rates (e.g., 10, 100, 1000, and 10000 dilution rates). It resulted in four wells of different initial cell concentrations. Multiple dilutions promised that at least one of the wells would show the OD600 within the range of 0.01 to 0.1 after the overnight culture. They were then used for the next serial transfer. Fig. S1 provides the details of the experimental records. The experimental evolution was strictly controlled within the exponential phase, quite different from the commonly conducted ALE that transferred a single culture in a fixed dilution rate. Serial transfer with multiple dilution rates was previously applied in our evolution experiments and well described in Nishimura et al., 2017, mBio; Lu et al., 2022, Comm Biol; Kurokawa et al., 2022, Front Microbiol, etc. The following sentence was added in the Materials and Methods.

      (L344-345) “Multiple dilutions changing in order promised at least one of the wells within the exponential growth phase after the overnight culture.”

      (3) The measured growth rate of the end-point 'F2 lineage' shown in Figure S2 seemed comparable to the rest of the lineages (A1 to H2), but the growth rate of 'F2' illustrated in Figure 1B indicates otherwise (L83-84). What is the reason for the incongruence between the two datasets?

      Sorry for the unclear description. The growth rates shown in Fig. S2 were obtained during the evolution experiment using the daily transfer's initial and final OD600 values. The growth rates shown in Fig. 1B were obtained from the final population (Evos) growth assay and calculated from the growth curves (biological replication, N=4). Fig. 1B shows the precisely evaluated growth rates, and Fig. S2 shows the evolutionary changes in growth rates. Accordingly, the following sentence was added to the Results.

      (L84-87) “As the growth increases were calculated according to the initial and final records, the exponential growth rates of the ancestor and evolved populations were obtained according to the growth curves for a precise evaluation of the evolutionary changes in growth.”

      (4) Are the differences in growth rate statistically significant in Figure 1B?

      Eight out of nine Evos were significant, except F2. The sentences were rewritten and associated with the revised Fig. 1B, indicating significance.

      (L87-90) “The results demonstrated that most evolved populations (Evos) showed improved growth rates, in which eight out of nine Evos were highly significant (Fig. 1B, upper). However, the magnitudes of growth improvement were considerably varied, and the evolutionary dynamics of the nine lineages were somehow divergent (Fig. S2).”

      (5) The evolved lineages showed a decrease in their maximal optical densities (OD600) compared to the ancestral strain (L85-86). ALE could accompany changes in cell size and morphologies, (doi: 10.1038/s41586-023-06288-x; 10.1128/AEM.01120-17), which may render OD600 relatively inaccurate for cell density comparison. I suggest using CFU/mL metrics for the sake of a fair comparison between Anc and Evo.

      The methods evaluating the carrying capacity (i.e., cell density, population size, etc.) do not change the results. Even using CFU is unfair for the living cells that cannot form colonies and unfair if the cell size changes. Optical density (OD600) provides us with the temporal changes of cell growth in a 15-minute interval, which results in an exact evaluation of the growth rate in the exponential phase. CFU is poor at recording the temporal changes of the population, which tends to result in an inappropriate growth rate. Taken together, we believe that our method was reasonable and reliable. We hope you can accept this different approach.

      (6) Please provide evidence in support of the statement in L115-119. i.e. statistical analysis supporting that the observed ratio of essential genes in the mutant pool is not random.

      The statistical test was performed, and the following sentence was added.

      (L139-141) “The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008).”

      (7) The assumption that "mutation abundance would correlate to fitness improvement" described in L120-122: "The large variety in genome mutations and no correlation of mutation abundance to fitness improvement strongly suggested that no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome" is not easy to digest, in the sense that (i) the effect of multiple beneficial mutations are not necessarily summative, but are riddled with various epistatic interactions (doi: 10.1016/j.mec.2023.e00227); (ii) neutral hitchhikers are of common presence (you could easily find reference on this one); (iii) hypermutators that accumulate greater number of mutations in a given time are not always the eventual winners in competition games (doi: 10.1126/science.1056421). In this sense, the notion that "mutation abundance correlates to fitness improvement" in L120-122 seems flawed (for your perusal, doi: 10.1186/gb-2009-10-10-r118).

      Sorry for the improper description and confusing writing, and thank you for the fruitful knowledge on molecular evolution. The sentence was deleted, and the following one was added.

      (L145-146) “Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”

      (8) Could it be possible that the large variation in genome mutations in independent lineages results from a highly rugged fitness landscape characterized by multiple fitness optima (doi: 10.1073/pnas.1507916112)? If this is the case, I disagree with the notion in L121-122 "that no mutations were specifically responsible or crucially essential" It does seem to me that, for example, the mutations in evo A2 are specifically responsible and essential for the fitness improvement of evo A2 in the evolutionary condition (M63 medium). Fitness assessment of individual (or combinatorial) mutants reconstituted in the Ancestral background would be a bonus.

      Thank you for the intriguing thinking. The sentence was deleted. Please allow us to adapt your comment to the manuscript as follows.

      (L143-145) “The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38.”

      (9) L121-122: "...no mutations were specifically responsible or crucially essential for recovering the growth rate of the reduced genome". Strictly speaking, the authors should provide a reference case of wild-type E. coli ALE in order to reach definitive conclusions that the observed mutation events are exclusive to the genome-reduced strain. It is strongly recommended that the authors perform comparative analysis with an ALEed non-genome-reduced control for a more definitive characterization of the evolutionary biology in a genome-reduced organism, as it was done for "JCVI-syn3.0B vs non-minimal M. mycoides" (doi: 10.1038/s41586-023-06288-x) and "E. coli eMS57 vs MG1655" (doi: 10.1038/s41467-019-08888-6).

      The improper description was deleted in response to comments 7 and 8. The mentioned references were cited in the manuscript (refs 21 and 23). Thank you for the experimental advice. We are sorry that the comparison of wild-type and reduced genomes was not in the scope of the present study and will probably be reported soon in our future work.

      (10) L146-148: "The homeostatic periodicity was consistent with our previous findings that the chromosomal periodicity of the transcriptome was independent of genomic or environmental variation" A Previous study also suggested that the amplitudes of the periodic transcriptomes were significantly correlated with the growth rates (doi: 10.1093/dnares/dsaa018). Growth rates of 8/9 Evos were higher compared to Anc, while that of Evo F2 remained similar. Please comment on the changes in amplitudes of the periodic transcriptomes between Anc and each Evo.

      Thank you for the suggestion. The correlation between the growth rates and the amplitudes of chromosomal periodicity was statistically insignificant (p>0.05). It might be a result of the limited data points. Compared with the only nine data points in the present study, the previous study analyzed hundreds of transcriptomes associated with the corresponding growth rates, which are suitable for statistical evaluation. In addition, the changes in growth rates were more significant in the previous study than in the present study, which might influence the significance. It's why we did not discuss the periodic amplitude.

      (11) Please elaborate on L159-161: "It strongly suggested the essentiality mutation for homeostatic transcriptome architecture happened in the reduced genome.".

      Sorry for the improper description. The sentence was rewritten as follows.

      (L191-193) “The essentiality of the mutations might have participated in maintaining the homeostatic transcriptome architecture of the reduced genome.”

      (12) Is FPKM a valid metric for between-sample comparison? The growing consensus in the community adopts Transcripts Per Kilobase Million (TPM) for comparing gene expression levels between different samples (Figure 3B; L372-379).

      Sorry for the unclear description. The FPKM indicated here was globally normalized, statistically equivalent to TPM. The following sentence was added to the Materials and Methods.

      (L421-422) “The resulting normalized FPKM values were statistically equivalent to TPM.”
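
      The relation between FPKM and TPM referred to above can be shown with a short sketch. Within one sample, TPM equals FPKM divided by the sum of all FPKM values and multiplied by one million, so the two differ only by a per-sample scaling factor. The function name and the gene labels below are hypothetical, and the sketch does not reproduce the global normalization actually applied in the study.

```python
def fpkm_to_tpm(fpkm):
    """Rescale the FPKM values of one sample so that they sum to one million."""
    total = sum(fpkm.values())
    return {gene: value / total * 1e6 for gene, value in fpkm.items()}


# Hypothetical values for three genes in a single sample.
sample_fpkm = {"geneA": 10.0, "geneB": 30.0, "geneC": 60.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
```

      Because the rescaling is linear within a sample, expression ratios between genes are unchanged, while the fixed total makes values comparable across samples.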

      (13) Please provide % mapped frequency of mutations in Table S3.

      They were all 100%. The partially fixed mutations were excluded in the present study. The following sentence was added to the caption of Table S3.

      (Supplementary file, p 9) “Note that the entire population held the mutations, i.e., 100% frequency in DNA sequencing.”

      (14) To my knowledge, M63 medium contains glucose and glycerol as carbon sources. The manuscript would benefit from discussing the elements that impose selection pressure in the M63 culture condition.

      Sorry for the missing information on M63, which contains 22 mM glucose as the only carbon source. The medium composition was added in the Materials and Methods, as follows.

      (L334-337) “In brief, the medium contains 62 mM dipotassium hydrogen phosphate, 39 mM potassium dihydrogen phosphate, 15 mM ammonium sulfate, 15 μM thiamine hydrochloride, 1.8 μM Iron (II) sulfate, 0.2 mM magnesium sulfate, and 22 mM glucose.”

      (15) The RNA-Seq datasets for Evo strains seemed equally heterogenous, just as their mutation profiles. However, the missing element in their analysis is the directionality of gene expression changes. I wonder what sort of biological significance can be derived from grouping expression changes based solely on DEGs, without considering the magnitude and the direction (up- and down-regulation) of changes? RNA-seq analysis in its current form seems superficial to derive biologically meaningful interpretations.

      We agree that most studies often discuss the direction of transcriptional changes. The present study aimed to capture a global view of the magnitude of transcriptome reorganization. Thus, the analyses focused on the overall features, such as the abundance of DEGs, instead of the details of the changes, e.g., the up- and down-regulation of DEGs. The biological meaning of the DEGs' overview was how significantly the genome-wide gene expression fluctuated, which might be short of an in-depth view of individual gene expression. The following sentence was added to indicate the limitation of the present analysis.

      (L199-202) “Instead of an in-depth survey on the directional changes of the DEGs, the abundance and functional enrichment of DEGs were investigated to achieve an overview of how significant the genome-wide fluctuation in gene expression was, which ignored the details of individual genes.”

      Minor remarks

      (1) L41: brackets italicized "(E. coli)".

      It was fixed as follows.

      (L40) “… Escherichia coli (E. coli) cells …”

      (2) Figure S1. It is suggested that the x-axis of ALE monitor be set to 'generations' or 'cumulative generations', rather than 'days'.

      Thank you for the suggestion. Fig. S1 describes the experimental procedure, so "day" was used. Fig. S2 presents the evolutionary process, so "generation" was used, as you recommended here.

      (3) I found it difficult to digest through L61-64. Although it is not within the job scope of reviewers to comment on the language style, I must point out that the manuscript would benefit from professional language editing services.

      Sorry for the unclear writing. The sentences were revised as follows.

      (L60-64) “Previous studies have identified conserved features in transcriptome reorganization, despite significant disruption to gene expression patterns resulting from either genome reduction or experimental evolution 27-29. The findings indicated that experimental evolution might reinstate growth rates that have been disrupted by genome reduction to maintain homeostasis in growing cells.”

      (4) Duplicate references (No. 21, 42).

      Sorry for the mistake. It was fixed (leaving ref. 21).

      (5) Inconsistency in L105-106: "from two to 13".

      "From two to 13" was adopted from the language editing. It was changed as follows.

      (L119) “… from 2 to 13, …”

      Response to Reviewer #3:

      Thank you for reviewing our manuscript and for the helpful comments, which improved the strength of the manuscript. The recommended statistical analyses that essentially supported the statements in the manuscript were performed, whereas those that would constitute new results within the scope of further studies were not conducted. The changes made in the revision were highlighted. We sincerely hope the revised manuscript and the following point-to-point response address your concerns. You will find all your suggested statistical tests in our future work, which will report an extensive study on the experimental evolution of an assortment of reduced genomes.

      (1) Line 106 - "As 36 out of 45 SNPs were nonsynonymous, the mutated genes might benefit the fitness increase." This argument can be strengthened. For example, the null expectation of nonsynonymous SNPs should be discussed. Is the number of observed nonsynonymous SNPs significantly higher than the expected one?

      (2) Line 107 - "In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase." Instead of just listing examples, a regression analysis can be added.

      Yes, it's significant. Random mutations lead to ~33% of nonsynonymous SNP in a rough estimation. Additionally, the regression is unreliable because there's no statistical significance between the number of mutations and the magnitude of fitness increase. Accordingly, the corresponding sentences were revised with additional statistical tests.

      (L123-129) “As 36 out of 45 SNPs were nonsynonymous, which was highly significant compared to random mutations (p < 0.01), the mutated genes might benefit fitness increase. In addition, the abundance of mutations was unlikely to be related to the magnitude of fitness increase. There was no significant correlation between the number of mutations and the growth rate in a statistical view (p > 0.1). Even from an individual close-up viewpoint, the abundance of mutations poorly explained the fitness increase.”
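
      One way to compare the observed count (36 nonsynonymous out of 45 SNPs) with a null expectation is an exact one-sided binomial test, sketched below. The response does not state which test was used, so this is an assumption of the example; the null proportion of 0.33 is the rough estimate given in the response, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p_null):
    """Exact P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 36 nonsynonymous SNPs out of 45, against the rough null proportion of 0.33.
p_value = binomial_upper_tail(36, 45, 0.33)
```

      The result depends strongly on the assumed null proportion, so p_null is left as an explicit argument and should be set from the codon composition of the genome where that is available.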

      (3) Line 114 - "They seemed highly related to essentiality, as 11 out of 49 mutated genes were essential (Table S3)." Here, the information mentioned in line 153 ("the ratio of essential to all genes (302 out of 3,290) in the reduced genome.") can be used. Then a statistical test for a contingency table can be used.

      (4) Line 117 - "the high frequency of the mutations fixed in the essential genes suggested the mutation in essentiality for fitness increase was the evolutionary strategy for reduced genome." What is the expected number of fixed mutations in essential genes vs non-essential genes? Is the observed number statistically significantly higher?

      Sorry for the improper and insufficient information on the essential genes. Yes, it's significant. The statistical test was additionally performed. The corresponding part was revised as follows.

      (L134-146) “They seemed highly related to essentiality7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome. The ratio of essential genes in the mutated genes was significantly higher than in the total genes (286 out of 3290 genes, Chi-square test p=0.008). As the essential genes were determined according to the growth35 and were known to be more conserved than nonessential ones 36,37, the high frequency of the mutations fixed in the essential genes was highly intriguing and reasonable. The large variety of genome mutations fixed in the independent lineages might result from a highly rugged fitness landscape 38. Nevertheless, it was unclear whether and how these mutations were explicitly responsible for recovering the growth rate of the reduced genome.”
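
      The contingency-table comparison mentioned above can be sketched as a Pearson chi-square test on a 2x2 table. The table below is built from the counts in the text (11 essential among 49 mutated genes; 286 essential among 3,290 genes) under the assumption that all 49 mutated genes are among the 3,290 annotated genes. The response does not give the exact table or say whether a continuity correction was applied, so the value from this sketch need not match the reported p value; the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: mutated / not mutated; columns: essential / nonessential.
mutated_essential, mutated_total = 11, 49
essential_total, gene_total = 286, 3290
chi2, p = chi_square_2x2(
    mutated_essential,
    mutated_total - mutated_essential,
    essential_total - mutated_essential,
    (gene_total - essential_total) - (mutated_total - mutated_essential),
)
```

      Because one expected count in this table is below five, Fisher's exact test would be a reasonable alternative for the same comparison.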

      (5) The authors mentioned no overlapping in the single mutation level. Is that statistically significant? The authors can bring up what the no-overlap probability is given that there are in total x number of fixed mutations observed (either theory or simulation is good).

      Sorry, we feel confused about this comment. It's unclear to us why it needs to be statistically simulated. Firstly, the mutations were experimentally observed. The result that no overlapped mutated genes were detected was an Experimental Fact but not a Computational Prediction. We feel sorry that you may over-interpret our finding as an evolutionary rule, which always requires testing its reliability statistically. We didn't conclude that the evolution had no overlapped mutations. Secondly, considering 65 times random mutations happened to a ~3.9 Mb sequence, the statistical test was meaningful only if the experimental results found the overlapped mutations. It is interesting how often the random mutations cause the overlapped mutations in parallel evolutionary lineages while increasing the evolutionary lineages, which seems to be out of the scope of the present study. We are happy to include the analysis in our ongoing study on the experimental evolution of reduced genomes.

      (6) The authors mentioned no overlapping in the single mutation level. How about at the genetic level? Some fixed mutations occur in the same coding gene. Is there any gene with a significantly enriched number of mutations?

      No mutations were fixed in the same gene of biological function, as shown in Table S3. If we say the coding region, the only exception is the IS sequences, well known as the transposable sequences without genetic function. The following description was added.

      (L119-122) “The number of mutations largely varied among the nine Evos, from 2 to 13, and no common mutation was detected in all nine Evos (Table S3). A 1,199-bp deletion of insH was frequently found in the Evos (Table S3, highlighted), which well agreed with its function as a transposable sequence.”

      (7) Line 151-156- It seems like the authors argue that the expression level differences can be just explained by the percentage of essential genes that get fixed mutations. One further step for the argument could be to compare the expression level of essential genes with vs without fixed mutations. Also, the authors can compare the expression level of non-essential genes with vs without fixed mutations. And the authors can report whether the differences in expression level became insignificant after the control of the essentiality.

      It's our pleasure that the essentiality intrigued you. Thank you for the analytical suggestion, which is exciting and valuable for our studies. As only 11 essential genes were detected here and "Mutation in essentiality" was an indication but not the conclusion of the present study, we would like to apply the recommended analysis to the datasets of our ongoing study to demonstrate this statement. Thank you again for your fruitful analytical advice.

      (8) Line 169- "The number of DEGs partially overlapped among the Evos declined significantly along with the increased lineages of Evos (Figure 4B). " There is a lack of statistical significance here while the word "significantly" is used. One statistical test that can be done is to use re-sampling/simulation to generate a null expectation of the overlapping numbers given the DEGs for each Evo line and the total number of genes in the genome. The observed number can then be compared to the distribution of the simulated numbers.

      Sorry for the inappropriate usage of the term. Whether it's statistically significant didn't matter here. The word "significant" was deleted as follows.

      (L205-206) “The number of DEGs partially overlapped among the Evos declined along with the increased lineages of Evos (Fig. 4B).”
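      The re-sampling idea raised in comment (8) can be sketched as follows. This is an illustrative outline under stated assumptions, not part of the authors' analysis: DEG sets are drawn uniformly at random from the genome, and the function names, set sizes, and gene count in the usage lines are hypothetical.

```python
import random
from collections import Counter


def null_overlap_counts(deg_sizes, n_genes, min_lines, n_sim=1000, seed=0):
    """Simulate how many genes appear in at least `min_lines` random DEG sets.

    deg_sizes : number of DEGs in each evolved line
    n_genes   : total number of genes in the genome
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_sim):
        tally = Counter()
        for size in deg_sizes:
            # One random DEG set per line, sampled without replacement.
            tally.update(rng.sample(range(n_genes), size))
        results.append(sum(1 for c in tally.values() if c >= min_lines))
    return results


def empirical_p_upper(observed, null_values):
    """One-sided empirical p-value (observed >= null), with add-one correction."""
    exceed = sum(v >= observed for v in null_values)
    return (1 + exceed) / (1 + len(null_values))


if __name__ == "__main__":
    # Hypothetical DEG set sizes for three lines and a hypothetical gene count.
    null = null_overlap_counts([300, 450, 520], n_genes=3500, min_lines=3)
    print(empirical_p_upper(observed=40, null_values=null))
```

      Repeating the call with `min_lines` set to 2, 3, and so on would give one null distribution per point of a curve like the one in Fig. 4B.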

      (9) Line 177-179- "In comparison, 1,226 DEGs were induced by genome reduction. The common DEGs of genome reduction and evolution varied from 168 to 540, fewer than half of the DEGs responsible for genome reduction in all Evos" Is the overlapping number significantly lower than the expectation? The hypergeometric test can be used for testing the overlap between two gene sets.

      There is no prior expectation for how many DEGs would be reasonable. Not every experimentally obtained number needs to be statistically tested, although such testing is commonly essential in computational and data science.
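      For reference, the hypergeometric test mentioned in comment (9) can be written with the standard library alone. The sketch below is illustrative only: the 1,226 reduction-induced DEGs and the overlap of 168 come from the quoted text, whereas the total gene count and the size of the Evo DEG set are hypothetical placeholders.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when n genes are drawn from N, of which K are 'marked'."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_tail(k_obs, N, K, n, lower=True):
    """Lower tail P(X <= k_obs) or upper tail P(X >= k_obs)."""
    lo = max(0, n - (N - K))   # smallest possible overlap
    hi = min(n, K)             # largest possible overlap
    if lower:
        ks = range(lo, min(k_obs, hi) + 1)
    else:
        ks = range(max(k_obs, lo), hi + 1)
    return sum(hypergeom_pmf(k, N, K, n) for k in ks)


if __name__ == "__main__":
    N_TOTAL = 3500      # hypothetical number of genes in the reduced genome
    EVO_DEGS = 800      # hypothetical number of DEGs in one Evo
    # Is an overlap of 168 with the 1,226 reduction DEGs smaller than expected?
    print(hypergeom_tail(168, N_TOTAL, 1226, EVO_DEGS, lower=True))
```

      The lower tail addresses the question of whether the overlap is smaller than expected by chance; the upper tail would test for enrichment instead.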

      (10) The authors should give more information about the ancestral line used at the beginning of experimental evolution. I guess it is one of the KHK collection lines, but I can not find more details. There are many genome-reduced lines. Why is this certain one picked?

      Sorry for the insufficient information on the reduced genome used for the experimental evolution. The following descriptions were added in the Results and the Materials and Methods, respectively.

      (L75-79) “The E. coli strain carrying a reduced genome, derived from the wild-type genome W3110, showed a significant decline in its growth rate in the minimal medium compared to the wild-type strain 13. To improve the genome reduction-mediated decreased growth rate, the serial transfer of the genome-reduced strain was performed with multiple dilution rates to keep the bacterial growth within the exponential phase (Fig. S1), as described 17,20.”

      (L331-334) “The reduced genome has been constructed by multiple deletions of large genomic fragments 58, which led to an approximately 21% smaller size than its parent wild-type genome W3110.”

      (11) How was the saturated density in Figure 1 actually determined? In particular, the fitness assay of growth curves is 48 h. But it seems like the experimental evolution is done in ~24 h cycles. If the Evos never experienced a situation like a stationary phase between 24 and 48 h, and if the authors reported the saturated density at 48 h in Figure 1, the explanation of the lower saturated density can be just relaxation from selection and may have nothing to do with the increase of growth rate.

      Sorry for the unclear description. Yes, you are right. The evolution was performed within the exponential growth phase (keeping cell division constant), which means the Evos never experienced the stationary phase (saturation). The final evolved populations were subjected to the growth assay to obtain the entire growth curves for calculating the growth rate and the saturated density. Whether the decreased saturated density and the increased growth rate were in a trade-off relationship remained unclear. The corresponding paragraph was revised as follows.

      (L100-115) “Intriguingly, a positive correlation was observed between the growth fitness and the carrying capacity of the Evos (Fig. 1D). It was somehow consistent with the positive correlations between the colony growth rate and the colony size of a genome-reduced strain 11 and between the growth rates and the saturated population size of an assortment of genome reduced strains 13. Nevertheless, the negative correlation between growth rate and carrying capacity, known as the r/K selection30,31 was often observed as the trade-off relationship between r and K in the evolution and ecology studies 32 33,34. As the r/K trade-off was proposed to balance the cellular metabolism that resulted from the cost of enzymes involved 34, the deleted genes might play a role in maintaining the metabolism balance for the r/K correlation. On the other hand, the experimental evolution (i.e., serial transfer) was strictly performed within the exponential growth phase; thus, the evolutionary selection was supposed to be driven by the growth rate without selective pressure to maintain the carrying capacity. The declined carrying capacity might have been its neutral "drift" but not a trade-off to the growth rate. Independent and parallel experimental evolution of the reduced genomes selecting either r or K is required to clarify the actual mechanisms.”

      (12) What annotation of essentiality was used in this paper? In particular, the essentiality can be different in the reduced genome background compared to the WT background.

      Sorry for the unclear definition of the essential genes. They are strictly limited to the 302 essential genes experimentally determined in the wild-type E. coli strain. Detailed information can be found at the following website: https://shigen.nig.ac.jp/ecoli/pec/genes.jsp. We agree that the essentiality could differ between the WT and reduced genomes. Identifying the essential genes in the reduced genome would be an exhaustingly large undertaking. The information on the essential genes defined in the present study was added as follows.

      (L134-139) “They seemed highly related to essentiality7 (https://shigen.nig.ac.jp/ecoli/pec/genes.jsp), as 11 out of 49 mutated genes were essential (Table S3). Although the essentiality of genes might differ between the wild-type and reduced genomes, the experimentally determined 302 essential genes in the wild-type E. coli strain were used for the analysis, of which 286 were annotated in the reduced genome.”

      (13) The fixed mutations in essential genes are probably not rarely observed in experimental evolution. For example, fixed mutations related to RNA polymerase can be frequently seen when evolving to stressful environments. I think the author can discuss this more and elaborate more on whether they think these mutations in essential genes are important in adaptation or not.

      Thank you for your careful reading and the suggestion. As you mentioned, we noticed that the mutations in RNA polymerases (rpoA, rpoB, and rpoD) were identified in three Evos. As they were not shared across all Evos, we didn't discuss the contribution of these mutations to evolution. Instead of the individual functions of the mutated essential genes, we focused on the enriched gene functions related to the transcriptome reorganization because they were the common feature observed across all Evos and linked to the whole metabolic or regulatory pathways, which are supposed to be more biologically reasonable and interpretable. The following sentence was added to clarify our thinking.

      (L268-273) “In particular, mutations in the essential genes, such as RNA polymerases (rpoA, rpoB, rpoD) identified in three Evos (Table S3), were supposed to participate in the global regulation for improved growth. Nevertheless, the considerable variation in the fixed mutations without overlaps among the nine Evos (Table 1) implied no common mutagenetic strategy for the evolutionary improvement of growth fitness.”

      (14) In experimental evolution to new environments, several previous studies also show that long-term experimental evolution in transcriptome is not consistent with or even reverts the short-term response; short-term responses were just rather considered as an emergency plan. They seem to echo what the authors found in this manuscript. I think the author can refer to some of those studies more and make a more thorough discussion on short-term vs long-term responses in evolution.

      Thank you for the advice. It is unclear to us what the short-term and long-term responses mentioned in this comment refer to. The term "Response" usually refers to the phenotypic or transcriptional changes within a few hours after environmental fluctuation, which are generally non-genetic (no mutation). In comparison, long-term or short-term experimental "Evolution" is associated with genetic changes (mutations). Concerning the Evolution (not the Response), the long-term experimental evolution (>10,000 generations) was performed only with the wild-type genome, and the short-term experimental evolution (500~2,000 generations) was more often conducted with both wild-type and reduced genomes, to our knowledge. Previous landmark studies have intensively discussed comparing the wild-type and reduced genomes. Our study was restricted to the reduced genome, which was constructed differently from those reduced genomes used in the reported studies. The experimental evolution of the reduced genomes has been performed in the presence of additional additives, e.g., antibiotics, alternative carbon sources, etc. That is, neither the genomic backgrounds nor the evolutionary conditions were comparable. Comparing studies that have nothing in common seems unproductive. We sincerely hope the recommended topics can be applied in our future work.

      Some minor suggestions

      • Figures S3 & Table S2 need an explanation of the abbreviations of gene categories.

      Sorry for the missing information. Figure S3 and Table S3 were revised to include the names of gene categories. The figure is pasted below for quick reference.

      Author response image 3.

      • I hope the authors can re-consider the title; "Diversity for commonality" does not make much sense to me. For example, it can be simply just "Diversity and commonality."

      Thank you for the suggestion. The title was simplified as follows.

      (L1) “Experimental evolution for the recovery of growth loss due to genome reduction.”

      • It is not easy for me to locate and distinguish the RNA-seq vs DNA-seq files in DRA013662 at DDBJ. Could you make some notes on what RNA-seq actually are, vs what DNA-seq files actually are?

      Sorry for the mistakes in the DRA number of DNA-seq. DNA-seq and RNA-seq were deposited separately with the accession IDs of DRA013661 and DRA013662, respectively. The following correction was made in the revision.

      (L382-383) “The raw datasets of DNA-seq were deposited in the DDBJ Sequence Read Archive under the accession number DRA013661.”

    1. Author response:

      eLife assessment

      In this valuable study, Kumar et al., provide evidence suggesting that the p130Cas drives the formation of condensates that sprout from focal adhesions to cytoplasm and suppress translation. Pending further substantiation, this study was found to be likely to provide previously unappreciated insights into the mechanisms linking focal adhesions to the regulation of protein synthesis and was thus considered to be of broad general interest. However, the evidence supporting the proposed model was incomplete; additional evidence is warranted to substantiate the relationship between p130Cas condensates and mRNA translation and establish corresponding functional consequences.

      We thank the eLife editorial team for their positive assessment of the broad significance of our manuscript. We fully agree that the functional consequences need to be explored in more detail. We feel that many of the criticisms are valid points that are not easily addressed with available tools and thus should be considered limitations of present approaches. We hope that readers appreciate that identification of a new class of liquid-liquid phase separations calls for much more work to fully explore their characteristics, regulation and function, which will likely advance many areas of cell biology and perhaps even medicine.

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrated the phenomenon of p130Cas, a protein primarily localized at focal adhesions, and its formation of condensates. They identified the constituents within the condensates, which include other focal adhesion proteins, paxillin, and RNAs. Furthermore, they proposed a link between p130Cas condensates and translation.

      Strengths:

      Adhesion components undergo rapid exchange with the cytoplasm for some unclear biological functions. Given that p130Cas is recognized as a prominent mechanical focal adhesion component, investigating its role in condensate formation, particularly its impact on the translation process, is intriguing and significant.

      We thank the reviewer for recognizing the functional significance of the work.

      Weaknesses:

      The authors identified the disordered region of p130Cas and investigated the formation of p130Cas condensate. They attempted to demonstrate that p130Cas condensates inhibit translation, but the results did not fully support this assertion. There are several comments below:

      (1) Despite isolating p130Cas-GFP protein using GFP-trap beads, the authors cannot conclusively eliminate the possibility of isolating p130Cas from focal adhesions. While the characterization of the GFP-tagged pulls can reveal the proteins and RNAs associated with p130Cas, they need to clarify their intramolecular mechanism of localization within p130Cas droplets. Whether the protein condensates retain their liquid phase or these GFP-p130Cas pulls represent protein aggregate remains uncertain.

      We agree, the isolation from cell lysates does not distinguish between focal adhesions and cytoplasmic LLPS. We note that p130Cas in focal adhesions also appears to be in LLPS. But there are no methods available to isolate them separately. We acknowledge this is a limitation of the study.

      (2) The authors utilized hexanediol and ammonium acetate to highlight the phenomenon of p130Cas condensates. Although hexanediol is an inhibitor for hydrophobic interactions and ammonium acetate is a salt, a more thorough explanation of the intramolecular mechanisms underlying p130Cas protein-protein interaction is required. Additionally, given that the size of p130Cas condensates can exceed 100 µm², classification is needed to differentiate between p130Cas condensates and protein aggregation.

      Ammonium acetate, which works by promoting hydrophobic interactions and weak van der Waals forces, has been widely used in phase separation studies to change ionic strength without altering intracellular pH. Conversely, hexanediol weakens the hydrophobic/van der Waals interactions that commonly mediate phase separation of IDRs. In the case of p130Cas, the multiple tyrosines within the scaffolding domain are obvious targets. If the reviewer is asking us to resolve the detailed hydrophobic interactions within the scaffolding domain, this is far beyond the scope of the current paper.

      Protein aggregates are defined by their characteristics (e.g., irreversibility, departure from a spherical shape), not by size. Older, larger droplets remain circular and show slower but still measurable rates of exchange. Moreover, droplets are essentially absent after trypsinizing and replating cells. All these results argue against aggregates.

      (3) The connection between p130Cas condensates and translation inhibition appears tenuous. The data only suggests a correlation between p130Cas expression and translation inhibition. Further evidence is required to bolster this hypothesis.

      The optogenetic experiment shows that triggering LLPS by dimerizing p130Cas results in inhibition of translation. This is a causal not a correlative experiment. The reviewer may be thinking that dimerizing p130Cas could stimulate focal adhesion signaling, activating FAK or a src family kinase or other signals. However, none of these signals has been linked to inhibition of cell growth or migration. Thus, we agree that this is a limitation but consider it a low probability mechanism.

      Reviewer #2 (Public Review):

      Summary:

      In this article, Kumar et al., report on a previously unappreciated mechanism of translational regulation whereby p130Cas induces LLPS condensates that then traffic out from focal adhesion into the cytoplasm to modulate mRNA translation. Specifically, the authors employed EGFP-tagged p130Cas constructs, endogenous p130Cas, and p130Cas knockouts and mutants in cell-based systems. These experiments in conjunction with various imaging techniques revealed that p130Cas drives assembly of LLPS condensates in a manner that is largely independent of tyrosine phosphorylation. This was followed by in vitro EGFP-tagged p130Cas-dependent induction of LLPS condensates and determination of their composition by mass spectrometry, which revealed enrichment of proteins involved in RNA metabolism in the condensates. The authors excluded the plausibility that p130Cas-containing condensates co-localize with stress granules or p-bodies. Next, the authors determined mRNA compendium of p130Cas-containing condensates which revealed that they are enriched in transcripts encoding proteins implicated in cell cycle progression, survival, and cell-cell communication. These findings were followed by the authors demonstrating that p130Cas-containing condensates may be implicated in the suppression of protein synthesis using puromycylation assay. Altogether, it was found that this study significantly advances the knowledge pertinent to the understanding of molecular underpinnings of the role of p130Cas and more broadly focal adhesions on cellular function, and to this end, it is likely that this report will be of interest to a broad range of scientists from a wide spectrum of biomedical disciplines including cell, molecular, developmental and cancer biologists.

      Strengths:

      Altogether, this study was found to be of potentially broad interest inasmuch as it delineates a hitherto unappreciated link between p130Cas, LLPS, and regulation of mRNA translation. More broadly, this report provides unique molecular insights into the previously unappreciated mechanisms of the role of focal adhesions in regulating protein synthesis. Overall, it was thought that the provided data sufficiently supported most of the authors' conclusions. It was also thought that this study incorporates an appropriate balance of imaging, cell and molecular biology, and biochemical techniques, whereby the methodology was found to be largely appropriate.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Two major weaknesses of the study were noted. The first issue is related to the experiments establishing the role of p130Cas-driven condensates in translational suppression, whereby it remained unclear whether these effects are affecting global mRNA translation or are specific to the mRNAs contained in the condensates. Moreover, some of the results in this section (e.g., experiments using cycloheximide) may be open to alternative interpretation. The second issue is the apparent lack of functional studies, and although the authors speculate that the described mechanism is likely to mediate the effects of focal adhesions on e.g., quiescence, experimental testing of this tenet was lacking.

      We appreciate the reviewer’s insights. Assessing translational inhibition for specific genes rather than global measurement of translation is an important direction for future work.

      Regarding the cycloheximide experiments, we are unsure what the reviewer means. We used it as a control for puromycin labeling, but this is a very standard approach. It seems more likely that the question concerns Fig 5G, where we used it to sequester mRNAs on ribosomes to deplete them from other pools. In this case, p130Cas condensates decrease after 2 minutes. The reviewer may be suggesting that this effect could be due to blocked translation per se and loss of short-lived proteins. We acknowledge that this is possible, but given the very rapid effect (2 min), we think it unlikely.

      Lastly, we agree with the reviewer that further functional studies in quiescence or senescence are warranted; however, these are extensive, open-ended studies and we will not be able to include them as part of the current paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study, the authors investigate the transcriptional landscape of tuberculous meningitis, revealing important molecular differences contributed by HIV co-infection. Whilst some of the evidence presented is compelling, the bioinformatics analysis is limited to a descriptive narrative of gene-level functional annotations, which are somewhat basic and fail to define aspects of biology very precisely. Whilst the work will be of broad interest to the infectious disease community, validation of the data is critical for future utility.

      We appreciate eLife’s positive assessment, although we challenge the conclusion that we ‘fail to define aspects of biology very precisely’. Our stated objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis and the eLife assessment affirms we have investigated ‘the transcriptional landscape of tuberculous meningitis’. To more precisely define aspects of the biology will require another study with a different design and methods.

      Reviewer #1 (Public Review):

      Summary:

      Tuberculous meningitis (TBM) is one of the most severe forms of extrapulmonary TB. TBM is especially prevalent in people who are immunocompromised (e.g. HIV-positive). Delays in diagnosis and treatment could lead to severe disease or mortality. In this study, the authors performed the largest-ever host whole blood transcriptomics analysis on a cohort of 606 Vietnamese participants. The results indicated that TBM mortality is associated with increased neutrophil activation and decreased T and B cell activation pathways. Furthermore, increased angiogenesis was also observed in HIV-positive patients who died from TBM, whereas activated TNF signaling and down-regulated extracellular matrix organisation were seen in the HIV-negative group. Despite similarities in transcriptional profiles between PTB and TBM compared to healthy controls, inflammatory genes were more active in HIV-positive TBM. Finally, 4 hub genes (MCEMP1, NELL2, ZNF354C, and CD4) were identified as strong predictors of death from TBM.

      Strengths:

      This is a really impressive piece of work, both in terms of the size of the cohort which took years of effort to recruit, sample, and analyse, and also the meticulous bioinformatics performed. The biggest advantage of obtaining a whole blood signature is that it allows an easier translational development into a test that can be used in the clinic with a minimally invasive sample. Furthermore, the data from this study has also revealed important insights into the mechanisms associated with mortality and the differences in pathogenesis between HIV-positive and HIV-negative patients, which would have diagnostic and therapeutic implications.

      Weaknesses:

      The data on blood neutrophil count is really intriguing and seems to provide a very powerful yet easy-to-measure method to differentiate survival vs. death in TBM patients. It would be quite useful in this case to perform predictive analysis to see if neutrophil count alone, or in combination with gene signature, can predict (or better predict) mortality, as it would be far easier for clinical implementation than the RNA-based method. Moreover, genes associated with increased neutrophil activation and decreased T cell activation both have significantly higher enrichment scores in TBM (Figure 9) and in mortality (Figure 8). While I understand the basis of selecting hub genes in the significant modules, they often do not represent these biological pathways (at least not directly associated in most cases). If genes were selected based on these biologically relevant pathways, would they have better predictive values?

      We conducted a sensitivity analysis including blood neutrophil count as a potential predictor in the multivariate Cox elastic-net regression model for important predictor selection (Table S14). In this analysis, all six selected important predictors (genes and clinical risk factors) identified in the original analysis (Table S13) were also selected, together with blood neutrophil count. Additionally, we evaluated the predictive value of blood neutrophil count alone, which demonstrated poor performance, with an optimism-corrected AUC of 0.63 for all TBM, 0.67 for HIV-negative TBM, and 0.70 for HIV-positive TBM. Even when combined with the identified gene signatures, blood neutrophil count did not improve the overall performance of the predictive model (optimism-corrected AUC of 0.79 for all TBM, 0.76 for HIV-negative TBM, and 0.80 for HIV-positive TBM). These results indicate that the identified hub genes exhibit better predictive values than blood neutrophil count, alone or in combination. These findings have been incorporated into our manuscript results.

      To test whether pathway representative genes have better predictive values than hub genes, we included all these genes in the analysis for important predictor selection. Pathway representative genes comprised ANXA3 and CXCR2 representing neutrophil activation and IL1b representing acute inflammatory response. We observed that all hub genes (MCEMP1, NELL2, ZNF354C, and CD4) consistently emerged as the most important genes with the highest selection in the models, compared to the rest, in both the HIV-negative TBM and HIV-positive TBM cohorts. Additionally, these identified hub genes were still selected when testing together with other hub genes representing relevant biological pathways associated with TBM mortality, such as CYSTM1 involved in neutrophil activation, TRAF5 involved in NF-kappa B signaling pathway, CD28 and TESPA1 involved in T cell receptor signaling. These results show that selected genes based on known biologically relevant pathways did not give better predictive values than the identified hub genes in the significant modules.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the analysis of blood transcriptomic data from patients with TB meningitis, with and without HIV infection, with some comparison to those of patients with pulmonary tuberculosis and healthy volunteers. The objectives were to describe the comparative biological differences represented by the blood transcriptome in TBM associated with HIV co-infection or survival/mortality outcomes and to identify a blood transcriptional signature to predict these outcomes. The authors report an association between mortality and increased levels of acute inflammation and neutrophil activation, but decreased levels of adaptive immunity and T/B cell activation. They propose a 4-gene prognostic signature to predict mortality.

      Strengths:

      Biological evaluations of blood transcriptomes in TB meningitis and their relationship to outcomes have not been extensively reported previously.

      The size of the data set is a major strength and is likely to be used extensively for secondary analyses in this field of research.

      Weaknesses:

      The bioinformatic analysis is limited to a descriptive narrative of gene-level functional annotations curated in GO and KEGG databases. This analysis cannot be used to make causal inferences. In addition, the functional annotations are limited to 'high-level' terms that fail to define biology very precisely. At best, they require independent validation for a given context. As a result, the conclusions are not adequately substantiated. The identification of a prognostic blood transcriptomic signature uses an unusual discovery approach that leverages weighted gene network analysis that underpins the bioinformatic analyses. However, the main problem is that authors seem to use all the data for discovery and do not undertake any true external validation of their gene signature. As a result, the proposed gene signature is likely to be overfitted to these data and not generalisable. Even this does not achieve significantly better prognostic discrimination than the existing clinical scoring.

      As explained in response to the eLife assessment, our objective was to use bioinformatics tools to identify the biological pathways and hub genes associated with TBM pathogenesis. We agree that ‘This analysis cannot be used to make causal inferences’: that would require a different study design and approaches. The proposed gene signature has higher AUC values than the existing clinical model alone or in combination with clinical risk factors (Table 4). We agree that independent validation of the gene signature will be a crucial next step for future utility. We have performed qPCR in another sample set, and have added these results in the revision (Table 4 and supplementary figure S8).

      Reviewer #1 (Recommendations For The Authors):

      I have a few additional comments most of which are relatively minor:

      (1) Can the authors please clarify if all the PTB cases are also HIV-negative?

      This has been added to the methods section.

      (2) For Table 1, can the authors please list the total number of patients with microbiologically confirmed TB regardless of the methods used? And for the two TBM groups, was the positive microbiology based on CSF findings?

      The total number of patients with microbiologically confirmed TB is presented in Table 2 under the definite TBM group, which comprises microbiologically confirmed TB diagnosed using microscopy, culture, and Xpert testing in cerebrospinal fluid (CSF) samples. We have updated the note in Table 2 to provide clarity on the definition.

      (3) How was the discovery and validation set selected? Was it based on randomisation?

      We randomly split the TBM data into two datasets, a discovery cohort (n=142) and a validation cohort (n=139), to ensure reproducibility of the data analysis. We described this in the methods section.
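      A reproducible random split of this kind can be sketched as below. The cohort sizes (142 and 139) come from the response above; the function name, the seed, and the sample identifiers are hypothetical, and the procedure actually used in the study may differ (for example, it may have been stratified).

```python
import random


def split_cohort(sample_ids, n_discovery, seed=2024):
    """Randomly assign samples to a discovery and a validation cohort.

    A fixed seed makes the allocation reproducible across runs.
    """
    ids = list(sample_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    discovery = sorted(ids[:n_discovery])
    validation = sorted(ids[n_discovery:])
    return discovery, validation


if __name__ == "__main__":
    # Hypothetical identifiers for 281 TBM samples (142 + 139).
    tbm_ids = [f"TBM{i:03d}" for i in range(281)]
    discovery, validation = split_cohort(tbm_ids, n_discovery=142)
    print(len(discovery), len(validation))
```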

      (4) Line 107 can be better clarified by stating that the overall 3-month mortality rate is 21.7% for TBM regardless of HIV status.

      Thank you, we have restated this sentence in the results section.

      (5) The authors stated that samples were collected at enrolment when patients would have received less than 6 days of anti-tubercular treatment. Is there information on the median and IQR on the number of days that the patients would have received Rx, especially between the groups? Did the authors control for this variable when analysing for DEGs?

      One of the criteria for enrolling participants in the LAST-ACT and ACT-HIV trials is that they must have received less than 6 consecutive days of two or more drugs active against M. tuberculosis. However, the number of days for which patients had received treatment was not recorded, so we could not control for this variable when performing the differential expression analysis for DEGs. This has been clarified further in the methods section: ‘The samples were taken at enrollment, when patients could not have received more than 6 consecutive days of two or more drugs active against M. tuberculosis.’

      (6) I am a little bit concerned with the reads mapping accuracy (57%) to the human genome, which is fairly low. Did the authors investigate the reasons behind this low accuracy?

      Thank you. It was indeed a typo. We have corrected it in the results section.

      (7) On Tables S2-S4, can the authors please clarify what the last column (labelled as "B") shows?

      Tables S2-S4 now have been changed to S3-S5. We have updated the legend of these tables to provide clarification regarding the meaning of the last column.

      Reviewer #2 (Recommendations For The Authors):

      If the authors wish to revise their manuscript, I suggest the following amendments:

      (1) Provide a consort diagram for the selection of samples included in the present analysis (from parent study cohorts), allocation to test and validation splits for bioinformatics analysis, and outcomes.

      We have provided our consort diagram in supplementary Figure S10.

      (2) Provide details of inclusion criteria for pulmonary TB cohort, and how samples from this cohort were selected for inclusion in the present analysis. Please clarify whether this cohort excluded HIV-positive participants by design or by chance.

      The inclusion criteria for the pulmonary TB cohort were described in the methods section. Due to the very low prevalence of HIV in this prospective observational study, HIV-positive participants were excluded. We have clarified in the amended manuscript that the pulmonary TB cohort only included HIV-negative participants.

      (3) Baseline characteristics of HIV-positive participants (Table 1) should include CD4 count, HIV viral load, and whether anti-retroviral therapy was naïve or experienced.

      We have included pre-treatment CD4 cell count, information on anti-retroviral therapy, and HIV viral load data in Table 1, and have described this information in the results section.

      (4) I note that the TBM samples were derived from RCTs of adjunctive steroid therapy, but not stratified in the present analysis by treatment arm allocation. Clearly, this may affect the survival/mortality outcomes that are the central focus of this manuscript. Therefore, they should be included in the models for differential gene expression analysis and prognostic signature discovery. To do so, the authors may need to wait until they are able to unblind the trial metadata.

      With permission from the trial investigators, we were able to adjust the analyses for treatment with corticosteroids. The investigators remained blind to the allocation and we have not reported any direct effects of corticosteroids on outcome – such an analysis could only be done once the LAST-ACT trial has been reported (which won’t be until the end of 2024). Treatment outcome and effect were kept blinded by extracting only the fold change difference between survival and death from the linear regression model, in which gene expression was the outcome and survival and treatment were covariates.

      (5) I understood from the methods (lines 460-461) that batch correction of the RNAseq data was necessary. However, it is not clear how the samples were batched. PCA of the transcriptomes before and after batch correction with batch and study group labels should be provided. I would also advocate for a sensitivity analysis to check the robustness of the main findings without batch correction. I assume Fig2A represents batch-corrected data, but this is not clear.

      We have now added information to the methods section about the RNA sequencing batches and the batch correction approach, and have stated there that the analyses and data visualizations used batch-corrected data. We have also updated the results related to batch correction in Fig. 2A and Supplementary Figure S9.

      (6) I would encourage the authors to include a differential gene expression analysis to directly compare the transcriptome of TBM to that of pulmonary TB. I think it would add additional value to their focus on describing the transcriptome in TBM.

      We thank the reviewer for this suggestion. A differential gene expression analysis comparing the transcriptome of TBM with that of PTB is beyond the scope of this manuscript, and we will examine this question separately.

      (7) I don't really understand the purpose of splitting their data set into test and validation for the purposes of showing that WGCNA analysis is mostly reproduced in the two halves of the data. I would advocate that they scrap this approach to maximise the statistical power of their analysis in the descriptive work.

      As mentioned in our response to question #3 from reviewer #1, the purpose of splitting the data is to ensure the reproducibility of the data analysis, as suggested by Langfelder et al. (PMID: 21283776). This approach served two purposes: (i) to affirm the existence of functional modules in an independent cohort and (ii) to validate the association of the modules of interest, or their hub genes, with survival outcomes.

      (8) The authors should soften the confidence in their interpretation of the GO/KEGG annotations of WGCNA modules. At least, they should include a paragraph that explicitly details the limitations of their analyses, including (i) the accuracy GO/KEGG annotations are not validated in this context (if at all), (ii) that none of the data can be used to make causal inferences and (iii) that peripheral blood assessments that are obviously impacted by changes in cellular composition of peripheral blood do not necessarily reflect immunopathogenesis at the site of disease - in fact if circulating cells are being recruited to the site of disease or other immune compartments, then quite the opposite interpretations may be true.

      We appreciate the reviewer's comment. (i) In our analysis, we initially confirmed the existence of Weighted Gene Co-expression Network Analysis (WGCNA) modules in the discovery cohort and validated the association of these modules with mortality outcomes in the validation cohort. We then applied GO/KEGG annotations to define the biological functions involved in the WGCNA modules. Finally, we performed Qusage analysis to directly test the association of the top-hit pathways of each WGCNA module with mortality outcomes (see supplementary S6). This analysis approach helped to identify and validate modules and biological pathways associated with TBM mortality in this context, avoiding potential false positives in the GO/KEGG annotations of WGCNA modules. (ii) We agree with the assessment that 'This analysis cannot be used to make causal inferences,' as that would require a different study design and approach. (iii) The focus of this study is to investigate the pathogenesis of TBM in the systemic immune system. We have highlighted this focus in the title and the aim of the manuscript.

      (9) For the prognostic signature discovery and validation, I strongly recommend the authors include more robust validation. For example, to undertake an 80:20 split for sequential discovery (for feature selection and derivation of a prognostic model), followed by validation of a 'locked' model in data that made no contribution to discovery. In two separate sensitivity analyses. I also suggest they split their dataset (i) by treatment allocation in the RCT and (ii) by HIV status. In addition, their method for feature selection has to be clearer- precisely how they select hub genes from their WGCNA analysis as candidate predictors is not explained. Since this is such a prominent output of their manuscript, the results of this analysis should really be included in the main manuscript, and all performance metrics for discrimination should include confidence intervals.

      Employing an 80:20 split for training and testing models is a good approach for internal validation. However, we addressed the issue of overestimating the performance of a prognostic model by using the bootstrap resampling approach proposed by Steyerberg et al. (PMID: 11470385). This approach has been shown to provide stable estimates with low bias. The overall model performance for discrimination reported in our manuscript was corrected for “optimism” to ensure internal validity. This adjustment was achieved through 1000 bootstrap resamples, which accounted for estimation uncertainty. As such, there is no need to present confidence intervals for these metrics.
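
      For readers unfamiliar with optimism correction, the general idea can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy `fit` model (class mean differences as weights), the synthetic data, and all function names are hypothetical, and the procedure shown is one common form of bootstrap optimism correction that may differ in detail from the cited method.

```python
import random


def auc(scores, labels):
    """C-statistic: chance that a random positive outranks a random negative."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)  # ties count 0.5
    return wins / (len(pos) * len(neg))


def fit(X, y):
    """Toy model (an assumption): weight = class mean difference per feature."""
    pos = [x for x, lab in zip(X, y) if lab == 1]
    neg = [x for x, lab in zip(X, y) if lab == 0]
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(len(X[0]))]


def predict(weights, X):
    """Linear score for each sample."""
    return [sum(w * v for w, v in zip(weights, x)) for x in X]


def optimism_corrected_auc(X, y, n_boot=1000, seed=0):
    """Return (apparent AUC, apparent AUC minus mean bootstrap optimism)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(predict(fit(X, y), X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue                                  # need both classes to fit
        Xb = [X[i] for i in idx]
        w = fit(Xb, yb)                               # refit on the resample
        # optimism = performance on the resample minus performance on original data
        optimism.append(auc(predict(w, Xb), yb) - auc(predict(w, X), y))
    return apparent, apparent - sum(optimism) / n_boot


# Synthetic data: one informative feature plus four noise features.
data_rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[data_rng.gauss(lab, 1.0)] + [data_rng.gauss(0.0, 1.0) for _ in range(4)] for lab in y]
apparent, corrected = optimism_corrected_auc(X, y, n_boot=200, seed=1)
```

      The corrected value is a point estimate of how the model would discriminate in new data from the same population; the sketch does not compute an interval around it.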

      Moreover, in our revision, to confirm prognostic signatures independently, we have evaluated the predictive value of identified gene signatures using qPCR in another set of samples. The results have been added in Table 4, supplementary Figure S8 and the results section.

      For the reasons given above (comment 4), we are unable to split our dataset by treatment allocation in this analysis. But as described, we have adjusted the analysis for corticosteroid treatment. Once the primary results of the LAST ACT trial have been published, we will examine the impact of corticosteroids on TBM pathophysiology and outcomes, seeking to better understand the mechanisms by which steroids have their therapeutic effects.

      Given the differences in pathogenesis and immune response associated with HIV co-infection, we stratified our analysis by HIV status. As the reviewer suggested, we have provided additional details in the methods section regarding the selection of hub genes from the associated WGCNA modules and the feature selection process for predictive modeling.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      **Below is our point-by-point reply to the reviewers' comments**

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      PNKP is one of critical end-processing enzymes for DNA damage repair, mainly base excision & single strand break repair, and double strand break repair to a certain extent. This protein has dual enzyme function: 3' phosphatase and 5' kinase to make DNA ends proper for ligation. It has been demonstrated that PTM of PNKP (e.g., S114, S126), particularly phosphorylation by either ATM or DNAPK, is important for PNKP function in DNA damage repair. The authors found a new phosphorylation site, T118, of PNKP which might be modified by CDK1 or 2 during S phase. This modification of phosphorylation is involved in maintenance and stability of the lagging strand, particularly Okazaki fragments. Loss of this phosphorylation could result in increased single strand gaps, accelerated speed of fork progression, and eventually genomic instability. And for this process, PNKP enzyme activity is not that important. And the authors concluded that PNKP T118 phosphorylation is important for lagging strand stability and DNA damage repair.

      Major comments

      In general, enzymes have protein interactions with its/their substrates. If PNKP is phosphorylated by either/both CDK1/2, the protein interaction between these would be expected. However, the authors did not provide any protein interactions in PNKP and CDKs.

      *Thank you for your suggestion. We will perform GFP-pulldown assays using cell extracts from HEK293 cells expressing GFP-WT-PNKP or GFP-T118A-PNKP. To confirm the interaction between PNKP and CDK1/2, we will then blot with CDK1 and CDK2 antibodies.*

      It is not clear how T118 phosphorylation is involved in DNA damage repair itself as the authors suggested. The data presenting the involvement of T118 phosphorylation in this mechanism are limited. This claim opens more questions than answers. CDK1/2 still phosphorylates T118 in this DNA damage repair process? What would happen to DNA damage repair in which PNKP involves outside of S phase in terms of T118 phosphorylation?

      Thank you for your comment. We have investigated the importance of T118 phosphorylation in DNA damage repair in several experiments. In figure S8, we tested the SSB and DSB repair abilities of PNKP KO cells expressing the PNKP T118A mutant and found that PNKP T118 phosphorylation has critical roles in both the SSB and DSB repair pathways. Interestingly, the result of the SSB repair assay (figure S8A & B) may indirectly indicate that T118 phosphorylation is important for SSB repair throughout the cell cycle, because these SSBs are induced immediately by IR exposure and the cells were allowed to recover for only 30 min, which is presumably not enough time for cells to progress through the cell cycle. Along with the repair abilities, we also analyzed the recruitment kinetics of the PNKP T118A and T118D mutants to DNA damage sites using a laser micro-irradiation assay in figure S9. This result indicates that phosphorylation of PNKP at T118 controls its recruitment to, at least, laser-induced DNA damage sites. Moreover, in figure 4H we used a biochemical assay with cell extracts from PNKP KO cells expressing the PNKP T118A and T118D mutants to analyze the recruitment of PNKP to a single-strand DNA gap structure, which mimics intermediates of some DNA repair pathways and of incomplete Okazaki fragment maturation. This assay is much cleaner and shows that loss of T118 phosphorylation impairs PNKP recruitment to the ssDNA gap structure. We believe that these data sufficiently support our model that phosphorylation of T118 on PNKP is involved in DNA repair in general. However, we agree that we have not yet directly tested the DNA repair ability of PNKP T118A outside of S phase. Therefore, in addition to these data, we will perform H2O2-induced SSB and IR-induced DSB repair assays with EdU (S phase) pulse labelling in PNKP KO cells expressing the PNKP T118A mutant, and we will then measure the ADP-ribose intensity and pH2AX foci in EdU-negative cells (outside of S phase, as the reviewer suggested).

      Along the same line with #1/2 comments, the recruitment of PNKP to the damage sites is XRCC1 dependent. Is not clear whether PNKP recruitment to gaps on the lagging strand is XRCC1 independent or dependent. It might be interesting to examine (OPTIONAL)

      *Thank you for this important suggestion. XRCC1 acts as a scaffold for PNKP and is required for the recruitment of PNKP in canonical SSB repair. However, we propose that PNKP is involved in two pathways in DNA replication: the PARP1-XRCC1-dependent ssDNA gap-filling pathway and the Okazaki fragment maturation pathway, in which it works with FEN1. It therefore remains important to address whether XRCC1 is required for PNKP recruitment to the single-strand gaps on nascent DNA. We will perform iPOND analysis in XRCC1-knockdown HEK293 cells expressing GFP-WT-PNKP.*

      Minor comments

      In results: 'Generation of PNKP knock out U2OS cell line' - In figure S2A; There are no data regarding diminishing the phosphorylation of g-H2AX.

      Thank you for your suggestion. We will add pH2AX blot data in fig S2A (all reviewers requested).

      • By showing data in figure S2B/C/D/E, the authors describe 'PNKP KO cells impaired the SSBs repair activity'. However, as the authors mentioned in this manuscript, PNKP could bind to either XRCC1 or XRCC4. Also for this experiment, IR had been applied, which induces DNA double strand breaks. Therefore, it is not certain that the authors' description is fully supported by these data presented. Perhaps, SSB inducing reagents should be used instead of IR.

      In figure S2B/C/D/E, we used gamma rays as the IR source. Gamma rays are classified as low linear energy transfer radiation and act on DNA mainly through indirect effects. It is estimated that gamma-ray-induced DNA damage consists of 60-80% SSBs and 20-40% DSBs. We therefore believe that our results are reasonable. In addition to these results, we will perform a poly-ADP-ribose assay with H2O2 treatment to assess SSB repair activity more specifically.

      • Is there any FACS analysis data to support the description of the last sentence 'especially the phosphorylation of PNKP T118, is required for S phase progression and proper cell proliferation'?

      Thank you for your suggestion. We will add FACS analysis data of the cell cycle profiles of PNKP KO cells expressing GFP, GFP-PNKP WT, or T118A.

      In results: 'CDKs phosphorylate T118 of PNKP ~~~ replication forks'

      • In figure 3A, Is there any change in total PNKP (both GFP-tagged & endogenous) level?

      *Thank you for your suggestion. We agree with your comment. We will add an analysis of PNKP expression in different cell cycle populations in asynchronous and synchronized cells (G1, S, G2/M samples).*

      In results: 'Phosphorylation of PNKP at T118 ~~~ between Okazaki fragments'

      • In figure 4D, What happens in the ADP-ribose level, when T118D PNKP is expressed?

      *Thank you for your suggestion. This is an interesting question. We will perform an ADP-ribosylation assay in PNKP KO cells and in PNKP KO cells expressing PNKP WT or T118D, and add the data on ADP-ribose levels in those cells.*

      In results: 'PNKP is involved in postreplicative single-strand DNA gap-filling pathway'

      • The description regarding data presented in figure 6 is not clear enough. These data might suggest that wildtype U2OS does not have SSB which is a substrate for S1 nuclease (except under FEN1i and PARPi treatment), whereas PNKP KO has SSB during both IdU and CIdU incorporation, so that S1 nuclease treatment dramatically reduces the speed of fork formation in PNKP KO cells. Also In figure 6B/C/D, adding an experimental group of PNKP KO with S1 nuclease + PARPi might help to understand the role of PNKP during replication better. Also these additional data could support the description in discussion 'Furthermore, PNKP is required for the PARP1-dependent single-strand gap-filling pathway ~~~ DNA gap structure'.

      *We agree with the reviewer's comment and suggestion. Since this point was also raised by reviewer 3, we will add the rationale for the experiment and a more detailed description of the results, which should substantially improve the manuscript. We will also revise the wording of the text in line with the comment. In addition to revising the text, we will add experimental groups of PNKP KO cells treated with S1 nuclease with/without PARPi, as the reviewer suggested.*

      In results: 'Phosphorylation of PNKP at T118 is essential for genome stability'

      • In figure S8C, Did you measure g-H2AX foci disappearance for later time point, such as 24 hrs after DNA damage? Is not clear whether non-phosphorylated PNKP at T118 inhibit DNA damage repair or make it slower? How does T114A-PNKP behave in this experimental condition? T114 is well known target of ATM/DNAPK for DDR & DSB repair.

      Thank you for your suggestion. We agree with your point. It is very important to determine whether the T118A mutant shows a delay in, or a total loss of, DSB repair ability. We will add measurements of pH2AX foci at 24 hrs after IR in PNKP KO cells expressing GFP, WT-PNKP, or T118A-PNKP. Although the analysis of pS114 PNKP has been reported previously (Segal-Raz et al., EMBO reports, 2011 and Zolner et al., Nucleic Acids Research, 2011), we will also perform the pH2AX assay in PNKP KO cells expressing S114A-PNKP as a control.

      The result shown in figure S9 should be described in the result section, not in the discussion section.

      Thank you for your suggestion. This point was also raised by Reviewer 3. Since we are going to reconsider the layout of the manuscript as part of the planned revision (as reviewer 3 suggested), we will move these results from the discussion to the appropriate results section.

      **Referees cross-commenting**

      I could see a similar degree of positive tendency toward the manuscript. I agree with the comments and suggestions in additional experiments made by reviewers 2 and 3. Those suggestions will improve an impact of the manuscript in the DNA damage repair field.

      Reviewer #1 (Significance (Required)):

      Significance

      The authors discovered new phosphorylation site (T118) of PNKP which is an important DNA repair protein. This modification seems to play a role in maintenance of the lagging strand stability in S phase. This discovery is something positive in DNA repair field to expand the canonical and non-canonical functions of DNA repair factors.

      The data presented to support PNKP functions and T118 phosphorylation in S phase seem solid in general, yet it is not sure how much PNKP is critical in the Okazaki fragment maturation process which is known that several end processing enzymes (like FEN1, EXO1, DNA2 etc which leave clean DNA ends.) are involved.

      These finding might draw good attentions from researchers interested broadly in cell cycle, DNA damage repair, replication, and possibly new tumor treatment.

      My field and research interest: DNA damage response (including cell cycle arrest and programmed cell death), DNA damage repair (including BER, SSBR, DSBR)

      Thank you very much for your positive comment. As you mentioned, several other end-processing enzymes appear to be involved in Okazaki fragment maturation; however, none of those enzymes has also been reported to be involved in the gap-filling pathway. The role of PNKP in DNA replication is therefore notable, as PNKP could be involved in two separate pathways: Okazaki fragment maturation and a back-up gap-filling repair process. As you suggested, we will add several experiments to reveal how critical PNKP is for Okazaki fragment maturation, including iPOND experiments using XRCC1-depleted cells, analysis of the DNA repair ability of the PNKP T118A mutant throughout the cell cycle, and S1 nuclease DNA fiber assays in PNKP KO cells with/without PARP inhibitor treatment. We believe that these experiments will make the conclusions of this manuscript more solid and convincing.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Polynucleotide kinase phosphatase (PNPK) participates in multiple DNA repair processes, where it acts on DNA breaks to generate 5'-phosphate and 3'-OH ends, facilitating the downstream activities of DNA ligases or polymerases.

      This manuscript identifies a CDK-dependent phosphorylation site on threonine 118 in PNKP's linker region. The authors provide some convincing evidence that this modification is important to direct the activity of PNPK towards ssDNA gaps between Okazaki fragments during DNA replication. The authors monitored protein expression levels, enzymatic activity, the growth rate and replication fork speed, as well as the presence of ssDNA damage to make a comprehensive overview of the features of PNKP necessary for its function.

      Overall, the conclusions are sufficiently supported by the results and this manuscript is relevant and of general interest to the DNA repair and genome stability fields. Some level of revision to the experimental data and text would help strengthen its message and conclusions.

      Major points:

      In an iPOND experiment the authors detect the wt PNKP and the T118 phosphorylated form at the forks and conclude that this phosphorylation promotes interaction with nascent DNA (Figure 3E). An informative sample to include here would have been the T118A mutant. Based on the model proposed, the prediction would be that it would not be associated with the forks, or at least, associated at reduced levels compared to the wt.

      *Thank you for your suggestion. We agree with your comment. We will add iPOND analysis in PNKP KO cells expressing the T118A mutant to confirm that pT118 is important for the recruitment of PNKP to nascent DNA.*

      The quality of the gels showing the phosphatase and kinase assays in Figure 5 could be improved to facilitate quantification of the results. The gel showing the phosphatase activity has a deformed band corresponding to K378A mutant. The gel showing the kinase activity seems to be hitting the detection limits, and the overall high background might influence the quantification of D171A mutant in the area of interest. The authors should provide a better quality of these gels, focusing on better separation (running them longer, eventually with a slightly increased electric current) and higher signal of the analyzed bands (longer incubation phosphatase/kinase prior to quenching or loading higher amount of DNA).

      We agree with your suggestion. The phosphatase and kinase assays could be improved. We will perform these assays again following the reviewer's suggestions.

      The authors sometimes make statements like: "a slight increase, slightly increased, relatively high" without an evaluation of the statistical significance for the presented data. An example of such a statement is: "T118A mutant-expressing cells exhibited a marked delay in cell growth, which was not observed for S114A, although T122A, S126A, and S143A were slightly delayed," based on the figure 2E. A similar comment applies also to figures 4A, 5A, 5E. Whenever possible, the authors should include also an evaluation of the statistical significance in the statement.

      Thank you for your suggestion. We will check the manuscript and revise the wording of these statements as the reviewer suggested.

      Minor revisions:

      I could not find a gH2AX blot for figure S2A.

      Thank you for your suggestion. We will add pH2AX blot data in fig S2A.

      The authors established two PNKP-/- clones and supported it with sequencing and several functional observations However, the C-terminal antibody appears to detect lower-intensity bands (Figure 1A). Can authors comment on those bands?

      Thank you for your comment. One possibility is that these are nonspecific bands recognized by the antibody. To address this, we will run the electrophoresis for a longer time to better separate these bands.

      Why the S1 nuclease data on DNA fibers do not show the same level of epistasis with the Fen1i, as do those on ADP-ribosylation?

      Because FEN1-dependent Okazaki fragment maturation and the PARP1-XRCC1-dependent gap-filling pathway are different pathways, FEN1i and PARPi treatment had an additive effect in the S1 nuclease data in PNKP WT cells. To make this easier to follow, we will add a graphical scheme to figure 6 (a similar point was raised by Reviewer 3 below) and revise the description of the result.

      **Referees cross-commenting**

      I agree with all the comments from the reviewers 1 and 3.

      Reviewer #2 (Significance (Required)):

      Significance:

      The manuscript identifies a CDK phosphorylation site in a relevant DNA repair protein. The experiments on this part are elegant and convincing. It seems that this phosphorylation is important during DNA replication and there is some supporting evidence in this point, although not as robust, meaning that it is not clear whether this phosphorylation is controlling specifically the recruitment to Okazaki fragments, or a general role in DNA repair. Maybe if they see a reduced recruitment of the T118A mutant to the forks (iPOND experiment) this would further increase the impact.

      This work will be relevant to the basic research, especially in the fields of DNA repair and DNA replication.

      My expertise: DNA replication, genome stability, telomere biology.

      Thank you very much for your positive comment. As you suggested, we will perform an iPOND assay using the PNKP T118A mutant. In addition to the T118A iPOND assay, we will also analyze the DNA repair function of the PNKP T118A mutant throughout the cell cycle, as reviewer 1 suggested. We believe that the results of these experiments will pin down whether phosphorylation of PNKP on T118 controls its recruitment specifically to Okazaki fragments or to single-strand DNA gaps in general, and will solidify the conclusion of the manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Tsukada and colleagues studied the role of PNKP phosphorylation in processing single-strand DNA gaps and its link to fork progression and processing of Okazaki fragments.

      They generated two PNKP KO human clonal cell lines and described defects in cell growth, accumulation in S-phase, and faster fork progression. With some elegant experiments, they complement the KO cell lines with deletion and point mutants for PNKP, identifying a critical phosphorylation site (T118) in the linker regions, which is important for cell growth and DNA replication.

      They show that phosphorylation of PNKP peaks in the mid-S phase. CDK1 and CDK2/ with Cyclin A2 are the two main CDK complexes responsible for this modification. With the IPOND experiment, the author shows that PNKP is recruited at nascent DNA during replication.

      They described increased parylation activity in PNKP KO cells, and by using HU and emetin, they concluded that this increased activity depends on replication and synthesis of Okazaki fragments.

      Interfering with Okazaki fragment maturation by FEN1 inhibition is epistatic with PNKP KO (and T118A) in influencing parylation activity in the S phase and fork progression. The authors try to understand by mutant complementation which of the two functions (Phosphatase vs Kinase) is important in processing OF, and they propose a primary role for the phosphatase activity of PNKP. They also show that T118 is important in controlling genome stability following different genotoxic stress. Finally, by coupling the measurement of fork progression with PARP/FEN1 inhibitors and S1 treatment, they propose a role of PNKP in the post-replicative repair of single-strand gaps due to unligated OF.

      Here are my major points:

      The authors use a poly ADP ribose deposition measurement to estimate SSB nick/gap formation. Even if PARP activity is strictly linked to SSB repair, ADP ribosylation does not directly estimate SSB/nick gap formation. In addition, in Figs S2A, B, and C, the authors use IR and PARG inhibition to measure poly-ADP ribosylation in WT and PNKP KO cells. IR produces both SSB and DSB. A better and cleaner experiment would be to directly measure SSB formation (with alkaline comet assay, for example) in combination with treatments that are known to mainly cause SSB (H2O2, or low doses of bleomycin).

      Thank you for your suggestion. The main purpose of this manuscript is to clarify the potential role of PNKP in DNA replication. We therefore generated PNKP KO human cells, and figure S2 confirms the established role of PNKP in SSB and DSB repair in these cells. In addition, in our previous report published in EMBO Journal (Shimada et al., 2015), we showed SSB and DSB repair defects in PNKP KO MEFs using comet assays (both alkaline and neutral) after IR and H2O2 treatment. In addition to those observations, we will perform a BrdU incorporation assay in PNKP WT and KO cells treated with H2O2. BrdU staining under non-denaturing conditions is now commonly used and is a more direct method for detecting ssDNA nick/gap formation. We believe that the importance of PNKP in SSB repair is sufficiently supported by the combined data: the previous comet assays in PNKP KO MEFs and the two SSB repair assays in human cells, using ADP-ribose staining or BrdU incorporation, which will be provided in the revised manuscript.

      The manuscript would benefit from substantially restructuring the figures' order and panels. Before starting the T118 part, the authors could create several figures to explain the main consequences of the loss of PNKP. A figure could be focused on DSB-driven genome instability (fig1 + fig S8 and S9). Then, a figure for the single-strand break and link to the S-phase. For example, by using data from Figure 6 and showing only WT vs PNKP KO +- Nuclease S1 (without FEN1 or PARP inhibitors), the authors could easily convince the readers that loss of PNKP leads to the accumulation of single-strand gaps. Only in the second part of the manuscript could they introduce all the T118 parts.

      Thank you for your suggestion. We understand that the current layout of the manuscript is confusing for reviewers. After performing all the planned experiments, we will carefully reconsider the overall layout of the revised manuscript.

      I understand the use of a FEN1 inhibitor to link the PNKP KO phenotype to OF processing, but this drug does not either rescue or exacerbate any of the phenotypes described by the authors. It seems to have just an epistatic effect everywhere. So, what other conclusion can we have if not that PNKO has a similar effect to FEN1? I think that the presence of this inhibitor in many plots complicates the digestion of several figures a little bit. Maybe clustering the data in a different way (DMSO on one side FEN1i on the other) would help.

      Thank you for your suggestion. We agree that this data set is complicated. To make it easier to follow, we will reorganize the data according to your suggestion and add a graphical scheme to figure 6.

      As for what other conclusion can be drawn from those experiments: PNKP might play two important roles in DNA replication. The first is Okazaki fragment maturation, where it appears to be epistatic with FEN1. The second is the PARP1-XRCC1-dependent single-strand gap filling pathway, which is required for repairing single-strand gaps between Okazaki fragments when the Okazaki fragment maturation pathway does not work properly (e.g., loss of FEN1 or PNKP). In figure 6D, we show that double treatment with FEN1i and PARPi in PNKP WT cells, followed by S1 nuclease treatment, produces an extensive amount of digested DNA fibers, whereas single treatment with either FEN1i or PARPi produces only a limited amount. This indicates that the two pathways regulated by FEN1 and PARP are coordinately required to prevent ssDNA gaps from arising during DNA replication. On the other hand, PNKP KO cells with S1 nuclease treatment show an extensive amount of digested DNA fibers even without FEN1i and PARP1i treatments, and this is not further increased by FEN1i and PARPi treatment. These results indicate that PNKP itself is involved in both pathways mentioned above. Therefore, loss of PNKP has a phenotype similar to loss of FEN1 in terms of Okazaki fragment maturation, but it also has an additional effect on the repair of ssDNA nicks/gaps, which are created when FEN1 is lost but which FEN1 itself does not appear to process.

      Fig S9 should be removed from the discussion. Additionally, the authors should consider whether they want to keep that piece of data in a manuscript that is already pretty dense. Why should we focus on additional linker residues and microirradiation data at the end of this manuscript?

      Thank you for your suggestion. This is a point also raised by Reviewer 1. Since we are going to reconsider the layout of the manuscript upon the planned revision, we will move these points from the discussion to the appropriate result section.

      I suggest using a free AI writing assistant. I think this manuscript would substantially benefit from one. As a non-native English speaker, I personally use one of them and find it extremely useful.

      Thank you for your suggestion. Our manuscript was revised by a native speaker from an English correction company. However, for the revised manuscript, we will consult native speakers and also use a free AI writing assistant to improve the quality of the manuscript.

      Minor points:

      In Figure S1A, the author refers to P-H2AX, but I do not see this marker in the western blot. Thank you for your suggestion. We will add pH2AX blot data in fig S2A.

      **Referees cross-commenting**

      I agree with all comments from reviewer 1 and 2.

      Reviewer #3 (Significance (Required)):

      This is an interesting paper with generally solid data and proper statistical analysis. The figures are pretty straightforward. Unfortunately, the manuscript is dry, and the reader needs help to follow the logical order and the rationale of the experiments proposed. This is also complicated by the enormous amount of data the authors have generated. The authors should improve their narrative, explaining better why they are performing the experiment and not simply referring to a previous citation. Reordering panels and figures would help in this regard. Overall, with some new experiments, tone-downs over strong claims and a better explanation of the rationale behind experiments the authors could create a fascinating paper.

      Thank you very much for your positive comment about the data/analysis and the logic behind the experiments provided in the manuscript. We agree that the style and structure of the manuscript could be improved by reordering figures, cutting some redundant experiments, better explaining the rationale behind the experiments, and toning down some claims. By rewriting the manuscript as stated above and performing several additional experiments suggested by the reviewers, we believe that the revised manuscript will be more convincing and fascinating.

      1. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.


      Reviewer #1:

      Minor comments

      • Is there any difference (except for PARGi exposure time?!) between figure S2B/C and S2D/E? Both data show increased ADP ribose after IR. It seems redundancy. Also it is hard to imagine that there is absolutely no sign of ADP ribose after IR w/o PARGi treatment (figure S2D).

      Figure S2B/C show spontaneous single strand DNA breaks (SSBs) in PNKP KO cells, whereas figure S2D/E show ectopic SSBs induced by IR exposure in PNKP KO cells. We believe these data help readers to understand the effects of endogenous and exogenous damage in PNKP KO cells. Poly-ADP-ribose is removed from SSB sites immediately after repair, as demonstrated previously (Tsukada, et al., PLoS One 2019, Kalasova et al., Nucleic Acids Research, 2020); although the level is low rather than zero, it is very difficult to detect without PARGi treatment.


      Legend for figure S3 - typo!

      Thank you for your suggestion about typo. The legend for figure S3 is corrected as "Protein expression of PNKP mutants in U2OS cells".


      • In figure S3A/B, it is quite interesting that the PNKP antibody used for this analysis can detect all truncated and alanine substituted PNKP proteins. It might be helpful to indicate for other researchers which antibody used (Novus; epitope - 57aa to 189 aa or Abcam; epitope not revealed).

      In S3A/B, Novus PNKP antibody was used for all blots. We indicated this in the figure legend as "PNKP antibody (Novus: NBP1-87257) was used for comparing expression levels of endogenous and exogenous PNKP".


      In results: 'PNKP phosphorylation, especially of T118 ~~~ proliferation'

      • In the fork progression experiment (figure 2C), is there any statistical difference between D2 and D3/4 expressing cells?

      Thank you for your suggestion. We performed statistical analysis as the reviewer suggested. Statistical analysis shows that there are no significant differences between D2 and D3/D4. Meanwhile, there are significant differences between WT and D3 (P

      • What is the basis of the description 'Since the linker region of PNKP is considered to be involved in fork progression'? Any reference?

      This sentence was considered based on the above sentences "Furthermore, D2 mutant-expressing cells also showed an increased speed of the replication fork compared to WT and D1 mutant-expressing cells, although D3 and D4 showed mildly high-speed fork progression.". The D2 mutant lacks a whole linker region, which shows increased speed of DNA fiber in figure 2C. Therefore, we originally explained as the sentence above. We have revised the sentence to "Since these results may indicate the linker region of PNKP is involved in proper fork progression".


      • In figure 3B: pS114-PNKP (also pS15-p53) is DNA damage inducible. In this experiment, was DNA damage introduced? Roscovitine could hinder DNA repair process, but not inducing DNA damage itself.

      Thank you for your suggestion. DNA damage induction was not applied in this experiment. We agree that this panel is confusing. We think that endogenous S114-PNKP (and S15-p53) might be slightly, but not significantly, phosphorylated, although this is beyond the scope of this manuscript. The result showing that phosphorylated T118 is reduced by Roscovitine treatment may be redundant, as we also have results from an in vitro phosphorylation assay using several combinations of CDKs and Cyclin proteins, which is a cleaner experiment to show which CDK/Cyclin complex directly controls T118 phosphorylation. Since the manuscript already contains enough data to support the conclusion (as reviewer 3 also stated), we removed those blots from the panel to avoid complicating the conclusion.


      In results: 'Phosphatase activity of PNKP is ~~~ of Okazaki fragments'

      • In figure 5C, any statistical analysis between WT-PNKP KO vs D171A-PNKP KO or K378A-PNKP KO has been done?

      Thank you for your comment. Statistical analysis shows P *

      In discussion, 'In contrast, the T118A mutants showed the absence of both SSBs and DSBs repair (Fig. S7) : figure S7 does not indicate what the authors describe.

      Thank you for pointing out this. This should refer to figure S8 instead of figure S7. We have corrected this error.

      In addition, the same sentence in discussion: No evidence demonstrate that 'the absence of both SSBs and DSBs repair', and the following sentence is not clear.

      This is the same point as above. We have corrected this mis-referencing and revised the sentence to "In contrast, the T118A mutants showed the impaired abilities of both SSBs and DSBs repair (Fig. S8).". We also revised the following sentence to "However, residual SSBs due to impaired SSB repair ability (e.g., in PARPi-treated cells and T118A cells) sometimes cause DNA replication-coupled DSBs formation in S phase, and the phenotype in DSB repair assay of the T118A mutant may be caused by an accumulated formation of DNA replication-coupled DSBs. Future works will be needed to distinguish whether the T118 phosphorylation directly regulate PNKP recruitment to DSBs as well as SSBs." to better explain the result.


      In discussion, 'Because both CDK1/cyclin A2 and CDK2/cyclin A2 are involved in PNKP phosphorylation, cyclin A2 is likely important for these activities': It is not clear what this description intends? Is 'cyclin A2' important in what stance?

      This description is coming from Fig3C observation. Since both CDK1 and CDK2 activities are cyclin A2 dependent, we speculated cyclin A2 is important for CDK1/CDK2 dependent PNKP T118 phosphorylation. We revised the description to "Since both CDK1/Cyclin A2 and CDK2/Cyclin A2 phosphorylate T118 of PNKP, we speculated that PNKP T118 is phosphorylated in S phase to G2 phase in CDK1/Cyclin A2- and CDK2/Cyclin A2-dependent manner (Fig. 3B and C)".


      In discussion, 'This may be explained by the fact that mutations in the phosphorylated residue in the linker region are embryonic lethal': any reference to support this embryonic lethality?

      Thank you for your suggestion. We agree that this sentence is overstated. We have revised the sentence to "This observation may indicate that mutations in the phosphorylated residue (T118) in the linker region are potentially embryonic lethal due to the importance of T118 in DNA replication, which is revealed in the present study.".


      Reviewer #2:

      Minor comments

      Sometimes there are incorrect references to the figures in the discussion (e.g. FigS9A, B, and C, are called out instead of E, F and G), a similar issue is found 4 lines below in the same page.

      Thank you for pointing out these errors. We checked the references in the discussion and corrected to the appropriate references.

      Based on the data in Figure 3A the authors suggest that pT118-PNKP follows Cyclin A2 levels, but this does not appear very clearly in the gel, especially for the last point. Even though the results are convincing, the authors should rephrase the conclusions of Figure 3A to reflect better the results.

      Thank you for your suggestion. We agree that this phrase is overstated. We revised the conclusion to "pT118-PNKP was detected in asynchronized cells but increased particularly in the S phase, similar to Cyclin A2 expression levels, although the reduction of pT118, possibly dephosphorylation of T118, seems not as robust as the reduction of the Cyclin A2 expression level at the 12 hours time point. However, this effect was very weak during mitosis, suggesting that T118 phosphorylation plays a specific role in the S phase.".

      I did not find a reference to what seems to be a relevant work in this topic: PMID: 22171004

      Thank you for your suggestion. We have added the ref (Coquelle et al., PNAS, 2011) in Introduction section.


      Reviewer #3:

      Major comments

      The authors should consider and discuss the potential role of PNKP KO outside of the S-phase. In Figure 4C, while it is clear that poly ADP ribosylation is higher in S-phase, the effects of PNKP KO and complementation by WT or T118A are equally present. This would be more immediate if comparison, fold change, and statistical significance calculation were done within the same cell cycle phase instead of between cell stages. This is also clear by IF in Figure 4B. How do the authors explain this?

      Thank you for your suggestion. We agree with the reviewer. We compared ADP-ribose intensities between cell lines within the same cell cycle phase, rather than between different cell cycle phases within the same cell line, and added the respective statistics to figure 4C. We also agree that poly ADP-ribose intensity differs outside of S phase between PNKP KO cells expressing WT and T118A PNKP. As shown in figure S8, PNKP pT118 is also involved in DNA repair. These results might reflect PNKP function outside of S phase. We have added the sentence "Of note, PNKP−/− cells and PNKP T118A cells showed markedly higher ADP-ribose intensity in outside the S phase as well, which indicate that PNKP and T118 may have an endogenous role to prevent SSBs formation in outside the S phase. Since FEN1 has been reported to function in R-loop processing, PNKP could also be involved in this process. Future studies of a role of PNKP in different cell cycle will be able to address this question." to discuss the function of PNKP outside the S phase. We have added the refs (Cristini et al., Cell Reports, 2019, and Laverde et al., Genes, 2022).


      In connection with the previous point, can the author provide the same quantification in Figure 4E also for G2/M and not only the S phase? This should give an estimate of the activity of FEN1 outside the S-phase. This is important because FEN1 has other functions apart from OF maturation, such as R loop processing (Cristini 2019; Laverde 2023)

      Thank you for your suggestion. Attached here are the data on ADP-ribose intensity in cells outside the S phase, as you suggested. FEN1i treatment still increases ADP-ribose intensity outside the S phase, although the difference with and without FEN1i treatment is much smaller than in S phase, indicating that FEN1 has other functions outside the S phase. This finding is very interesting. However, the function of FEN1 outside the S phase is beyond the scope of this manuscript. Therefore, we would prefer not to include these data in the manuscript, to avoid complicating the conclusion (as reviewer 3 also suggested).


      Why does FEN1 inhibition induce a faster fork progression in Fig4 but not in Fig5 and Fig6?

      Yes, it does in figure 4 and figure 5. In PNKP WT cells, FEN1i-treated fibers (CldU) show an increased fork speed compared to non-treated fibers (IdU). However, loss of PNKP or of T118 phosphorylation itself causes faster fork progression even without FEN1i treatment; therefore, the difference in fork speed before and after FEN1i treatment disappears in PNKP KO and T118A cells, as both fibers grow faster than intact fibers in normal cells. Regarding figure 6, as you mentioned in a later comment, the vertical axis of the graph showing CldU length should not be labeled as replication fork speed, as those DNA fibers are potentially digested by S1 nuclease; this has been modified in the revised manuscript. Even so, DNA fibers from FEN1i-treated cells (CldU) with S1 nuclease show a similar length to fibers from untreated cells with S1 nuclease, whereas FEN1 inhibitor treatment generally accelerates fork speed (figure 4 and figure 5, assays without S1 nuclease), indicating that FEN1i treatment leaves some ssDNA nicks/gaps, which are substrates of S1 nuclease.


      How do the authors explain the impaired DNA gap binding activity of the phospho-mimetic T118D?

      Thank you for your suggestion. We think that the appropriate timing of PNKP T118 phosphorylation is important, whereas the phospho-mimetic mutant T118D mimics a constitutively phosphorylated state, which may result in incomplete complementation of PNKP function.


      I would like to see a representative fiber image from Fig 6. Additionally, in Figure 6, the author should not label the y-axis as CldU-fork speed. Nuclease S1 treatment destroys single-strand gaps (in vitro) and does not affect the fork speed (in vivo)

      Thank you for your suggestion. We have added a representative fiber image. We also agree that CldU fork speed is not the right label for the y-axis, as CldU fibers are potentially digested by S1 nuclease. We changed the y-axis label to "CldU tract length [kb/min]" in figure 6.


      Figure 5E: both mutants (kinase vs phosphatase) increase polyADP ribose intensity, while the title of this figure only emphasizes the phosphatase activity.

      We agree with your comment. We have changed this subtitle to "Enzymatic activities of PNKP is important for the end-processing of Okazaki fragments".


      Minor comments


      The authors refer to Hoch Nature 2017 when referring to polyADP ribose IF + PARG inhibition. Should they not refer to Hanzlikova Mol Cell 2018?

      Thank you for your suggestion. We have added the ref (Hanzlikova et al., Mol Cell 2018).

      Statistical analysis should be performed on the cell cycle profile in Figure 1B

      We performed statistical analysis to check whether there are significant differences of S phase population between WT and PNKP KO cells. There were significant differences between WT vs PNKP KO C1 (P

      The authors should not refer to fork degradation or protection as a given fact without assessing it in these conditions.

      Thank you for your suggestion. We assume that this comment refers to the result section of figure 1 and figure 4. We have added a sentence "although future studies will be needed to investigate whether PNKP−/− cells has the fork protection phenotype" in the result section of figure 1. We have changed representation in the section according to the reviewer's suggestion in the result section of figure 4.


    1. Reviewer #1 (Public Review):

      Summary:

      The authors want to explore how much two known minibinder protein domains against the Spike protein of SARS-CoV-2 can function as a binding domain of 2 sets of synthetic receptors (SNIPR and CAR); the authors also want to know how some modifications of the linkers of these new receptors affect their activation profile.

      Major strengths and weaknesses of the methods and results:

      - Strengths include: analysis of synthetic receptor function for 2 classes of synthetic receptors, with robust and appropriate assays for both kinds of receptors. The modifications of the linkers are also interesting and the types of modifications that are often used in the field.

      - Weaknesses include: none of the data analysis provides statistical interpretation of the results (that I could find). One dataset is confusing: Figures 5A and C, are said to be the same assay with the same constructs, but the results are 30% in A, and 70% in C.

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      Given the open-ended nature of the goal (implicit in it being an exploration), it is hard to say if the authors have reached their aims; they have done an exploration for sure; is it big enough an exploration? This reviewer is not sure.

      The results are extremely clearly presented, both in the figures and in the text, both for the methods and the results. The claims put forward (with limited exceptions see below) are very solidly supported by the presented data.

      A discussion of the likely impact of the work on the field, and the utility of the methods and data to the community:

      The work may stimulate others to consider minibinders as potential binding domains for synthetic receptors. The modifications that are presented although not novel, do provide a starting point for larger-scale analysis.

      It is not clear how much this is generalizable to other binders (the authors don't make such claims though). The claims are very focused on the tested modifications, and the 2 receptors and minibinder used, a scope that I would define as narrow; the take-home message if one wants to try it with other minibinders or other receptors seems to be: test a few things, and your results may surprise you.

      Any additional context you think would help readers interpret or understand the significance of the work:

      We are at the infancy stage of synthetic receptors optimization and next-generation derivation; there is a dearth of systematic studies, as most focus is on developing a few ones that work. This work is an interesting attempt to catalyze more research with these new minibinders. Will it be picked up based on this? Not sure.

    1. Social Media platforms use the data they collect on users and infer about users to increase their power and increase their profits.

      I agree with this statement and feel deeply concerned. When we register an account on most social media platforms, they usually ask us to select content preferences so that they can push content we are more interested in. This is actually a way to obtain information and find ways to attract the user's attention. Moreover, as we use the software, the platforms also use different algorithms to infer how our interests have changed. At the same time, they may also cooperate with shopping software to directly push the items of interest we just mentioned in videos or forums so that we purchase them. I think it’s a bit scary how much big data knows about us.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Huang and colleagues explored the role of iron in bacterial therapy for cancer. Using proteomics, they revealed the upregulation of bacterial genes that uptake iron, and reasoned that such regulation is an adaptation to the iron-deficient tumor microenvironment. Logically, they engineered E. Coli strains with enhanced iron-uptake efficiency, and showed that these strains, together with iron scavengers, suppress tumor growth in a mouse model. Lastly, they reported the tumor suppression by IroA-E. Coli provides immunological memory via CD8+ T cells. In general, I find the findings in the manuscript novel and the evidence convincing.

      (1) Although the genetic and proteomic data are convincing, would it be possible to directly quantify the iron concentration in (1) E. Coli in different growth environments, and (2) tumor microenvironment? This will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the reviewer’s comment regarding the precise quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing an Fe3+ probe, an iron assay kit (ab83366), and Inductively Coupled Plasma Mass Spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. To circumvent this limitation, we designed an experiment wherein bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 proteins. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 proteins, in comparison to the wild-type E. coli strains. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting the IroA-E. coli’s iron-scavenging capability against nutritional immunity.

      (2) Related to 1, the experiment to study the synergistic effect of CDG and VLX600 (lines 139-175) is very nice and promising, but one flaw here is a lack of the measurement of iron concentration. Therefore, a possible explanation could be that CDG acts in another manner, unrelated to iron uptake, that synergizes with VLX600's function to deplete iron from cancer cells. Here, a direct measurement of iron concentration will show the effect of CDG on iron uptake, thus complementing the missing link.

      We appreciate the reviewer’s comment and would like to point the reviewer to our results in Figure S3, which shows that the expression of CDG enhances bacteria survival in the presence of LCN2 proteins, which reflects the competitive relationship between CDG and enterobactin for LCN2 proteins as previously shown by Li et al. [Nat Commun 6:8330, 2015]. We regret to inform the reviewer that direct measurement of iron concentration was attempted to no avail due to the limited sensitivity of iron detecting assays. We do acknowledge that CDG may exert different effects in addition to enhancing iron uptake, particularly the potentiation of the STING pathway. We pointed out such effect in Fig 2c that shows enhanced macrophage stimulation by the CDG-expressing bacteria. We would like to accentuate, however, that a primary objective of the experiment is to show that the manipulation of nutritional immunity for promoting anticancer bacterial therapy can be achieved by combining bacteria with iron chelator VLX600. The multifaceted effects of CDG prompted us to focus on IroA-E. coli in subsequent experiments to examine the role of nutritional immunity on bacterial therapy. We have updated the associated text to better convey our experimental design principle.

      Lines 250-268: Although statistically significant, I would recommend the authors characterize the CD8+ T cells a little more, as the mechanism now seems quite elusive. What signals or memories do CD8+ T cells acquire after IroA-E. Coli treatment to confer their long-term immunogenicity?

      We apologize for the overinterpretation of the immune memory response in our previous manuscript and appreciate the reviewer’s recommendation to further characterize CD8+ T cells post-IroA-E. coli treatment. Our findings, which show robust tumor inhibition in rechallenge studies, indicate establishment of anticancer adaptive immune responses. As the scope of the present work is aimed at demonstrating the value of engineered bacteria for overcoming nutritional immunity, expounding on the memory phenotypes of the resulting cellular immunity is beyond the scope of the study. We do acknowledge that our initial writing overextended our claims and have revised the manuscript accordingly. The revised manuscript highlights induction of anticancer adaptive immunity, attributable to CD8+ T cells, following the bacterial therapy.

      (3) Perhaps this goes beyond the scope of the current manuscript, but how broadly applicable is the observed iron-transport phenomenon in other tumor models? I would recommend the authors to either experimentally test it in another model or at least discuss this question.

      We highly appreciate the reviewer’s suggestion regarding the generalizability of the iron-transport phenomenon in diverse tumor models. To address this, we extended our investigations beyond the initial model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate the superiority of IroA-E. coli over WT bacteria in tumor inhibition. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments.

      Reviewer #2 (Public Review):

      Summary:

      The authors provide strong evidence that bacteria, such as E. coli, compete with tumor cells for iron resources and consequently reduce tumor growth. When sequestration between LCN2 and bacterobactin is blocked by upregulating CDG(DGC-E. coli) or salmochelin(IroA-E.coli), E. coli increase iron uptake from the tumor microenvironment (TME) and restrict iron availability for tumor cells. Long-term remission in IroA-E.coli treated mice is associated with enhanced CD8+ T cell activity. Additionally, systemic delivery of IroA-E.coli shows a synergistic effect with chemotherapy reagent oxaliplatin to reduce tumor growth.

      Strengths:

      It is important to identify the iron-related crosstalk between E. coli and TME. Blocking lcn2-bacterobactin sequestration by different strategies consistently reduces tumor growth.

      Weaknesses:

      As the engineered E. coli upregulate their iron uptake, they may be more likely to escape nutritional immunity (LCN2 becomes unable to sequester iron from the bacteria). Would this raise the chance of developing sepsis? Do the authors think that it is safe to administer these engineered bacteria to mice or humans?

      We appreciate the reviewer’s comment on the safety evaluation of the iron-scavenging bacteria. To address the concern, we assessed the potential risk of sepsis development by measuring the bacterial burden and performing whole blood cell analyses following intravenous injection of the engineered bacteria. As illustrated in Figures 3k and 3l, our findings indicate that the administration of these engineered bacteria does not elevate the risk of sepsis. The blood cell analysis suggests that mice treated with the bacteria eventually return to baseline levels comparable to untreated mice, supporting the safety of this approach in our experimental models.

      Reviewer #3 (Public Review):

      Summary:

      Based on their observation that tumor has an iron-deficient microenvironment, and the assumption that nutritional immunity is important in bacteria-mediated tumor modulation, the authors postulate that manipulation of iron homeostasis can affect tumor growth. They show that iron chelation and engineered DGC-E. coli have synergistic effects on tumor growth suppression. Using engineered IroA-E. coli that presumably have more resistance to LCN2, they show improved tumor suppression and survival rate. They also conclude that the IroA-E. coli treated mice develop immunological memory, as they are resistant to repeat tumor injections, and these effects are mediated by CD8+ T cells. Finally, they show synergistic effects of IroA-E. coli and oxaliplatin in tumor suppression, which may have important clinical implications.

      Strengths:

      This paper uses straightforward in vitro and in vivo techniques to examine a specific and important question of nutritional immunity in bacteria-mediated tumor therapy. They are successful in showing that manipulation of iron regulation during nutritional immunity does affect the virulence of the bacteria, and in turn the tumor. These findings open future avenues of investigation, including the use of different bacteria, different delivery systems for therapeutics, and different tumor types.

      Weaknesses:

      • There is no discussion of the cancer type and why this cancer type was chosen. Colon cancer is not one of the more prominently studied cancer types for LCN2 activity. While this is a proof-of-concept paper, there should be some recognition of the potential different effects on different tumor types. For example, this model is dependent on significant LCN production, and different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type? For example, breast cancer aggressiveness has been shown to be influenced by FPN levels and labile iron pools.

      We highly appreciate the reviewer’s insightful comment on the varying LCN2 activities across different tumor types. In light of the reviewer’s suggestion, we extended our investigations beyond the initial colon cancer model, employing B16-F10 melanoma and E0771 breast cancer in mouse subcutaneous models. The results, as depicted in Figures 3g to 3j and Figure S5, demonstrate that IroA-E. coli consistently outperforms WT bacteria in tumor inhibition. We acknowledge the reviewer’s comment regarding LCN2 being more prominently examined in breast cancer and have highlighted this aspect in the revised manuscript. For colon and melanoma cancers, several reports have pointed out the correlation of LCN2 expression and the aggressiveness of these cancers [Int J Cancer. 2021 Oct 1;149(7):1495-1511][Nat Cancer. 2023 Mar;4(3):401-418], albeit to a lesser extent. These findings support the broad implication of nutritional immunity as well as the potential of iron-scavenging bacteria for different solid tumor treatments. The manuscript has been revised to reflect the reviewer’s insightful comment.

      • Are the effects on tumor suppression assumed to be from E. coli virulence, i.e. Does the higher number of bacteria result in increased immune-mediated tumor suppression? Or are the effects partially from iron status in the tumor cells and the TME?

      We appreciate the reviewer’s question regarding the therapeutic mechanism of IroA-E. coli. Bacterial therapy exerts its anticancer action through several different mechanisms, including bacterial virulence, nutrient and ecological competition, and immune stimulation. Decoupling one mechanism from another would be technically challenging and is beyond the scope of the present work. With the objective of demonstrating that iron-scavenging bacteria can elevate anticancer activity by circumventing nutritional immunity, we highlight our data in Fig. S6, which show that IroA-E. coli administration resulted in higher bacterial colonization within solid tumors compared to WT-E. coli on Day 15. This increased bacterial presence supports our iron-scavenging bacteria design, and we highlight a few anticancer mechanisms mediated by the engineered bacteria. Firstly, as shown in Fig. 4d, IroA-E. coli induces an elevated iron stress response in tumor cells, as the treated tumor cells show increased expression of transferrin receptors. Secondly, our experiments involving CD8+ T cell depletion indicate that IroA-E. coli establishes a more robust anticancer CD8+ T cell response than WT bacteria. Both immune-mediated responses and alterations in iron status within the tumor microenvironment are demonstrated to contribute to the enhanced anticancer activity of IroA-E. coli in the present study.

      • If the effects are iron-related, could the authors provide some quantification of iron status in tumor cells and/or the TME? Could the proteomic data be queried for this data?

      We appreciate the reviewer’s query regarding the quantification of iron concentrations. In our study, we attempted various experimental approaches, including immunohistochemistry utilizing an Fe3+ probe, an iron assay kit (ab83366), and inductively coupled plasma mass spectrometry (ICP-MS). Despite these attempts, the quantification of oxidized Fe3+ concentrations proved challenging due to the inherently low levels of Fe ions and the difficulty of distinguishing Fe2+ from Fe3+. We observed measurements below the detection threshold of even the sensitive ICP-MS technique. Consequently, to circumvent this limitation, we designed an experiment in which bacteria were cultured in a medium supplemented with Chrome Azurol S (CAS) reagent, which colorimetrically detects siderophore activity. We compared WT bacteria and IroA-expressing bacteria at varying levels of Lcn2 protein. The outcome, as depicted in the updated Fig. 3b, reveals an enhanced iron acquisition capability in IroA-E. coli in the presence of Lcn2 protein, in comparison to the wild-type E. coli strain. In addition to the Lcn2 study, the proteomic study in Figure 4 highlights the competitive landscape between cancer cells and bacteria. We observed that IroA-E. coli showed reduced stress responses and exerted elevated iron-associated stress on cancer cells, thus further supporting the iron-scavenging capability of IroA-E. coli against nutritional immunity.

      Reviewing Editor:

      The authors provide compelling technically sound evidence that bacteria, such as E. coli, can be engineered to sequester iron to potentially compete with tumor cells for iron resources and consequently reduce tumor growth. Long-term remission in IroA-E.coli treated mice is associated with enhanced CD8+ T cell activity and a synergistic effect with chemotherapy reagent oxaliplatin is observed to reduce tumor growth. The following additional assessments are needed to fully evaluate the current work for completeness; please see individual reviews for further details.

      We appreciate the editor’s positive comment.

      (1) The premise is one of translation yet the authors have not demonstrated that manipulating bacteria to sequester iron does not provide a potential for sepsis or other evidence that this does not increase the competitiveness of bacteria relative to the host. Only tumor volume was provided rather than animal survival and cause of death, but bacterial virulence is enhanced including the possibility of septic demise. Alternatively, postulated by the authors, that tumor volume is decreased due to iron sequestration but they do not directly quantify the iron concentration in (1) E. Coli in different growth environments, and (2) tumor microenvironment. These important endpoints will provide the functional consequences of upregulating genes that import iron into the bacteria.

      We appreciate the editor’s comment and have added substantial data to support the translational potential of the iron-scavenging bacteria. In particular, we added evidence that the iron-scavenging bacteria do not increase the risk of sepsis (Fig. 3k, l), evidence of increased bacterial competitiveness and survival in tumors (Fig. S6), and evidence of the iron-scavenging bacteria’s superior anticancer ability and survival benefit across 3 different tumor models (Fig. 3e-j; Fig. S5). While direct measurement of iron concentration in the tumor environment is technically difficult due to the challenge of differentiating Fe2+ and Fe3+ with available techniques, we added a colorimetric CAS assay to demonstrate that the iron-scavenging bacteria can utilize Fe more effectively than WT bacteria in the presence of LCN2 (Fig. 3b). These results substantiate the translational relevance of the engineered bacteria.

      (2) There is no discussion of the cancer type and why this cancer type was chosen. If the current tumor modulation system is dependent on LCN2 activity, there would need to be some recognition that different tumors have variable levels of LCN expression. Would the response of the tumor depend on the role of iron in that cancer type?

      We appreciate the comment and added relevant text and citations describing clinical relevance of LCN2 expression associated with the tumor types used in the study (breast cancer, melanoma, and colon cancer). Elevated LCN2 has been associated with higher aggressiveness for all three cancer types.

      (3) To demonstrate long-term anti-cancer memory was established through enhancement of CD8+ T cell activity (Fig 5c), the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IronA mice since CD8+ T cells may play a role in tumor suppression regardless of whether or not iron regulation is being manipulated. It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We acknowledge that our prior writing may have overstated our claim on immunological memory. Our intention is to show that upon treatment and tumor eradication by iron-scavenging bacteria, adaptive immunity mediated by CD8 T cells can be elicited. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression. We have modified our text to reflect our intended message.

      Reviewer #1 (Recommendations For The Authors):

      All the figures seem to be in low resolution and pixelated. Please upload high-resolution ones.

      We have updated figures to high-resolution ones.

      Reviewer #2 (Recommendations For The Authors):

      Some specific comments towards experiments:

      (1) For Fig 2 f/ Fig 3f/ Fig 5d/Fig6c, the survival rate is based on the tumor volume (the mouse was considered dead when the tumor volume exceeded 1,500 mm3). Did the mice die from the experiment (how many from each group)? If it only reflects the tumor size, do these figures deliver the same information as the tumor growth figure?

      We appreciate the reviewer’s comment. The survival rate is indeed based on tumor volume, and we used a cutoff of 1500 mm3. No death event was observed before the tumors reached 1500 mm3. Although the survival figures overlap with some of the information conveyed by the tumor volume tracking, they offer additional temporal resolution of tumor progression. Presenting both tumor volume and survival tracking is a commonly adopted way to depict tumor progression. We have added the protocol regarding survival monitoring to the Materials and Method section.
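
      As an illustration of how a volume-based endpoint relates tumor growth curves to survival curves, the following is a minimal sketch, not the authors' analysis code. The function names, the measurement days, and the tumor volumes are hypothetical; only the 1500 mm3 cutoff comes from the response above.

```python
# Minimal sketch: derive a volume-based survival endpoint from tumor growth data.
# All names and numbers except the 1500 mm3 cutoff are illustrative assumptions.
CUTOFF_MM3 = 1500  # endpoint threshold stated in the response


def endpoint_day(days, volumes, cutoff=CUTOFF_MM3):
    """Return the first measurement day on which the volume exceeds the cutoff.

    Returns None if the tumor never exceeds the cutoff (a censored animal).
    """
    for day, volume in zip(days, volumes):
        if volume > cutoff:
            return day
    return None


def survival_fraction(endpoints, day):
    """Fraction of animals that have not reached the endpoint by `day` (inclusive)."""
    alive = sum(1 for e in endpoints if e is None or e > day)
    return alive / len(endpoints)


# Hypothetical tumor volumes (mm3) for three mice measured on the same days.
days = [0, 5, 10, 15, 20]
group = [
    [50, 300, 900, 1600, 2100],
    [40, 150, 400, 800, 1200],
    [60, 500, 1550, 2000, 2500],
]
endpoints = [endpoint_day(days, volumes) for volumes in group]
```

      Under this definition, the survival curve records only when each animal crosses the threshold, whereas the growth curve reports the group trajectory at every time point.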

      (2) Fig 3a, not sure if entE is a good negative control for this experiment. Neg. Ctrl should maintain its CFU/ml at a certain level regardless of Lcn2 conc. However, entE conc. is at 100 CFU/ml throughout the experiment suggesting there is no entE in media or if it is supersensitive to Lcn2 that bacteria die at the dose of 0.1nM?

      We appreciate the reviewer’s comment. The ΔentE-E. coli was indeed observed to be highly sensitive to LCN2. We included the control to highlight the competitive relationship between entE and LCN2 for iron chelation, which has been previously reported in the literature [Biometals 32, 453–467 (2019)].

      (3) Fig 4, the authors harvested bacteria from the tumor by centrifuging homogenized samples at different speeds. Internal controls confirming sample purity (positive for bacteria and negative for cells for panels a,b,c; or vice versa for panel d) may be necessary. This comment may also apply to samples from Fig 1.

      We acknowledge the reviewer’s concern and would like to point out that the proteomic analysis was performed using a highly cited protocol that provides reference and normalization standards for E. coli proteins [Mol Cell Proteomics. 2014 Sep; 13(9): 2513–2526]. The reference is cited in the Materials and Method section associated with the proteomic analysis.

      (4) To demonstrate long-term anti-cancer memory was established through enhancement of CD8+ T cell activity, the "2nd seeding tumor cells" experiment may need to be done in CD8 antibody-treated IronA mice.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We apologize for overstating our claim in the previous manuscript draft.

      Minor suggestions:

      (1) Please include the tumor re-challenge experiment in the method section.

      The re-challenge experiment has been added to the method section as instructed.

      (2) Please cite others' and your previous work. E.g. line 281, 282, line 306-307.

      We have added the citations as instructed.

      (3) Line 448, BL21 is bacteria, not cells.

      We have made the correction accordingly.

      Reviewer #3 (Recommendations For The Authors):

      • The authors postulate that IroA-E. coli is more potent than DGC-E. coli in resisting LCN2 activity, and that this potency is the cause of the increased tumor suppression of this engineered strain. If so, Fig 3a should include DGC-E. coli for direct comparison.

      We thank the reviewer for the comment and would like to clarify that we intended to construct IroA-E. coli as a more specific iron-scavenging strategy, which can aid the discussion of nutritional immunity and minimize confounding factors from the immune-stimulatory effect of CDG. We have modified our text to clarify our stance.

      • The data refers to the effects of WT bacteria-mediated tumor suppression, e.g. Figure 3e shows that even WT bacteria have a significant suppressive effect on tumor growth. Could the authors provide background on what is known about the mechanism of this tumor suppression, outside of tumor targeting and engineerability? They only reference "immune system stimulation."

      We appreciate the reviewer’s comment and would like to refer the reviewer to our recently published article [Lim et al., EMBO Molecular Medicine 2024; DOI: 10.1038/s44321-023-00022-w], which shows that in addition to immune system stimulation, WT bacteria can also be perceived as an invading species in the tumor that can exert differential selective pressure against cancer cells. Competition for nutrients is highlighted as a major contributor to containing tumor growth. In fact, the nutrient competition that we observed in the prior article inspired the design of the iron-scavenging bacteria towards overcoming nutritional immunity. We have cited this recently published article in the revised manuscript to enrich the background.

      • The authors claim that there is immunologic memory because of tumor resistance in re-challenged mice after IroA-E. coli treatment (Fig 5c). It appears that the control group for this experiment is naive mice (and not WT-E. coli treated mice), in which case the immunologic memory could be from having had tumor/E. coli rather than the effect of IroA-E. coli.

      We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to suggest that the adaptive immunity stemmed from IroA-E. coli only; rather, we intend to build upon current literature that has reported CD8+ T cell elicitation by bacterial therapy. IroA-E. coli is shown to enhance adaptive immunity. We also did not consider a WT-E. coli control as no WT-E. coli treated group achieved complete tumor regression.

      • The authors claim that CD8+ T cells are mechanistically important in the effects of iron status manipulation in E. coli-mediated tumor suppression (Fig 5). In order to show this, it seems that Fig 5c should include WT-E. coli and WT-E. coli+CD8 ab groups, as it may be that CD8+ T cells play a role in tumor suppression regardless of whether or not iron regulation is being manipulated.

      We apologize for the confusion from our prior writing. We have modified our claims to highlight that the tumor eradication by iron scavenging bacteria can establish adaptive anticancer immunity through the elicitation of CD8 T cells. We did not intend to convey that CD8+ T cells are mechanistically important in the effects of iron status manipulation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers praised multiple aspects of our study. Reviewer 1 noted that “the work aligns well with current research trends and will greatly interest researchers in the field.” Reviewer 2 highlighted the unique capability of our imaging approach, which “allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry.” Reviewer 3 commented that “the experiments are beautifully executed” and “are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before.”

      In addition to the positive feedback, the reviewers also provided useful criticisms and suggestions, some of which may not be fully addressed in a single study. For instance, questions regarding whether dopamine axons encode the valence or specific identity of the stimuli, or the most salient aspects of the environment, remain open. At the same time, as all the reviewers agreed, our report on the diversity of dopamine axonal responses using a novel imaging design introduces significant new insights to the neuroscience community. Following the reviewers’ recommendations, we have refrained from making interpretations that could be perceived as overinterpretation, such as concluding that “dopamine axons are involved in aversive processing.” This has necessitated extensive revisions, including modifying the title of our manuscript to make clear that the novelty of our work is revealing ‘functional diversity’ using our new imaging approach.

      Below, we respond to the reviewers’ comments point by point.

      eLife assessment

      This valuable study shows that distinct midbrain dopaminergic axons in the medial prefrontal cortex respond to aversive and rewarding stimuli and suggest that they are biased toward aversive processing. The use of innovative microprism based two-photon calcium imaging to study single axon heterogeneity is solid, although the experimental design could be optimized to distinguish aversive valence from stimulus salience and identity in this dopamine projection. This work will be of interest to neuroscientists working on neuromodulatory systems, cortical function and decision making.

      Reviewer #1

      Summary:

      In this manuscript, Abe and colleagues employ in vivo 2-photon calcium imaging of dopaminergic axons in the mPFC. The study reveals that these axons primarily respond to unconditioned aversive stimuli (US) and enhance their responses to initially-neutral stimuli after classical association learning. The manuscript is well-structured and presents results clearly. The utilization of a refined prism-based imaging technique, though not entirely novel, is well-implemented. The study's significance lies in its contribution to the existing literature by offering single-axon resolution functional insights, supplementing prior bulk measurements of calcium or dopamine release. Given the current focus on neuromodulator neuron heterogeneity, the work aligns well with current research trends and will greatly interest researchers in the field.

      However, I would like to highlight that the authors could further enhance their manuscript by addressing study limitations more comprehensively and by providing essential details to ensure the reproducibility of their research. In light of this, I have a number of comments and suggestions that, if incorporated, would significantly contribute to the manuscript's value to the field.

      Strengths:

      • Descriptive.

      • Utilization of a well-optimized prism-based imaging method.

      • Provides valuable single-axon resolution functional observations, filling a gap in existing literature.

      • Timely contribution to the study of neuromodulator neuron heterogeneity.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      (1) It's important to fully discuss the fact that the measurements were carried out only on superficial layers (30-100um), while major dopamine projections target deep layers of the mPFC as discussed in the cited literature (Vander Weele et al., 2018) and as illustrated in FigS1B,C. This limitation should be explicitly acknowledged and discussed in the manuscript, especially given the potential functional heterogeneity among dopamine neurons in different layers. This potential across-layer heterogeneity could also be the cause of discrepancy among past recording studies with different measurement modalities. Also, mentioning technical limitations would be informative. For example: how deep the authors can perform 2p-imaging through the prism? was the "30-100um" maximum depth the authors could get?

      Thank you for pointing out this important issue about layer differences.

      It is possible that the mesocortical pathway has layer-specific channels, with some neurons targeting supragranular layers and others targeting infragranular ones. Alternatively, it is also plausible that the axons of the same neurons branch into both superficial and deep layers. This is a critical issue that has not been investigated in anatomical studies and will require single-cell labeling of dopamine neurons (Matsuda et al 2009 and Aransay et al 2015). We now discuss this issue in the Discussion.

      As for the imaging depth of 30–100 m, we were unable to visualize deeper axons in a live view mode. Our imaging system has already been optimized to detect weak signals (e.g., we have employed an excitation wavelength of 980 nm, dispersion compensation, and a hybrid photodetector). It is possible that future studies using improved imaging approaches may be able to visualize deeper layers. Importantly, sparse axons in the supragranular layers are advantageous in detecting weak signals; dense labeling of axons would increase the background fluorescence relative to signals. We now reference this layer issue in the Results and Discussion sections.

      (2) In the introduction, it seems that the authors intended to refer to Poulin et al. 2018 regarding molecular/anatomical heterogeneity of dopamine neurons, but they inadvertently cited Poulin et al. 2016 (a general review on scRNAseq). Additionally, the statement that "dopamine neurons that project to the PFC show unique genetic profiles (line 85)" requires clarification, as Poulin et al. 2018 did not specifically establish this point. Instead, they found at least the Vglut2/Cck+ population projects into mPFC, and they did not reject the possibility of other subclasses projecting to mPFC. Rather, they observed denser innervation with DAT-cre, suggesting that non-Vglut2/Cck populations would also project to mPFC. Discuss the potential molecular heterogeneity among mPFC dopamine axons in light of the sampling limitation mentioned earlier.

      We thank the reviewer for pointing this out. Genetic profiles of PFC-projecting DA neurons are still being investigated, so describing them as “unique” was misleading. We have edited the Introduction accordingly, and now discuss this issue in detail in the Discussion.

      (3) I find the data presented in Figure 2 to be odd. Firstly, the latency of shock responses in the representative axons (right panels of G, H) is consistently very long - nearly 500ms. It raises a query whether this is a biological phenomenon or if it stems from a potential technical artifact, possibly arising from an issue in synchronization between the 2-photon imaging and stimulus presentation. My reservations are compounded by the notable absence of comprehensive information concerning the synchronization of the experimental system in the method section.

      The synchronization of the stimulus and data acquisition is accomplished at a sub-millisecond resolution. We use a custom-made MATLAB program that sends TTL commands to standard imaging software (ThorImage or ScanImage) and a stimulator for electrical shocks. All events are recorded as analogue inputs to a different DAQ to ensure synchronization. We have provided additional details regarding the configuration in the Methods section.

      We consider that the long latency of shock response is biological. For instance, a similar long latency was found after electrical shock in a photometry imaging study (Kim, …, Deisseroth, 2016).

      Secondly, there appear to be irregularities in Panel J. While the authors indicate that "Significant axons were classified as either reward-preferring (cyan) or aversive-preferring (magenta), based on whether the axons are above or below the unity line of the reward/aversive scatter plot (Line 566)," a cyan dot slightly but clearly deviates above the unity line (around coordinates (x, y) = (20, 21)). This needs clarification. Lastly, when categorizing axons for analysis of conditioning data in Fig3 (not Fig2), the authors stated "The color-coded classification (cyan/magenta) was based on k-means clustering, using the responses before classical conditioning (Figure 2J)". I do not understand why the authors used different classification methods for two almost identical datasets.

      We thank the reviewer for pointing out these insufficient descriptions. We classified the axons using k-means clustering, and the separation of the two clusters happened to roughly coincide with the unity line of the reward/aversive scatter plot in Fig 2J. In other words, we did not use the unity line to classify the data points (which is why the color separation of the histogram is not at 45 degrees). We have clarified this point in the Methods section.
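
      To illustrate why a k-means boundary need not follow the unity line, the following is a minimal sketch with k = 2 on made-up (reward response, aversive response) pairs. It is not our analysis code: the data, the deterministic initialization, and the function names are assumptions chosen for the example.

```python
# Minimal sketch: 2-cluster k-means on hypothetical (reward, aversive) response pairs.
# The data points and the initialization rule are illustrative assumptions.


def dist2(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_2(points, n_iter=100):
    """Lloyd's algorithm with k = 2.

    Initialization is deterministic: the first point and the point farthest from it.
    Returns (labels, centroids).
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    centroids = [c0, c1]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical axons as (reward response, aversive response).
axons = [
    (2, 30), (3, 28), (1, 32), (2, 26),   # strongly aversive-preferring
    (18, 6), (20, 4), (16, 8), (14, 15),  # last point lies just above the unity line
]
labels, centroids = kmeans_2(axons)
```

      In this toy example the point (14, 15) lies above the unity line yet is assigned to the same cluster as the reward-dominated points, because the cluster boundary is the perpendicular bisector of the two centroids rather than the 45-degree line.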

      (4) In connection with Point 3, conducting separate statistical analyses for aversive and rewarding stimuli would offer a fairer approach. This could potentially reveal a subset of axons that display responses to both aversive and appetitive stimuli, aligning more accurately with the true underlying dynamics. Moreover, the characterization of Figure 2J as a bimodal distribution while disregarding the presence of axons responsive to both aversive and appetitive cues seems somewhat arbitrary and circular logic. A more inclusive consideration of this dual-responsive population could contribute to a more comprehensive interpretation.

      We also attempted k-means clustering with additional dimensions (e.g., temporal domains as shown in Fig. 3I, J), but no additional clusters were evident. We note that the lack of other clusters does not exclude the possibility of their existence, which may only become apparent with a substantial increase in the number of samples. In the current report, we present the clusters that were the easiest/simplest for us to identify.

      Additionally, we have revised our manuscript to reflect that many axons respond to both reward and aversive stimuli, and that aversive-preferring axons do not exclusively respond to the aversive stimulus.

      (5) The contrast in initialization to novel cues between aversive and appetitive axons mirrors findings in other areas, such as the tail-of-striatum (TS) and ventral striatum (VS) projecting dopamine neurons (Menegas et al., 2017, not 2018). You might consider citing this very relevant study and discussing potential collateral projections between mPFC and TS or VS.

      Thank you for pointing this out. We have now included Menegas et al., 2017, and also discuss the possibility of collaterals to these areas. In addition, we now cite Azcorra et al., 2023, which was published after our initial submission.

      (6) The use of correlation values (here >0.65) to group ROIs into axons is common but should be justified based on axon density in the FOV and imaging quality. It's important to present the distribution of correlation values and demonstrate the consistency of results with varying cut-off values. Also, provide insights into the reliability of aversive/appetitive classifications for individual ROIs with high correlations. Importantly, if you do the statistical testing and aversive/appetitive classifications for individual ROIs with above-threshold high correlation (to be grouped into the same axon), do they always fall into the same category? How many false positives/false negatives are observed?


      "Our results remained similar for different correlation threshold values (Line 556)" (data not shown) is obsolete.

      We have conducted additional analyses using correlation thresholds of 0.5 and 0.3, which resulted in a smaller number of axon terminals. In essence, the relationship between reward responses and aversive responses remained very similar to Fig. 2J, K.

      Author response image 1.
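
      For readers who wish to see why a lower correlation threshold yields fewer putative axons, the following is a minimal sketch of threshold-based grouping of ROIs. It assumes single-linkage merging via union-find, which may differ from the procedure in the manuscript; the traces and function names are hypothetical.

```python
# Minimal sketch: group ROIs whose activity traces correlate above a threshold.
# Single-linkage merging and the toy traces are assumptions for illustration only.
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length traces."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def group_rois(traces, threshold=0.65):
    """Merge ROIs into putative axons when their pairwise correlation exceeds threshold."""
    parent = list(range(len(traces)))

    def find(i):
        # Find the group representative, compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if pearson(traces[i], traces[j]) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(traces)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical fluorescence traces for five ROIs.
traces = [
    [0, 1, 0, 3, 0, 1],
    [0, 2, 0, 6, 0, 2],  # scaled copy of ROI 0
    [3, 0, 1, 0, 2, 0],
    [3, 0, 1, 1, 2, 0],  # strongly correlated with ROI 2
    [1, 0, 0, 2, 1, 0],  # moderately correlated with both groups
]
groups_by_threshold = {t: group_rois(traces, t) for t in (0.65, 0.5, 0.3)}
```

      In this toy example the number of groups drops from three to two to one as the threshold is lowered from 0.65 to 0.5 to 0.3, mirroring the smaller number of axon terminals noted above.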

      Reviewer #2 (Public Review):

      Summary:

      This study aims to address existing differences in the literature regarding the extent of reward versus aversive dopamine signaling in the prefrontal cortex. To do so, the authors chose to present mice with both a reward and an aversive stimulus during different trials each day. The authors used high spatial resolution two-photon calcium imaging of individual dopaminergic axons in the medial PFC to characterize the response of these axons to determine the selectivity of responses in unique axons. They also paired the reward (water) and an aversive stimulus (tail shock) with auditory tones and recorded across 12 days of associative learning.

      The authors find that some axons respond to both reward and aversive unconditioned stimuli, but overall, there is a strong preference to respond to aversive stimuli consistent with expectations from prior studies that used other recording methods. The authors find that both of their two auditory stimuli initially drive responses in axons, but that with training axons develop more selective responses for the shock associated tone indicating that associative learning led to changes in these axons' responses. Finally, the authors use anticipatory behaviors during the conditioned stimuli and facial expressions to determine stimulus discrimination and relate dopamine axons signals with this behavioral evidence of discrimination. This study takes advantage of cutting-edge imaging approaches to resolve the extent to which dopamine axons in PFC respond to appetitive or aversive stimuli. They conclude that there is a strong bias to respond to the aversive tail shock in most axons and weaker more sparse representation of water reward.

      Strengths:

      The strength of this study is the imaging approach that allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry which provide a measure of the average population activity. The use of appetitive and aversive stimuli to probe responses across individual axons is another strength.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      A weakness of this study is the design of the associative conditioning paradigm. The use of only a single reward and single aversive stimulus makes it difficult to know whether these results are specific to the valence of the stimuli versus the specific identity of the stimuli. Further, the reward presentations are more numerous than the aversive trials making it unclear how much novelty and habituation account for results. Moreover, the training seems somewhat limited by the low number of trials and did not result in strong associative conditioning. The lack of omission responses reported may reflect weak associative conditioning. Finally, the study provides a small advance in our understanding of dopamine signaling in the PFC and lacks evidence for if and what might be the consequence of these axonal responses on PFC dopamine concentrations and PFC neuron activity.

      We thank the reviewer for the suggestions.

      We agree that interpreting the response change during classical conditioning is not straightforward. Although the reward and aversive stimuli we employed are commonly used in the field, future studies with more sophisticated paradigms will be necessary to address whether dopamine axons encode the valence of the stimuli, the specific identity of the stimuli, or novelty and habituation. In our current manuscript, we refrain from concluding that distinct groups of neurons encode different valences. In fact, many axons respond to both stimuli, at different ratios. We have removed descriptions that may suggest exclusive coding of reward or aversive processing. Additionally, we have extensively discussed possible interpretations.

      In terms of the strength of the conditioning association, behavioral results indicated that the learning plateaued – anticipatory behaviors did not increase during the last two phases when the conditioned span was divided into six phases (Figure 3–figure supplement 1).

      Our goal in the current manuscript is to provide new insight into the functional diversity of dopamine axons in the mPFC. Investigating the impact of dopamine axons on local dopamine concentration and neural activity in the mPFC is important but falls beyond the scope of our current study. In particular, given the functional diversity of dopamine axons, interpreting bulk optogenetic or chemogenetic axonal manipulation experiments would not be straightforward. As suggested, measuring the dopamine concentration through two-photon imaging of dopamine sensors and monitoring the activity of dopamine recipient neurons (e.g., D1R- or D2R-expressing neurons) is a promising approach that we plan to undertake in the near future.

      Reviewer #3 (Public Review):

      Summary:

      The authors image dopamine axons in medial prefrontal cortex (mPFC) using microprism-mediated two-photon calcium imaging. They image these axons as mice learn that two auditory cues predict two distinct outcomes, tailshock or water delivery. They find that some axons show a preference for encoding of the shock and some show a preference for encoding of water. The authors report a greater number of dopamine axons in mPFC that respond to shock. Across time, the shock-preferring axons begin to respond preferentially to the cue predicting shock, while there is a less pronounced increase in the water-responsive axons that acquire a response to the water-predictive cue (these axons also increase non-significantly to the shock-predictive cue). These data lead the authors to argue that dopamine axons in mPFC preferentially encode aversive stimuli.

      Strengths:

      The experiments are beautifully executed and the authors have mastered an impressively complex technique. Specifically, they are able to image and track individual dopamine axons in mPFC across days of learning. This technique is used the way it should be: the authors isolate distinct dopamine axons in mPFC and characterize their encoding preferences and how these evolve across learning of cue-shock and cue-water contingencies. Thus, these experiments are revealing novel information about how aversive and rewarding stimuli are encoded at the level of individual axons, in a way that has not been done before. This is timely and important.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      The overarching conclusion of the paper is that dopamine axons preferentially encode aversive stimuli. This is prevalent in the title, abstract, and throughout the manuscript. This is fundamentally confounded. As the authors point out themselves, the axonal response to stimuli is sensitive to outcome magnitude (Supp Fig 3). That is, if you increase the magnitude of water or shock that is delivered, you increase the change in fluorescence that is seen in the axons. Unsurprisingly, the change in fluorescence that is seen to shock is considerably higher than water reward.

      We agree that the interpretation of our results is not straightforward. Our current manuscript now focuses on our strength, which is reporting the functional diversity of dopamine axons. Therefore, we avoid using the word ‘encode’ when describing the response.

      We believe that our results could reconcile the apparent discrepancy as to why some previous studies reported only aversive responses while others reported reward responses. In particular, if the reward volume were very small, the reward response could go undetected.

      Further, when the mice are first given unexpected water delivery and have not yet experienced the aversive stimuli, over 40% of the axons respond [yet just a few lines below the authors write: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards", which seems inconsistent with their own data].

      We always recorded the reward and aversive response together, which might have confused the reviewer. Therefore, there is no inconsistency in our data. We have clarified our methods and reasoning accordingly.

      Given these aspects of the data, it could be the case that the dopamine axons in mPFC encode different types of information and delegate preferential processing to the most salient outcome across time.

      This is certainly an exciting interpretation, so we have included it in our discussion. Meanwhile, ‘the most salient outcome’ alone cannot fully capture the diverse response patterns of the dopaminergic axons, particularly reward-preferring axons. We discuss our findings in more detail in the revised manuscript.

      The use of two similar-sounding tones (9 kHz and 12 kHz) for the reward- and aversive-predicting cues is likely to enhance this, as it requires a fine-grained distinction between the two cues in order to learn effectively. There is considerable literature on mPFC function across species that would support such a view. Specifically, theories of mPFC function (in particular prelimbic cortex, which is where the axon images are mostly taken) generally center around resolution of conflict in what to respond to, learn about, and attend to. That is, mPFC is important for devoting the most resources (learning, behavior) to the most relevant outcomes in the environment. These data, then, provide a mechanism for this to occur in mPFC. That is, dopamine axons signal to the mPFC the most salient aspects of the environment, which should be preferentially learned about and responded towards. This is also consistent with the absence of a negative prediction error during omission: the dopamine axons show increases in responses during receipt of unexpected outcomes, but do not encode negative errors. This supports a role for this projection in helping to allocate resources to the most salient outcomes and their predictors, and not learning per se. Below are just a few references from the rich literature on mPFC function (some consider rodent mPFC analogous to DLPFC, some mPFC), which advocate for a role of this region in allocating attention and cognitive resources to the most relevant stimuli, and do not indicate preferential processing of aversive stimuli.

      Distinguishing between 9 kHz and 12 kHz sound tones may not be that difficult, considering anticipatory licking and running are differentially manifested. In addition, previous studies have shown that mice can distinguish between two sound tones when they are separated by 7% (de Hoz and Nelken 2014). Nonetheless, we agree with the attractive interpretation that “the mPFC devotes the most resources (learning, behavior) to the most relevant outcomes in the environment” and that dopamine is a mechanism for this. Therefore, we discuss this interpretation in the revised text.

      References:

      (1) Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual review of neuroscience, 24(1), 167-202.

      (2) Bissonette, G. B., Powell, E. M., & Roesch, M. R. (2013). Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behavioural brain research, 250, 91-101.

      (3) Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, 18(1), 193-222.

      (4) Sharpe, M. J., Stalnaker, T., Schuck, N. W., Killcross, S., Schoenbaum, G., & Niv, Y. (2019). An integrated model of action selection: distinct modes of cortical control of striatal decision making. Annual review of psychology, 70, 53-76.

      (5) Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of the medial frontal cortex in cognitive control. science, 306(5695), 443-447.

      (6) Nee, D. E., Kastner, S., & Brown, J. W. (2011). Functional heterogeneity of conflict, error, task-switching, and unexpectedness effects within medial prefrontal cortex. Neuroimage, 54(1), 528-540.

      (7) Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by monkey medial frontal cortex. Nature neuroscience, 10(2), 240-248.

      Reviewer #1 (Recommendations For The Authors):

      Specific Suggestions and Questions on the Methods Section:

      In general, the methods part is not well documented and sometimes confusing. Thus, as it stands, it hinders reproducible research. Specific suggestions/questions are listed in the following section.

      (1) Broussard et al. 2018 introduced axon-GCaMP6 instead of axon-jGCaMP8m. The authors should provide details about the source of this material. If it was custom-made, a description of the subcloning process would be appreciated. Additionally, consider depositing sequence information or preferably the plasmid itself. Furthermore, the introduction of the jGCaMP8 series by Zhang, Rozsa, et al. 2023 should be acknowledged and referenced in your manuscript.

      We thank the reviewer for pointing this out. We have now included details on how we prepared the axon-jGCaMP8m, which was based on plasmids available at Addgene. Additionally, we have deposited our construct to Addgene ( https://www.addgene.org/216533/ ). We have also cited Janelia’s report on jGCaMP8, Zhang et al.

      (2) The authors should elaborate on the approach taken for experimental synchronization. Specifically, how was the alignment achieved between 2-photon imaging, treadmill recordings, aversive/appetitive stimuli, and videography? It would be important to document the details of the software and hardware components employed for generating TTLs that trigger the pump, stimulator, cameras, etc.

      We have now included a more detailed explanation of the timing control. We use a custom-made MATLAB program that sends TTL square waves and analogue waves via a single National Instruments board (USB-6229) to control two-photon image acquisition, behavior camera image acquisition, water syringe movement, current flow from a stimulator, and sound presentation. We also continuously recorded at 30 kHz, via a separate National Instruments board (PCIe-6363), the frame timing of two-photon imaging, the frame timing of the behavior camera, copies of the command waves (sent to the syringe pump, the stimulator, and the speaker), and signals from the treadmill corresponding to running speed.
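      The alignment described above can be illustrated with a minimal sketch. It assumes that each recorded channel is available as a list of voltage samples taken at 30 kHz and that a TTL pulse is a crossing of a fixed threshold; the function names, the 2.5 V threshold, and the data layout are hypothetical and are not taken from the authors' MATLAB program.

```python
import bisect


def rising_edges(samples, threshold=2.5):
    """Return sample indices where the signal crosses the threshold upward.

    The 2.5 V threshold is an assumed value for a 0-5 V TTL line.
    """
    edges = []
    for i in range(1, len(samples)):
        if samples[i - 1] < threshold <= samples[i]:
            edges.append(i)
    return edges


def edge_times(samples, sample_rate_hz=30_000.0, threshold=2.5):
    """Convert rising-edge sample indices to times in seconds."""
    return [i / sample_rate_hz for i in rising_edges(samples, threshold)]


def frame_at(event_time_s, frame_times_s):
    """Index of the last frame that started at or before the event.

    frame_times_s must be sorted in ascending order.
    Returns None if the event precedes the first frame.
    """
    idx = bisect.bisect_right(frame_times_s, event_time_s) - 1
    return idx if idx >= 0 else None
```

      Because every signal is sampled on the same recording board, an event detected on one channel (for example a copy of the stimulator command) can be assigned to an imaging frame by looking it up in the frame-timing channel, without relying on the clocks of the individual devices.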

      (3) The information regarding the cameras utilized in the study presents some confusion. In one instance, you mention, "To monitor licking behavior, the face of each mouse was filmed with a camera at 60 Hz (CM3-U3-13Y3M-CS, FLIR)" (Line 488). However, there's also a reference to filming facial expressions using an infrared web camera (Line 613). Could you clarify whether the FLIR camera (which is an industrial CMOS not a webcam) is referred to as a webcam? Alternatively, if it's a different camera being discussed, please provide product details, including pixel numbers and frame rate for clarity.

      We thank the reviewer for pointing this out. This was a mistake on our end. The camera used in the current project was a CM3-U3-13Y3M-CS, not a web camera. We have now corrected this.

      (4) Please provide more information about the methodology employed for lick detection. Specifically, did the authors solely rely on videography for this purpose? If so, why was an electrical (or capacitive) detector not used? It would provide greater accuracy in detecting licking.

      Lick detection was performed offline based on videography, using DeepLabCut. As licking occurs at a frequency of ~6.5 Hz (Xu, …, O’Connor Nature Neurosci, 2022), the movement can be detected at a frame rate of 60 Hz. Initially, we used both a lick sensor and videography. However, we favored videography because it could potentially provide non-binary information.
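      The sampling argument above can be checked with a short sketch. The rates (60 Hz video, ~6.5 Hz licking) come from the response; the per-frame boolean "tongue visible" trace is a hypothetical simplification of what a pose-estimation tool would provide after thresholding, and the function names are illustrative only.

```python
def frames_per_cycle(frame_rate_hz, event_rate_hz):
    """Average number of video frames available per lick cycle."""
    return frame_rate_hz / event_rate_hz


def satisfies_nyquist(frame_rate_hz, event_rate_hz):
    """True if the frame rate exceeds twice the event rate."""
    return frame_rate_hz > 2 * event_rate_hz


def lick_onsets(tongue_visible):
    """Frame indices where the tongue goes from not visible to visible.

    tongue_visible is an assumed per-frame list of booleans.
    """
    onsets = []
    previous = False
    for i, visible in enumerate(tongue_visible):
        if visible and not previous:
            onsets.append(i)
        previous = visible
    return onsets
```

      At 60 Hz, each lick cycle at 6.5 Hz spans roughly nine frames, which is well above the two-samples-per-cycle minimum.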

      Other Minor Points:

      (5) Ensure consistency in the citation format; Vander Weele et al. 2018 and Weele et al. 2019 share the same first author.

      Thank you for pointing this out. Endnote processes the first author’s name differently depending on the journal. We fixed the error manually. The first paper (2018) is an original research paper, and the second one (2019) is a review about how dopamine modulates aversive processing in the mPFC. We cited the second one in three instances where we mentioned review papers.

      (6) The distinction between "dashed vs dotted lines" in Figure 3K and 3M appears to be very confusing. Please consider providing a clearer visualization/labeling to mitigate this confusion.

      We have now changed the line styles.

      (7) Additionally plotting mean polar angles of aversive/appetitive axons as vectors in the Cartesian scatter plots (2J, 3I,J) would make interpretation easier.

      We have now made this change to Figures 2, 3, 4.
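      For readers unfamiliar with the requested summary, a mean polar angle is usually computed as a circular mean rather than an arithmetic mean. The sketch below assumes that each axon is a point whose x coordinate is its reward response and whose y coordinate is its aversive response; this axis assignment and the function names are assumptions for illustration, not the authors' code.

```python
import math


def polar_angle(reward_response, aversive_response):
    """Angle of one axon in the (reward, aversive) plane, in radians."""
    return math.atan2(aversive_response, reward_response)


def mean_vector(angles):
    """Circular mean of a non-empty list of angles.

    Returns (mean_angle, resultant_length). A resultant length near 1
    means the angles are tightly clustered; near 0 means they are spread.
    """
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    return math.atan2(s, c), math.hypot(c, s)
```

      The returned angle and length can be drawn as a single vector from the origin of the scatter plot for each group of axons.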

      (8) Data and codes should be shared in a public database. This is important for reproducible research and we believe that "available from the corresponding author upon reasonable request" is outdated language.

      We have uploaded the data to GitHub, https://github.com/pharmedku/2024-elife-da-axon.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors don't show which mouse each axon's data come from, making it hard to know if differences arise from inter-mouse differences vs differences in axons. The best way to address this point is to show plots similar to Figure 2J & K but broken down by mouse, to show whether each mouse had evidence of these two clusters.

      We have now made this change to Figure 2-figure supplement 3.

      (2) Line 166: Should this sentence point to panels 2F, G, H rather than 2I which doesn't show a shock response?

      We thank the reviewer for pointing this out. We have fixed the incorrect labels.

      (3) Line 195: The population level bias to aversive stimuli was shown previously using photometry so it is not justified to say "for the first time" regarding this statement.

      We have adjusted this sentence so that the claim of “for the first time” is not associated with the population-level bias.

      (4) The paper lacks a discussion of the potential role that novelty plays in the amplitude of the responses given that tail shocks occur less often than rewards. Is the amplitude of the first reward of the day larger than subsequent rewards? Would tail shock responses decay if they occurred in sequential trials?

      Following the reviewer's suggestion, we conducted a comparison of individual axonal responses to both conditioned and unconditioned stimuli across the first trial and subsequent trials. Our findings reveal a notable trend: aversive-preferring axons exhibited attenuation in response to CSreward, yet enhancement in response to CSaversive. Conversely, the response of these axons to USreward was attenuated, with no significant change observed for USaversive. In contrast, reward-preferring axons displayed an invariable activity pattern from the initial trial, highlighting the functional diversity present within dopamine axons. This analysis has been integrated into Figure 3-figure supplement 4 and is elaborated upon in the Discussion section.

      (5) Fix typo in Figure 1 - supplement 1. Shift

      We have now corrected this. Thank you.

      (6) The methods section needs information about trial numbers. Please indicate how many trials were presented to each mouse per day.

      We have now added the information about trial numbers to the Methods section.

      Reviewer #3 (Recommendations For The Authors):

      In line with the public review, my recommendation is for the authors to remain as objective about their data as possible. There are many points in the manuscript where the authors seem to directly contradict their own data. For example, they first detail that dopamine axons respond to unexpected water rewards. Indeed, they find that there are 40% of dopamine axons that respond in this way. Then, a few paragraphs later they state: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards". As detailed above, I do not think these data support an idea that dopamine axons in mPFC preferentially encode aversive outcomes. If the authors wanted to examine a role for mPFC in preferential encoding of aversive stimuli, you would first have to equate the outcomes by magnitude and then compare how the axons acquire preferences across time. Alternatively, a prediction of a more general process that I detail above would predict that you could give mice two rewards that differ in magnitude (e.g., lots of food vs. small water) and you would see the same results that the authors have seen here (i.e., a preference for the food, which is the larger and more salient outcome). Without other tests of how dopamine axons in mPFC respond to situations like this, I don't think any conclusion around mPFC in favoring aversive stimuli can be made.

      As suggested, we have made the current manuscript as objective as possible, removing interpretation aspects regarding what dopamine axons encode and emphasizing their functional diversity. In particular, we remove the word ‘encode’ when describing the response of dopamine axons.

      Although it may have appeared unclear, there was no contradiction within our data regarding the response to reward and aversive stimuli. We have now improved the readability of the Results and Methods sections. Concerning the interpretation of what exactly the mPFC dopamine axons encode, we have rewritten the discussion to be as objective about our data as possible, as suggested. We also have edited our title and abstract accordingly. Meanwhile, we wish to emphasize that our reward and aversive stimuli are standard paradigms commonly used in the field. We believe, and all the reviewers agreed, that reporting the diversity of dopamine axonal responses with a novel imaging design constitutes new insight for the neuroscience community. Therefore, we have decided to leave the introduction of new behavioral tasks for future studies and instead expanded our discussion.

      As mentioned, I think the experiments are executed really well and the technological aspects of the authors' methods are impressive. However, there are also some aspects of the data presentation that would be improved. Some of the graphs took a considerable amount of effort to unpack. For example, Figure 4 is hard going. Is there a way to better illustrate the main points that this figure wants to convey? Some of this might be helped by a more complete description in the figure captions about what the data are showing. It would also be great to see how the response of dopamine axons changes across trial within a session to the shock and water-predictive cues. Supp Figure 1 should be in the main text with standard error and analyses across time. Clarifying these aspects of the data would make the paper more relevant and accessible to the field.

      We thank the reviewer for pointing out that the legend of Figure 4 was incomplete. We have fixed it, along with improving the presentation of the figure. We have also prepared a new figure (Figure 3–figure supplement 4) to compare CSaversive and CSreward signals for the first and rest of the trials within daily sessions, revealing further functional diversity in dopamine axons. We have decided to keep Figure 1–figure supplement 2 as a figure supplement with an additional analysis, as another reviewer pointed out that the design is not completely new. Furthermore, as eLife readers can easily access figure supplements, we believe it is appropriate to maintain it in this way.

      Minor points:

      (1) What is the control period for the omission test? Was omission conducted for the shock?

      The control period for reward omission is a 2-second period just before the CS onset. We did not include shock omission, because a sufficient number of trials (> 6 trials) for the rare omission condition could not be achieved within a single day.

      (2) The authors should mention how similar the tones were that predicted water and shock.

      According to de Hoz and Nelken (2014), a frequency difference of 4–7% is enough for mice to discriminate between tones. In addition, anticipatory licking and running confirmed that the mice could discriminate between the frequencies. We have now included this information in the Discussion.
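      The comparison with the cited discrimination threshold can be made explicit with a small calculation. The tone frequencies (9 kHz and 12 kHz) and the 4–7% range come from the text; expressing the difference relative to the lower tone is an assumption about how the threshold is defined, and the function names are illustrative.

```python
import math


def relative_difference(f_low_hz, f_high_hz):
    """Frequency difference as a fraction of the lower frequency."""
    return (f_high_hz - f_low_hz) / f_low_hz


def octave_separation(f_low_hz, f_high_hz):
    """Separation between the two tones in octaves."""
    return math.log2(f_high_hz / f_low_hz)
```

      For 9 kHz and 12 kHz the relative difference is one third (about 33%), or roughly 0.42 octaves, which is several times larger than the 7% upper end of the cited range.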

      (3) I realize the viral approach used in the current studies may not allow for an idea of where in VTA dopamine neurons are that project to mPFC- is there data in the literature that speak to this? Particularly important as we now know that there is considerable heterogeneity in dopamine neuronal responses, which is often captured by differences in medial/lateral position within VTA.

      Some studies have suggested that mesocortical dopamine neurons are located in the medial posterior VTA (e.g., Lammel et al., 2008). However, in mouse anterograde tracing, it is not possible to spatially confine the injection of conventional viruses/tracers. We now refer to Lammel et al., 2008 in the Introduction.

    1. Author response:

      eLife assessment

      This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.

      We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we will shortly prepare a revised version of this paper. Intended changes to the revised manuscript are marked up in bold font in the detailed responses below, but before that we address some of the comments made above in the general assessment:

      • “lack of incorporation of a protonation coordinate in the free energy landscape”. We acknowledge that it would be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, at this point in time, evaluating such a free energy landscape is not computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1 ms of total sampling time). Previous reports in the literature tend to focus on either simpler systems or a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.

      • “possibility of protonation of the substrate”. The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript. Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, when it will become computationally tractable to do so, will have to include the substrate as a possible protonation site; for the present moment, we will amend our discussion to alert the reader to the possibility that the substrate could be an intermediate to proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve and thus it may be the case that for now only a semi-quantitative conclusion is all that can be obtained.

      • “errors with the chosen constant pH MD method for membrane proteins”. We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such we will add a cautionary note to our paper. We will also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we will promote this validation which was in the supplementary figures into the main text in the revised version). We therefore remain reasonably confident in the results presented with regards to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.

      • “dismissal of hysteresis emerging from the MEMENTO method”. We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD and metadynamics for path generation, and find this improvement again for PepT2 in this study. We will address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.

      • “the likelihood of other residues being affected by peptide binding”. In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Short of the potential intermediate protonation site mentioned above, the set of residues we investigate form a minimal set of sorts within which the important driving forces of alternating access can be rationalised. We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to the mechanism of driving conformational changes by involving other residues not considered in this paper. We will make our speculation that an ensemble of different processes may be contributing simultaneously more explicit in our revision, but do not believe any of our conclusions would be affected by this.

      As for the additional suggested changes in presentation, we will provide the requested details on the CpHMD analysis. Furthermore, we will use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D-reprojections of the 2D-PMFs in figures 3, 4 and 5. (Note that we will opt to not do so in figures S10 and S15 which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions in single reference plots, respectively, to avoid overcrowding those necessarily busy figures). We are also changing the colours schemes of these plots in our revision to improve accessibility.

      Reviewer #1 (Public Review):

      The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extra-cellular (H87 and D342) and intra-cellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well-organized. However, I have a couple of concerns for the authors to consider addressing.

      We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not of the substrate-protonation link as this line might be understood to suggest.

      (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.

      a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.

      We concur with the reviewer that the ideal way of describing the processes studied in our paper would be as a higher-dimensional free energy landscape obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, it would have been particularly interesting and potentially informative with regards to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion on the H87→E56 proton transfer:

      “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)). However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.”

      Where even sampling the proton transfer step itself in an essentially static protein conformation would be pushing the boundaries of what has been achieved in the field, we believe that considering the current state-of-the-art, a fully coupled investigation of large-scale conformational changes and proton-transfer reaction is not yet feasible in a realistic/practical time frame. We also note this limitation already when we say that:

      “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”.

      Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.

      In our revision, we will expand on our discussion of the reasoning behind employing a non-reactive approach and the limitations that this imposes on the questions that can be answered in this study.

      Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.

      The reviewer is correct in their assessment that the aligning of PMFs in these figures is arbitrary; no relative free energies of the PMFs to each other can be estimated without explicit free energy calculations at least of protonation events at the end state basins. The PMFs in our figures are merely superimposed for illustrating the differences in shape between the obtained profiles in each condition, as discussed in the text, and we will make this clear in the appropriate figure captions in our revision.
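      The arbitrariness of this alignment can be illustrated with a minimal Python sketch. It assumes hypothetical 1D PMFs on a shared CV grid; the function names and the toy profiles are illustrative only and are not taken from the manuscript. Each PMF is defined only up to an additive constant, so shifting every profile to zero at a chosen reference point preserves its shape and barrier heights but says nothing about the relative stability of the different conditions.

```python
import numpy as np

def align_pmfs(pmfs, ref_index):
    """Shift each PMF so that its value at grid point ref_index is zero.

    This is a purely visual superposition: it does NOT estimate the
    free energy difference between conditions.
    """
    aligned = {}
    for label, profile in pmfs.items():
        profile = np.asarray(profile, dtype=float)
        aligned[label] = profile - profile[ref_index]
    return aligned

def barrier_height(profile, start_index):
    """Highest point of the profile relative to the chosen starting point."""
    profile = np.asarray(profile, dtype=float)
    return float(profile.max() - profile[start_index])

# Hypothetical toy profiles (kJ/mol) with arbitrary vertical offsets.
cv = np.linspace(0.0, 1.0, 11)
toy_pmfs = {
    "condition_A": 10.0 * np.sin(np.pi * cv) + 3.0,
    "condition_B": 4.0 * np.sin(np.pi * cv) - 12.0 * cv - 7.0,
}

# Superimpose all profiles at the last grid point (e.g. a shared end-state basin).
aligned = align_pmfs(toy_pmfs, ref_index=-1)
```

      In this sketch the barrier measured from the first grid point is identical before and after the shift, whereas the vertical offset between the two conditions is set entirely by the choice of reference point.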

      b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not uptake/release proton(s).

      This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change” we do not in the current version indicate explicitly that this may involve the substrate. We will make clear the assumption and this possibility in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We will make note of this point in the revised manuscript.

      As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).”

      We do not assume that it is our place to arbitrate among the apparent discrepancies in the experimental data here, although we believe that our assumed 2:1 stoichiometry is “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.

      (2) I have more serious concerns about the CpHMD employed in the study.

      a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.

      We will discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way:

      In addition to CpHMD simulations, we establish the same effect via ABFE calculations, where the substrate affinity differs between the E56-deprotonated and E56-protonated protein. This is currently Figure S20, though in the revised version we will move this piece of validation into a new panel of Figure 6 in the main text, since it becomes more important with the CpHMD membrane problem in mind. Since the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, there may be the benefit of error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be so, as judged by the orthogonal ABFE technique.
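      The link between the two techniques follows from a standard thermodynamic cycle: the pKa shift of a residue upon ligand binding equals the difference in binding free energy between the deprotonated and protonated protein, divided by RT ln 10. The following Python sketch spells this out. The function name, the temperature, and the example free energies are assumptions for illustration and are not values from the manuscript.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def pka_shift_from_binding(dg_bind_deprot, dg_bind_prot, temperature=300.0):
    """pKa(holo) - pKa(apo) from two binding free energies in kJ/mol.

    Closing the cycle apo/deprot -> apo/prot -> holo/prot -> holo/deprot gives
    delta_pKa = (dG_bind(deprot) - dG_bind(prot)) / (R T ln 10).
    A ligand that binds more tightly to the protonated protein raises the pKa.
    """
    rt_ln10 = R_KJ_PER_MOL_K * temperature * math.log(10.0)
    return (dg_bind_deprot - dg_bind_prot) / rt_ln10

# Hypothetical numbers: binding is 6 kJ/mol more favourable to the protonated form.
example_shift = pka_shift_from_binding(dg_bind_deprot=-30.0, dg_bind_prot=-36.0)
```

      Because only the difference of the two binding free energies enters, systematic errors common to both ABFE legs cancel in the shift, which is the sense in which the shift can be more reliable than an absolute pKa value.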

      Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.

      Unfortunately, when trying to titrate E56 in this CpHMD implementation, we found few protonation-state transitions taking place, and the system often got stuck in coupled minima of protonation state and local conformation (which need to interconvert through rearrangements of the salt bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state, and Author response image 2 shows how noisy attempts at pKa estimation from this data turn out to be, necessitating the use of a hybrid-solvent method.

      Author response image 1.

      All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.

      Author response image 2.

      Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1
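      For orientation, a pKa is commonly extracted from constant-pH data by fitting the deprotonated fraction observed at several pH values to a Henderson-Hasselbalch (Hill) curve. The sketch below is a minimal Python illustration using a linearised least-squares fit on synthetic, noise-free data. It is not the analysis pipeline used by the authors, and the function name and all numbers are hypothetical.

```python
import numpy as np

def fit_pka(ph_values, deprot_fractions, eps=1e-6):
    """Fit log10(f / (1 - f)) = n * (pH - pKa) by linear least squares.

    f is the deprotonated fraction at each pH and n is the Hill coefficient.
    Fractions are clipped away from 0 and 1 so the logit stays finite.
    Returns (pKa, n).
    """
    ph = np.asarray(ph_values, dtype=float)
    frac = np.clip(np.asarray(deprot_fractions, dtype=float), eps, 1.0 - eps)
    logit = np.log10(frac / (1.0 - frac))
    hill, intercept = np.polyfit(ph, logit, 1)
    return -intercept / hill, hill

# Synthetic titration curve with a known pKa, used only to check the fit.
ph_grid = np.arange(2.0, 7.5, 0.5)
true_pka, true_hill = 4.5, 1.0
synthetic = 1.0 / (1.0 + 10.0 ** (true_hill * (true_pka - ph_grid)))
pka_est, hill_est = fit_pka(ph_grid, synthetic)
```

      When a trajectory shows only a handful of protonation-state transitions, the observed fractions sit close to 0 or 1 and carry large sampling errors, so the logit values and hence the fitted pKa become very noisy.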

      b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo-simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate ion pairs with hsPepT2 through its N- & C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.

      In this study, we implicitly assumed from the outset that the substrate does not get protonated, which, as noted in our response to the comment above, we will acknowledge explicitly in revision. This potential limitation on the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that takes into account the possibility of substrate C-terminus protonation may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt bridge network in an apo-like state. The consequence would be that while the direction of shift in E56 pKa value will be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable for obtaining better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protonatable residue in the protein, though the effects are usually assumed to be negligible). We do note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes 10s to 100s of ns in this binding pocket) needs to be implicitly equilibrated upon protonation state transitions. We will discuss such considerations in the revision.

      Reviewer #2 (Public Review):

      This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect on the transport cycle of protonation of various charged amino acids within the protein. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and likely mechanism are similar, the residues involved in the mechanism may differ.

      Strengths:

      This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.

      Weaknesses:

      Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.

      We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which – as we discuss in our paper – might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above) though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton transfer reactions remains a (partially) open question. We acknowledge this and denote it with question marks in the mechanistic overview we give in Figure 8, and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.

      Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? - is it truly the occluded state as shown on the diagram or does it tend to inward- or outward-facing?

      Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available to use in simulations. This was not meant to be a quantitative classification of states. Nonetheless, we can note that the OCC state we derived has extra- and intracellular gate opening distances (as measured by the simple CVs defined in the methods and illustrated in Figure 2a) that indicate full gate closure at both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate opening distance in the OCC state used for our conformational change enhanced sampling was comparable to that of the OF state (i.e., full closure of the gate), see Figure S2b and the grey bars therein. Therefore, we would schematically classify the OCC state to lie at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicates of 1 μs-long unbiased MD, where in 2/3 replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2 b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for transition to IF than to OF. We did this by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge this was not sufficiently clear and will add details to the latter sentence in revision to help clarify better the nature of the occluded state.

      The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data and if so on what basis was this selection done. Secondly, they dismiss the pKa calculations of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is when there seems to be a similar change in profile in the histograms.

      In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots with mean from triplicates +/- standard deviation), where we found that only the effect on E56 protonation was distinct, lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10 ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). We recover from those the effect of substrate binding on the E56 protonation state in both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is a shift in the pKa of E53 in the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though here the larger error bars overlap. In our revision, we will add more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6) as well as a discussion of the partial effect on the E53 pKa value.

      We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.
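      The replicate-based criterion described in this response (a shift counts as distinct only if it lies beyond the combined error bars) can be written down in a few lines of Python. This is a sketch under stated assumptions: the sample standard deviation is used as the error bar, and the pKa values are hypothetical placeholders rather than data from the manuscript.

```python
import numpy as np

def mean_sd(values):
    """Mean and sample standard deviation (ddof=1) of per-replicate estimates."""
    v = np.asarray(values, dtype=float)
    return float(v.mean()), float(v.std(ddof=1))

def error_bars_overlap(estimates_a, estimates_b):
    """True if the mean +/- SD intervals of the two conditions overlap."""
    mean_a, sd_a = mean_sd(estimates_a)
    mean_b, sd_b = mean_sd(estimates_b)
    return abs(mean_a - mean_b) <= sd_a + sd_b

# Hypothetical per-replicate pKa estimates (three replicates per condition).
clear_apo, clear_holo = [4.0, 4.2, 4.1], [5.4, 5.6, 5.5]
noisy_apo, noisy_holo = [4.0, 4.6, 4.3], [4.5, 5.1, 4.8]

clear_case_overlaps = error_bars_overlap(clear_apo, clear_holo)
noisy_case_overlaps = error_bars_overlap(noisy_apo, noisy_holo)
```

      With only three replicates per condition, such an overlap check is a coarse filter, which is one reason why pooling shorter chunk estimates into histograms can expose a trend that the replicate-level error bars leave ambiguous.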

      Reviewer #3 (Public Review):

      Summary:

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to the outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.

      Strengths:

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses:

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      The reviewer is right to point out that the statement and Figure S3 as they stand do not adequately support our decision to exclude the K64-D317 salt-bridge in our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, does indeed not rule out a scenario where the salt bridge only formed late in our simulations (or only in some replicates), but then is stable. Therefore, in our revision, we will include the appropriate time-series of the salt bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and disengaged across the trajectories in replicates 2 and 3. We will also remake the data for this plot as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are however almost identical, and our conclusions remain.
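      The distinction between a pooled occupancy and its time course can be made concrete with a short Python sketch. It assumes a per-frame salt-bridge distance array and a hypothetical 0.4 nm distance cutoff; neither the cutoff nor the synthetic trajectories come from the manuscript.

```python
import numpy as np

def occupancy(distances_nm, cutoff_nm=0.4):
    """Fraction of frames in which the distance is below the cutoff."""
    d = np.asarray(distances_nm, dtype=float)
    return float(np.mean(d < cutoff_nm))

def block_occupancy(distances_nm, n_blocks, cutoff_nm=0.4):
    """Occupancy in consecutive time blocks, exposing when the contact exists."""
    d = np.asarray(distances_nm, dtype=float)
    return [occupancy(block, cutoff_nm) for block in np.array_split(d, n_blocks)]

# Two synthetic 1000-frame trajectories with identical pooled occupancy:
# one loses the contact late, the other only forms it late.
formed, broken = 0.3, 0.8  # nm, hypothetical distances
breaks_late = np.array([formed] * 700 + [broken] * 300)
forms_late = breaks_late[::-1]

pooled = (occupancy(breaks_late), occupancy(forms_late))
early_vs_late = (block_occupancy(breaks_late, 10), block_occupancy(forms_late, 10))
```

      Both synthetic trajectories give the same pooled value, so a pooled histogram or violin plot cannot tell them apart, while the block-wise occupancies differ completely.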

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      If the conclusions were based on that alone, then we would agree. However, this section of work covers merely the observations of the initial unbiased simulations which we go on to test/explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.

      Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening in unbiased MD is taking place, whereas when D342 is mutated to alanine in addition to H87 protonation, we see spontaneous transition in 1 out of 3 replicates. However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do however stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We will revise the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis here to be investigated in more detail in the following sections. In doing so, we will also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. In the third replicate, this shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), highlighting further the stochastic nature that makes even clear correlative conclusions difficult to draw.
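      How little statistical weight a 1/3 versus 0/3 comparison carries can be quantified with a one-sided Fisher exact test. The Python sketch below uses only the standard library. It is an illustrative calculation that treats replicates as independent trials and is not an analysis from the manuscript; the 3/10 versus 0/10 comparison is a purely hypothetical larger dataset.

```python
from math import comb

def fisher_one_sided(events_a, n_a, events_b, n_b):
    """One-sided Fisher exact p-value for group A showing at least events_a
    events, given the group sizes and the total number of events observed."""
    total_events = events_a + events_b
    denom = comb(n_a + n_b, total_events)
    p = 0.0
    for x in range(events_a, min(total_events, n_a) + 1):
        # Hypergeometric probability of x events in A and the rest in B.
        p += comb(n_a, x) * comb(n_b, total_events - x) / denom
    return p

# Gate opening in 1 of 3 replicates versus 0 of 3 replicates.
p_three_replicates = fisher_one_sided(1, 3, 0, 3)

# Hypothetical larger dataset with the same opening frequency (3/10 vs 0/10).
p_ten_replicates = fisher_one_sided(3, 10, 0, 10)
```

      For the triplicate comparison the p-value is 0.5, so the observation is fully compatible with chance, which supports treating the unbiased runs as hypothesis-generating only.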

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.

      We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work.

      Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling.

      There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Methods section and Figure S8), before moving to 2D-umbrella sampling (Figure 3). Here, we think, the reviewer’s point applies. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to given end states in the sense that they connect (by definition) the correct conformations at both end-states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regard to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and combination of several MEMENTO paths in 2D-PMFs comes in.

      We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.

      We hope that the reviewer will be satisfied by our revision, where we will replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).
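      The sense in which an interpolated path reaches both end states by construction can be shown with a minimal Python sketch of linear coordinate interpolation, the initial step that the reviewer refers to. This is not the MEMENTO implementation: the reconstruction of physically reasonable intermediates is omitted, the function name is hypothetical, and the two structures are assumed to be pre-aligned with identical atom ordering.

```python
import numpy as np

def linear_morph(start_xyz, end_xyz, n_images):
    """Linearly interpolate Cartesian coordinates between two conformations.

    start_xyz and end_xyz are (n_atoms, 3) arrays. The first and last images
    coincide with the input structures by construction.
    """
    start = np.asarray(start_xyz, dtype=float)
    end = np.asarray(end_xyz, dtype=float)
    if start.shape != end.shape:
        raise ValueError("both conformations need the same atoms in the same order")
    fractions = np.linspace(0.0, 1.0, n_images)
    return np.stack([(1.0 - f) * start + f * end for f in fractions])

# Toy example: four atoms displaced between two hypothetical end states.
rng = np.random.default_rng(0)
state_a = rng.normal(size=(4, 3))
state_b = state_a + rng.normal(scale=0.5, size=(4, 3))
path = linear_morph(state_a, state_b, n_images=5)
```

      Reaching both end states exactly addresses the starting-state bias described above, but it gives no guarantee about slow orthogonal degrees of freedom along the intermediate images.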

    2. Reviewer #3 (Public Review):

      Summary:

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters that provides a good model with enough insight and structural information to be studied computationally using advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family.

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to the outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down.

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations.

      Strengths:

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses:

      (1) Some of the statements made in the manuscript are not convincing and do not abide by the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. Particularly, Figure S3 does not include the time series so it is not clear whether the 30% of the time where the salt bridge is broken is in the beginning or the end of simulations. For instance, it is likely that the salt bridge is not initially present and then it forms very strongly. Of course, this is just one possible scenario but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when D342 residue is mutated, in one out of 3 simulations, we see the opening of the gate within 1 us. When D342 residue is not mutated we do not see the opening in any of the 3 simulations within 1 us. It is quite likely that if rather than 3 we have 10 simulations or rather than 1 us we have 10 us simulations, the 0/3 to 1/3 changes significantly. I do not find this argument and conclusion compelling at all.

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on Page 5 with regards to the path generated by MEMENTO that "These paths are then by definition non-hysteretic." I think this is too big of a claim to say the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even mentioned in the original MEMENTO paper. What is mentioned is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses the linear interpolation as an initial step but modifies the intermediates significantly later so they are no longer linearly interpolated structures and thus the path is no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the hysteresis-free and unique concepts. The hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from the linear interpolation. None of these paths will be hysteresis-free necessarily once subjected to MD-based enhanced sampling techniques.
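      As a rough illustration of the sampling concern raised in point (2), the 0/3 versus 1/3 gate-opening counts can be put through a two-sided Fisher exact test. This sketch is not part of the review or the manuscript; the choice of test and the function name are assumptions made purely for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x events in row 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: "WT H87-prot" (0 openings, 3 without) and "D342A H87-prot" (1 opening, 2 without).
p_value = fisher_exact_two_sided(0, 3, 1, 2)
print(p_value)
```

      With three replicas per condition, the two outcomes cannot be distinguished statistically, which is the substance of the objection above.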

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      (1) Given the low trial numbers, and the point of sequential vs clustered reactivation mentioned in the public review, it would be reassuring to see an additional sanity check demonstrating that future items that are currently not on-screen can be decoded with confidence, and if so, when in time the peak reactivation occurs. For example, the authors could show separately the decoding accuracy for near and far items in Fig. 5A, instead of plotting only the difference between them.

      We have now added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have also chosen to replace Figure 5B with the new figure as we think it provides more information than the previous Figure 5B. Instead, we have moved Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      (2) The non-sequential reactivation analyses often use a time window of peak decodability, and it was not entirely clear to me what data this time window is determined on, e.g., was it determined based on all future reactivations irrespective of graph distance? This should be clarified in the methods.

      Thank you for raising this. We now clarify this in the relevant section to read: “First, we calculated a time point of interest by computing the peak probability estimate of decoders across all trials, i.e., the average probability for each timepoint of all trials (except previous onscreen items) of all distances, which is equivalent to the peak of the differential reactivation analysis.”
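      A minimal sketch of the time-point selection described above, assuming the decoder output is available as one probability time course per trial. The function name and the toy values are hypothetical and are not taken from the authors' code.

```python
def peak_timepoint(trial_probabilities):
    """Return the index of the time point with the highest mean probability.

    trial_probabilities: list of trials, each a list of decoded probabilities
    (one value per time point, already averaged over the off-screen items).
    """
    n_trials = len(trial_probabilities)
    n_times = len(trial_probabilities[0])
    # Average across trials separately for every time point.
    mean_per_time = [
        sum(trial[t] for trial in trial_probabilities) / n_trials
        for t in range(n_times)
    ]
    # The time point of interest is where this trial average peaks.
    return max(range(n_times), key=lambda t: mean_per_time[t])


# Toy data (hypothetical): three trials, four time points each.
toy_trials = [
    [0.10, 0.12, 0.20, 0.11],
    [0.09, 0.15, 0.18, 0.10],
    [0.11, 0.10, 0.16, 0.12],
]
peak_index = peak_timepoint(toy_trials)
print(peak_index)
```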

      (3) Fig 4 shows evidence for forward and backward sequential reactivation, suggesting that both forward and backward replay peak at a lag of 40-50msec. It would be helpful if this counterintuitive finding could be picked up in the discussion, explaining how plausible it is, physiologically, to find forward and backward replay at the same lag, and whether this could be an artifact of the TDLM method.

      This is an important point and we agree that it appears counterintuitive. However, we would highlight that this exact time range has been reported in previous studies, though never for both forward and backward replay. We now include a discussion of this finding. The section now reads:

      “[…] Even though we primarily focused on the mean sequenceness scores across time lags, there appears to be a (non-significant) peak at 40-60 milliseconds. While simultaneous forward and backward replay is theoretically possible, we acknowledge that it is somewhat surprising and, given our paradigm, could relate to other factors such as autocorrelations (Liu, Dolan, et al., 2021).”

      (4) It is reported that participants with below 30% decoding accuracy are excluded from the main analyses. It would be helpful if the manuscript included very specific information about this exclusion, e.g., was the criterion established based on the localizer cross-validated data, the temporal generalisation to the cued item (Fig. 2), or only based on peak decodability of the future sequence items? If the latter, is it applied based on near or far reactivations, or both?

      We now clarify this point to include more specific information, which reads:

      “[…] Therefore, we decided a priori that participants with a peak decoding accuracy of below 30% would be excluded from the analysis (nine participants in all) as obtained from the cross-validation of localizer trials”

      (5) Regarding the low amount of data for the reactivation analysis, the manuscript should be explicit about the number of trials available for each participant. For example, Supplemental Fig. 1 could provide this information directly, rather than the proportion of excluded trials.

      We have adapted the plot in the supplement to show the absolute number of rejected epochs per participant, in addition to the ratio.

      (6) More generally, the supplements could include more detailed information in the legends.

      We agree and have added more extensive explanation of the plots in the supplement legends.

      (7) The choice of comparing the 2 nearest with all other future items in the clustered reactivation analysis should be better motivated, e.g., was this based on the Wimmer et al. (2020) study?

      We have added our motivation for taking the two nearest items and contrasting them with the items further away. The paragraph reads:

      “[…] We chose to combine the following two items for two reasons: First, this doubled the number of included trials; secondly, using this approach the number of trials for each category (“near” and “distant”) was more balanced. […]”

      Reviewer 2

      (1) Focus exclusively on retrieval data (and here just on the current image trials).

      If I understand correctly, you focus all your analyses (behavioural as well as MEG analyses) on retrieval data only and here just on the current image trials. I am surprised by that since I see some shortcomings due to that. These shortcomings can likely be addressed by including the learning data (and predecessor image trials) in your analyses.

      a) Number of trials: During each block, you presented each of the twelve edges once. During retrieval, participants then did one "single testing session block". Does that mean that all your results are based on max. 12 trials? Given that participants remembered, on average, 80% this means even fewer trials, i.e., 9-10 trials?

      This is correct and a limitation of the paper. However, while we used only correct trials for the reactivation analysis, the sequential analysis was conducted using all trials disregarding the response behaviour. To retain comparability with previous studies we mainly focused on data from after a consolidation phase. Nevertheless, despite the trial limitation we consider the results are robust and worth reporting. Additionally, based on the suggestion of the referee, we now include results from learning blocks (see below).

      b) Extend the behavioural and replay/reactivation analysis to predecessor images.

      Why do you restrict your analyses to the current image trials? Especially given that you have such a low trial number for your analyses, I was wondering why you did not include the predecessor trials (except the non-deterministic trials, like the zebra and the foot according to Figure 2B) as well.

      We agree it would be great to increase power by adding the predecessor images to the current image cue analysis, excluding the ambiguous trials. We did not do so originally because we considered that the underlying retrieval processes of these trial types are not the same, i.e., they cannot simply be combined. Nevertheless, we have performed the suggested analysis to check whether it increases our power. We found that the reactivation effect is robust and significant at the same time point of 220-230 ms. However, the effect size actually decreased: while peak differential reactivation was previously at 0.13, it is now at 0.07. This in fact makes conceptual sense. We suspect that the two processes that are elicited by showing a single cue and by showing a second, related, cue are distinct insofar as the predecessor image acts as a primer for the current image, potentially changing the time course/speed of retrieval. Given our concerns that the two processes are not actually the same, we consider it important to avoid mixing these data.

      We have added a statement to the manuscript discussing this point. The section reads:

      “Note that we only included data from the current image cue, and not from the predecessor image cue, as we assume the retrieval processes differ and should not be concatenated.”

      c) Extend the behavioural and replay/reactivation analysis to learning trials.

      Similar to point 1b, why did you not include learning trials in your analyses?

      Including (correct and incorrect) learning trials has the advantage that you do not have to exclude 7 participants due to ceiling performance (100%).

      Further, you could actually test the hypothesis that you outline in your discussion: "This implies that there may be a switch from sequential replay to clustered reactivation corresponding to when learned material can be accessed simultaneously without interference." Accordingly, you would expect to see more replay (and less "clustered" reactivation) in the first learning blocks compared to retrieval (after the rest period).

      To track reactivation and replay over the course of learning is a great idea. We have given a lot of thought as to how to integrate these findings but have not found a satisfying solution. Analysis of the learning data turned out to be quite tricky: we decided that each participant should perform as many blocks as necessary to reach at least 80% (with a limit of six and lower bound of two, see Supplement figure 4). Indeed, some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). With the benefit of hindsight, we realise our design means that different blocks are not directly comparable between participants. In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation as memory traces become consolidated/stronger. However, it is unclear when replay should emerge and when precisely a switch to clustered reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper.

      Nevertheless, to provide some insight into the learning process, and to see how consolidation impacts differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track processes on a block basis, it does offer potential (albeit limited) insight into the hypothesis we outline in the discussion.

      For reactivation, we see the emergence of a clear increase, further strengthening the outlined hypothesis; however, for replay the evidence is less clear, as we do not know over how many learning blocks replay is expected.

      We calculated individual trajectories of how reactivation and replay change from learning to retrieval and related these to performance. Indeed, we see that an increase in reactivation is nominally associated with higher learning performance, while an increase in replay strength is associated with lower performance (both non-significant). However, due to the above-mentioned reasons we think it would be premature to add this weak evidence to the paper.

      To mitigate problems of experimental design in relation to this question we are currently implementing a follow-up study, where we aim to normalize the learning process across participants and index how replay/reactivation changes over the course of learning and after consolidation.

      We have added plots showing clustered reactivation and sequential replay measures during learning (Figure 5D and Supplement 8).

      The added section(s) now read:

      “To provide greater detail on how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures across learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the same time point we found significant in the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D). […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), due to experimental design features our data do not enable us to test for an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      d) Introduction (last paragraph): "We examined the relationship of graph learning to reactivation and replay in a task where participants learned a ..." If all your behavioural analyses are based on retrieval performance, I think that you do not investigate graph learning (since you exclusively focus the analyses on retrieving the graph structure). However, relating the graph learning performance and replay/reactivation activity during learning trials (i.e., during graph learning) to retrieval trials might be interesting but beyond the scope of this paper.

      We agree. We have changed the wording to be more accurate. Indeed, we do not examine graph learning but instead examine retrieval from a graph, after graph learning. The mentioned sentence now reads:

      “[…] relationship of retrieval from a learned graph structure to reactivation [...]”

      e) It is sometimes difficult to follow what phase of the experiment you refer to since you use the terms retrieval and test synonymously. Not a huge problem at all but maybe you want to stick to one term throughout the whole paper.

      Thank you for pointing this out. We have now adapted the manuscript to exclusively refer to “retrieval” and not to “test”.

      (2) Is your reactivation clustered?

      In Figure 5A, you compare the reactivation strength of the two items following the cue image (i.e., current image trials) with items further away on the graph. I do not completely understand why your results are evidence for clustered reactivation in contrast to replay.

      First, it would be interesting to see the reactivation of near vs. distant items before taking the difference (time course of item probabilities).

      (copied answer from response to Reviewer 1, as the same remark was raised)

      We have added the requested analysis showing the raw decoded probabilities for near and distant items separately in Figure 5A. We have chosen to replace Figure 5B with the new figure as we think that it offers more information than the previous Figure 5B. Instead, we have moved Figure 5B to the supplement. The median peak decoded accuracy for near and distant items is equivalent. We have added the following description to the figure:

      “Decoded raw probabilities for off-screen items that were up to two steps ahead of the current stimulus cue (‘near’) vs. distant items that were more than two steps away on the graph, on trials with correct answers. The median peak decoded probability for near and distant items was at the same time point for both probability categories. Note that displayed lines reflect the average probability while, to eliminate influence of outliers, the peak displays the median.”

      Second, could it still be that the first item is reactivated before the second item? By averaging across both items, it is not apparent what the temporal courses of probabilities of both items look like (and whether they follow a sequential pattern). Additionally, the Gaussian smoothing kernel across the time dimension might diminish sequential reactivation and favour clustered reactivation. (In the manuscript, what does a Gaussian smoothing kernel of σ = 1 refer to?). Could you please explain in more detail why you assume non-sequential clustered reactivation here and substantiate this with additional analyses?

      We apologise for the unclear description. Note the Gaussian kernel is in fact only used for the reactivation analysis and not the replay analysis, so any small temporal successions would have been picked up by the sequential analysis. We now clarify this in the respective section of the sequential analysis and also explain the parameter of σ = 1 in the reactivation analysis section. The paragraph now reads:

      “[…] As input for the sequential analysis, we used the raw probabilities of the ten classifiers corresponding to the stimuli. [...]

      […] Therefore, to address this we applied a Gaussian smoothing kernel (using scipy.ndimage.gaussian_filter with the default parameter of σ=1 which corresponds approximately to taking the surrounding timesteps in both direction with the following weighting: current time step: 40%, ±1 step: 25%, ±2 step: 5%, ±3 step: 0.5%) [...]”
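      The quoted weights can be checked by computing a normalized discrete Gaussian kernel directly. The sketch below is not the authors' code: it assumes truncation at four standard deviations, which to our understanding matches the default `truncate` setting of the SciPy filter, and it sets σ explicitly. The function name is hypothetical.

```python
from math import exp


def gaussian_weights(sigma=1.0, truncate=4.0):
    """Normalized discrete Gaussian weights, keyed by offset in time steps."""
    # Kernel radius in samples (assumed truncation rule, see lead-in).
    radius = int(truncate * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    raw = [exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(raw)
    # Normalize so that the weights sum to one.
    return {k: w / total for k, w in zip(offsets, raw)}


weights = gaussian_weights(sigma=1.0)
for step in range(4):
    print(f"+/-{step} step: {100 * weights[step]:.1f}%")
```

      Under these assumptions the weights come out at roughly 40%, 24%, 5% and 0.4% for offsets of 0, ±1, ±2 and ±3 steps, in line with the approximate values given in the quoted passage.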

      (3) Replay and/or clustered reactivation?

      The relationship between the sequential forward replay, differential reactivation, and graph reactivation analysis is not really apparent. Wimmer et al. demonstrated that high performers show clustered reactivation rather than sequential reactivation. However, you did not differentiate in your differential reactivation analysis between high vs. low performers. (You point out in the discussion that this is due to a low number of low performers.)

      We agree that a split into high vs low performers would have been preferable for our analysis. However, there is one major obstacle that made us opt for a correlational analysis instead: We employed criteria learning, rendering a categorical grouping conceptually biased. Even though not all participants reached the criterion of 80%, our sample did not naturally split between high and low performers but was biased towards higher performance, leaving the groups uneven. The median performance was 83% (mean ~81%), with six of our subjects (~1/4th of included participants) having this exact performance. This makes a median or mean split difficult, as either binning assignment choice would strongly affect the results. We have added a limitations section in which we extensively discuss this shortcoming and our reasoning for not performing a median split as in Wimmer et al (2020). The section now reads:

      “There are some limitations to our study, most of which originate from a suboptimal study design. [...], as we performed criteria learning, a sub-group analysis as in Wimmer et al., (2020) was not feasible, as median performance in our sample would have been 83% (mean 81%), with six participants exactly at that threshold. [...]”

      It might be worth trying to bring the analysis together, for example by comparing sequential forward replay and differential reactivation at the beginning of graph learning (when performance is low) vs. retrieval (when performance is high).

      Thank you for the suggestion to include the learning segments, which we think improves the paper quite substantially. However, analysis of the learning data turned out to be quite tricky. We had decided that each participant should perform as many blocks as necessary to reach at least 80% accuracy (with a limit of six and lower bound of two, see Supplement figure 4). Some participants learned 100% of the sequence after one block (these were mostly medical students, for whom learning things by heart is a daily task). This in hindsight is an unfortunate design feature in relation to learning as it means different blocks are not directly comparable between participants.

      In theory, we would expect that replay emerges in parallel with learning and then gradually changes to clustered reactivation, as memory traces get consolidated/stronger. However, it is unclear when replay would emerge and when the switch to reactivation would happen. For this reason, we initially decided not to include the learning trials in the paper at all.

      Nevertheless, to give some insight into the learning process and to see how consolidation affects differential reactivation and replay, we have split our data into pre and post resting state, aggregating all learning trials of each participant. While this does not allow us to track measures of interest on a block basis, it gives some (albeit limited) insight into the hypothesis outlined in our discussion.

      For reactivation, we see a clear increase, further strengthening the outlined hypothesis. However, for replay the evidence is less obvious, potentially due to the fact that we do not know across how many learning blocks replay is to be expected.

      The added section(s) now read:

      “To examine how the 8-minute consolidation period affected reactivation we, post-hoc, looked at relevant measures during learning trials in contrast to retrieval trials. For all learning trials, for each participant, we calculated differential reactivation for the time point we found significant during the previous analysis (220-260 milliseconds). On average, differential reactivation probability increased from pre to post resting state (Figure 5D).

      […]

      Nevertheless, even though our results show a nominal increase in reactivation from learning to retrieval (see Figure 5D), our data does not enable us to show an hypothesized switch for sequential replay (see also “limitations” and Supplement 8).”

      Additionally, the main research question is not that clear to me. Based on the introduction, I thought the focus was on replay vs. clustered reactivation and high vs. low performance (which I think is really interesting). However, the title is more about reactivation strength and graph distance within cognitive maps. Are these two research questions related? And if so, how?

      We agree we need to be clearer on this point. We have added two sentences to the introduction, which should address this point. The section now reads:

      “[…] In particular, the question remains how the brain keeps track of graph distances for successful recall and whether the previously found difference between high and low performers also holds true within a more complex graph learning context.”

      (4) Learning the graph structure.

      I was wondering whether you have any behavioural measures to show that participants actually learn the graph structure (instead of just pairs or triplets of objects). For example, do you see that participants chose the distractor image that was closer to the target more frequently than the distractor image that was further away (close vs. distal target comparison)? It should be random at the beginning of learning but might become more biased towards the close target.

      Thanks, this is an excellent suggestion. Our analysis indeed shows that people take the near lure more often than the far lure in later blocks, while it is random in the first block.

      Nevertheless, we have decided to put these data into the supplement and reference it in the text. This is because analysis of the learning blocks is challenging and biased in general. Each participant had a different number of learning blocks based on their learning rate, and this makes it difficult to compare learning across participants. We have tried our best to accommodate and explain these difficulties in the figure legend. Nevertheless, we thank the referee for guidance here and this analysis indeed provides further evidence that participants learned the actual graph structure.

      The added section reads:

      “Additionally, we have included an analysis showing how wrong answers participants provided were random in the first block and biased towards closer graph nodes in later blocks. This is consistent with participants actually learning the underlying graph structure as opposed to independent triplets (see figure and legend of Supplement 6 for details).”

      (5) Minor comments

      a) "Replay analysis relies on a successive detection of stimuli where the chance of detection exponentially decreases with each step (e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting the replay event). " Could you explain in more detail why 30% is a good threshold then?

      Thank you. We have further clarified the section. As we are working mainly with probabilities, it is useful to keep in mind that accuracy is a class metric that only provides a rough estimate of classifier ability. Alternatively, something like a Top-3-Accuracy would be preferable, but also slightly silly in the context of 10 classes.

      Nevertheless, subtle changes in probability estimates are present and can be picked up by the methods we employ. Therefore, the 30% threshold is a rough lower bound that was decided based on pilot data, which showed that clean MEG data from attentive participants can usually reach this threshold. The section now reads:

      “(e.g., detecting two successive stimuli with a chance of 30% leaves a 9% chance of detecting a replay event). However, one needs to bear in mind that accuracy is a “winner-takes-all” metric indicating whether the top choice also has the highest probability, disregarding subtle, relative changes in assigned probability. As the methods used in this analysis are performed on probability estimates and not class labels, one can expect that the 30% are a rough lower bound and that the actual sensitivity within the analysis will be higher. Additionally, based on pilot data, we found that attentive participants were able to reach 30% decodability, allowing us to use decodability as a data quality check.”
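      The arithmetic behind the 9% figure can be written out as follows. This is a sketch that assumes each successive detection is independent and has the same per-item chance; the function name is hypothetical.

```python
def sequence_detection_chance(per_item_chance, n_items):
    """Chance of detecting n_items successive stimuli, assuming independence."""
    return per_item_chance ** n_items


# Two successive stimuli, each detected with a chance of 30%.
two_step = sequence_detection_chance(0.30, 2)
# Three successive stimuli under the same assumption.
three_step = sequence_detection_chance(0.30, 3)
print(f"{two_step:.3f} {three_step:.3f}")
```

      The chance falls geometrically with sequence length, which is why a per-item threshold matters more for sequential analyses than for single-item reactivation.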

      b) Could you make explicit how your decoders were designed? Especially given that you added null data, did you train individual decoders for one class vs. all other classes (n = 9 + null data) or one class vs. null data?

      We added detail to the decoder training. The section now reads:

      “Decoders were trained using a one-vs-all approach, which means that for each class, a separate classifier was trained using positive examples (target class) and negative examples (all other classes) plus null examples (data from before stimulus presentation, see below). In detail, null data was.”
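      The one-vs-all scheme with appended null examples can be sketched as a label-construction step. This is only an illustration under assumed inputs (integer class ids and a count of null examples); it does not reproduce the authors' classifier or preprocessing, and all names are hypothetical.

```python
def one_vs_all_labels(class_labels, n_null, target):
    """Binary training labels for the decoder of one target class.

    Positive examples (1) are trials of the target class. Negative examples (0)
    are trials of all other classes plus n_null appended null examples.
    """
    labels = [1 if label == target else 0 for label in class_labels]
    return labels + [0] * n_null


# Toy setup (hypothetical): four stimulus trials from three classes, two null examples.
stimulus_labels = [0, 1, 2, 0]
decoder_targets = {
    target: one_vs_all_labels(stimulus_labels, n_null=2, target=target)
    for target in sorted(set(stimulus_labels))
}
print(decoder_targets)
```

      Each class gets its own binary label vector, so one classifier per class can be fitted on the same feature matrix (stimulus trials followed by null examples).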

      c) Why did you choose a ratio of 1:2 for your null data?

      Our choice of a higher ratio was based upon previous publications reporting better sensitivity of TDLM using higher ratios, as spatial sensor correlations are decreasing. Nevertheless, this choice was not well investigated beforehand. We have added more information on this to the manuscript.

      d) You could think about putting the questionnaire results into the supplement if they are sanity checks.

      We have added the questionnaire results. However, due to the size of the tables, we have decided to add them as Excel files to the supplementary files of the code repository. We have mentioned the existence of these files in the publication.

      e) Figure 2. There is a typo in D: It says "Precessor Image" instead of "Predecessor Image".

      Fixed typo in figure.

      f) You write "Trials for the localizer task were created from -0.1 to 0.5 seconds relative to visual stimulus onset to train the decoders and for the retrieval task, from 0 to 1.5 seconds after onset of the second visual cue image." But the Figure legend 3D starts at -0.1 seconds for the retrieval test.

      We have now clarified this. For the classifier cross-validation, the transfer sanity check, and the clustered analysis we used trials from -0.1 to 0.5 seconds, whereas for the sequenceness analysis of the retrieval, we used trials from 0 to 1.5 seconds.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank all reviewers for their thorough assessment and constructive comments. We are glad that the reviewers appreciate that our findings are of interest to the nuclear transport field and that our extension of the use of the RITE methodology can be a valuable tool for the further characterization of NPCs that differ in composition and potentially function. In response to the reviewers’ comments, we have revised the text to incorporate their suggestions and improve overall readability and clarity. Furthermore, we propose to perform a set of additional experiments to address the reviewers’ most important critiques. Below we list our response with the reviewer comments reprinted in dark grey and our response in blue for easier orientation. We have added numbering of the comments for easier orientation.

      Many of the comments made by the reviewers have already been implemented, additional points will be addressed in a revised version of the manuscript as detailed below.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors extended the existing recombination-induced tag exchange (RITE) technology to show that they can image a subset of NPCs, improving signal-to-noise ratios for live cell imaging in yeast, and to track the stability or dynamics of specific nuclear pore proteins across multiple cell divisions. Further, the authors use this technology to show that the nuclear basket proteins Mlp1, Mlp2 and Pml39 are stably associated with "old NPCs" through multiple cell cycles. The authors show that the presence of Mlp1 in these "old NPCs" correlates with exclusion of Mlp1-positive NPCs from the nucleolar territory. A surprising result is that basket-less NPCs can be excluded from the non-nucleolar region, an observation that correlates with the presence of Nup2 on the NPC regardless of maturation state of the NPC. In support of the proposal that NPCs are retained via Mlp1 and Nup2 in non-nucleolar regions, simulation data are presented to suggest that basket-less NPCs diffuse faster in the plane of the nuclear envelope.

      However, there are some points that do need addressing:

      Major Points

      1. Taking into account that the Nup2 result in Figure 4B forms the basis for one half of the proposed model in Figure 6 regarding the exclusion of NPCs from the nucleolar region of the NE, there is a relatively small amount of data in support of this finding and this proposed model. For example, the only data for Nup2 in the manuscript is a column chart in Figure 4B with no supporting fluorescence microscopy examples for any Nup2 deletion. Further, the Nup60 deletion mutant will have zero basket-containing NPCs, whereas the Nup2 deletion will be a mixture of basket-containing and basket-less NPCs. The only support for the localization of basket-containing NPCs in the Nup2 deletion mutant is through a reference "Since Mlp1-positive NPCs remain excluded from the nucleolar territory in nup2Δ cells (Galy et al., 2004), the homogenous distribution observed in this mutant must be caused predominantly by the redistribution of Mlp-negative NPCs into the nucleolar territory."

      We have already added fluorescence images of the nup2Δ strain to Figure 4A in the preliminary revision.

      In addition, we will repeat the experiment from Galy et al. 2004 to test whether Mlp-positive NPCs are excluded from nucleoli in our hands as well.

      Furthermore, we propose to carry out more experiments to pinpoint which domains of Nup2 contribute to nucleolar exclusion, which will provide more insight into the mechanism behind this effect. We propose to do this by analyzing NPC localization in mutants that express, as their only copy of Nup2, truncations lacking individual domains. Regardless of whether we find that a single domain of Nup2 is responsible or that several domains act in combination, this experiment will indicate a potential molecular mechanism for nucleolar exclusion.

      2. The authors could consider utilizing this opportunity to discuss their technological innovations in the context of the prior work of Onischenko et al., 2020. This work is referenced for the statement "RITE can be used to distinguish between old and new NPCs" Page 2, Line 43. However, it is not referenced for the statement "We constructed a RITE-cassette that allows the switch from a GFP-labelled protein to a new protein that is not fluorescently labelled (RITE(GFP-to-dark))" despite Onischenko et al., 2020 having already constructed a RITE-cassette for the GFP-to-dark transition. The authors could consider taking this opportunity to instead focus on their innovative approach to apply this technology to decrease the number of fluorescently-tagged NPCs by dilution across multiple cell divisions and to interpret this finding as a measure of the stability of nuclear pore proteins within the broader NPC.

      We apologize for this imprecise citation. We have modified the text to indicate that our RITE cassette was previously used in two publications. It now reads: “We used a RITE-cassette that allows the switch from a GFP-labelled protein to a new protein that is not fluorescently labelled (RITE(GFP-to-dark)) (Onischenko et al., 2020, Kralt et al., 2022).”

      3. The authors could also consider taking this opportunity to discuss their results in the context of the Saccharomyces cerevisiae nuclear pore complex structures published e.g. in Kim et al., 2018, Akey et al., 2022, Akey et al., 2023 in which the arrangement of proteins in the nuclear basket is presented, and also work from the Kohler lab (Mészáros et al., 2015) on how the basket proteins are anchored to the NPC. There is additional literature that also might help provide some perspective to the findings in the current manuscript, such as the observation that a lesser amount of Mlp2 to Mlp1 observed is consistent with prior work (e.g. Kim et al., 2018) and that intranuclear Mlp1 foci are also formed after Mlp1 overexpression (Strambio-de-Castillia et al., 1999).

      Following the reviewer’s suggestion, we extended our discussion of basket Nup stoichiometry and organization in the discussion section including several of the citations mentioned. At this point, we did not see a good way to incorporate discussion about the nuclear Mlp1 foci formed after Mlp1 overexpression. However, this observation is in line with the foci formed in cells lacking Nup60, suggesting that Mlp1 that cannot be incorporated into NPCs forms nuclear foci.

      Minor Points

      1. What is the "lag time" of the doRITE switching? Do the authors believe that it is comparable to the approximate 1-hour timeframe following beta-estradiol induction as shown previously in Cheng et al. Nucleic Acids Research, Volume 28, Issue 24, 15 December 2000, Page e108, https://doi.org/10.1093/nar/28.24.e108

      Our data (e.g. newRITE, Figure S3B) suggest that the switch occurs on a similar timeframe at

      2. The authors could consider a brief explanation of radial position (µm) for the benefit of the reader, in Figures 1E (right panel) and 2B (right panel), perhaps using a diagram to make it easier to understand the X-axis (µm).

      To address this, we have now included a diagram and refer to it in the figure legend.

      3. In Figure 1G, would the authors consider changing the vertical axis title and the figure legend wording from "mean number of NPCs per cell" to "mean labeled NPC # per cell" to reflect that what is being characterized are the remaining GFP-bearing NPCs over time?

      Thank you for spotting this inaccuracy. We have changed the label to “mean # of labeled NPCs per cell”.

      4. In Figure 2C, the magenta-labeled protein in the micrographs is not described in the figure or the legend.

      As requested, a description has been added in figure and legend.

      5. In Figure S2A, there is an arrow indicating a Nup159 focus, but this is not described in the figure legend, as is done in Figure 2C.

      A description has been added to the legend.

      6. In Figure S3C, the figure legend does not match the figure. Was this supposed to be designed like Figure 3C and is missing part of the figure? Or is the legend a typographical error?

      We apologize for this error and thank the reviewer for spotting it. The legend has been corrected.

      7. In Figure S4B, the spontaneously recombined RITE (GFP-to-dark) Nup133-V5 appears in the western blot as equally abundant to pre-recombined Nup133-V5-GFP. In the figure legend, this is explained as cells grown in synthetic media without selection to eliminate cells that have lost their resistance marker from the population. In Cheng et al. Nucleic Acids Res. 2000 Dec 15; 28(24): e108, Cre-EBD was not active in the absence of β-estradiol, despite galactose-induced Cre-EBD overexpression. Would the authors be able to comment further on the Cre-Lox RITE system in the manuscript?

      We note that in the cited publication, too, cells are grown in the presence of selection to select (as stated in that publication) “against pre-excision events that occur because of low but measurable basal expression of the recombinase”. Although the authors report that spontaneous recombination is reduced with the β-estradiol-inducible system (compared to control of the recombinase by pGAL expression only), they show negligible spontaneous recombination only within a two-hour time window. Indeed, we also observe low levels of uninduced recombination on a short timeframe, but occasional events can become significant over longer incubation times (e.g. overnight growth) in the absence of selection. It should be noted that in our system, Cre expression is continuously high (TDH3 promoter) and not controlled by an inducible GAL promoter. We have added the information about the promoter controlling Cre expression to the methods section.

      8. In Figure 6, the authors may want to consider inverting the flow of the cartoon model to start from the wild type condition and apply the deletion mutations at each step to "arrive" at the mutant conditions, rather than starting with mutant conditions and "adding back" proteins.

      Following the suggestions of the reviewer, we have modified our model to more clearly represent the contributions of the different basket components.

      Reviewer #1 (Significance (Required)):

      Recent work has drawn attention to the fact that not all NPCs are structurally or functionally the same, even within a single cell. In this light, the work here from Zsok et al. is an important demonstration of the kind of methodologies that can shed light on the stability and functions of different subpopulations of NPCs. Altogether, these data are used to support an interesting and topical model for Nup2 and nuclear-basket driven retention of NPCs in non-nucleolar regions of the nuclear envelope.

      We thank the reviewer for this positive assessment of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, Zsok et al. develop innovative methods to examine the dynamics of individual nuclear pore complexes (NPCs) at the nuclear envelope of budding yeast. The underlying premise is that with the emergence of biochemically distinct NPCs that co-exist in the same cell, there is a need to develop tools to functionally isolate and study them. For example, there is a pool of NPCs that lack the nuclear basket over the nucleolus. Although the nature of this exclusion has been investigated in the past, the authors take advantage of a modification of recombination induced tag exchange (RITE), the slow turnover of scaffold nups, the closed mitosis of budding yeast, and extensive high quality time lapse microscopy to ultimately monitor the dynamics of individual NPCs over the nucleolus. By leveraging genetic knockout approaches and auxin-induced degradation with sophisticated quantitative and rigorous analyses, the authors conclude that there may be two mechanisms dependent on nuclear basket proteins that impact nucleolar exclusion. They also incorporate some computational simulations to help support their conclusions. Overall, the data are of the highest quality and are rigorously quantified, the manuscript is well written, accessible, and scholarly - the conclusions are thus on solid footing.

      We thank the reviewer for this assessment.

      Reviewer #2 (Significance (Required)):

      I have no concerns about the data or the conclusions in this manuscript. However, the significance is not overly clear as there is no major conceptual advance put forward, nor is there any new function suggested for the NPCs over nucleoli. As NPCs are immobile in metazoans, the significance may also be limited to a specialized audience.

      We respectfully disagree with this assessment. It is becoming increasingly clear that NPC variants are also present in other model systems. We characterize the interaction between conserved nuclear components, the NPC, the nucleolus and chromatin. While the specific architecture of the nucleus varies between species, many of these interactions are conserved. For example, Nup50, the homologue of Nup2, interacts with chromatin also in other systems including mammalian cells and thus may contribute to regulating the interplay between the nuclear basket and adjoining chromatin. Furthermore, our work demonstrates the use of a novel approach in the application of RITE that can be useful for other researchers in the field of NPC biology and beyond. For example, doRITE could be applied to study the properties of aged NPCs in the context of young cells. In the revised manuscript, we attempt to better highlight and discuss the conceptual advances of our manuscript.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript of Zsok et al. describes the role of nuclear basket proteins in the distribution and mobility of nuclear pore complexes in budding yeast. In particular, the authors showed that the doRITE approach can be used for the analysis of stable and dynamically associated NUPs. Moreover, it can distinguish individual NUPs and follow the inheritance of individual NPCs from mother to daughter cells. The authors' findings highlight that Mlp1, Mlp2, and Pml39 are stably associated with the nuclear pore; deletion of Mlp1-Mlp2 and Nup60 leads to a higher NPC density in the nucleolar territory; and NPCs exhibit increased mobility in the absence of the nuclear basket components.

      The manuscript contains most figures supporting the data, and data supports the conclusions. However, authors need to include better explanations for figures in the text and figure legends. Lack of detailed explanation can pose challenges for non-experts. In addition, the authors jump over figures and shuffle them through the manuscript, which disrupts the flow and coherence of the manuscript.

      We thank the reviewer for pointing this out. We have modified the figure legends throughout the manuscript in an attempt to make them more accessible to the reader. In addition, we will revise the figure order and text as suggested to improve the flow of the manuscript.

      Major comments: 1) The nuclear basket contains Nup1, Nup2, Nup60, Mlp1, and Mlp2 in yeast. Nup60 works as a seed for Mlp1/Mlp2 and Nup2 recruitment and plays a key role in the assembly of nuclear pore basket scaffold (PMID: 35148185). Logically, the authors focused primarily on Nup60 in the current manuscript. However, NUP153 has another ortholog of yeast - Nup1, which has not been studied in this work. I recommend adjusting the title of the manuscript to: Nup60 and Mlp1/Mlp2 regulate the distribution and mobility of nuclear pore complexes in budding yeast. I also suggest discussing why work on Nup1 was not included/performed in the manuscript.

      We have changed the title to “Nuclear basket proteins regulate the distribution and mobility of nuclear pore complexes in budding yeast”. We think that this better captures the essence of our manuscript than listing all four proteins (Mlp1/2, Nup60 and Nup2) in the title.

      We initially focused on the network that is involved in Mlp1/2 interaction at the NPC. However, we agree that it would be interesting to test whether Nup1 plays a role in the analyzed processes as well. Since Nup1 is essential in our yeast background, we will use auxin-inducible degradation of Nup1 to test its involvement in NPC distribution.

      2) Figure 2B: I suggest choosing a more representative image for Pml39. It looks not like a stable component but rather dynamic as NUP60 or Gle1 based on figure showed in Figure 2B.

      Due to its lower copy number, Pml39 is much more difficult to visualize than the other Nups. To guide the reader, we have now added arrowheads pointing to the remaining Pml39 foci at the 14-hour time point. The 11-hour time point most clearly shows that Pml39 is less dynamic than other Nups such as Nup116, Nup60 or Gle1. At this time point, clear dots can be detected for Pml39, whereas Nup116 in the same figure exhibits a more distributed signal and the signals for Nup60 and Gle1 are no longer visible. We will describe this more clearly in our revised manuscript as well.

      3) Depletion of AID-tagged proteins needs to be supported by Western blot analysis with protein-specific antibodies, and PCR results should be included in supplementary data to demonstrate the homozygosity of the strains.

      The correct genomic tagging of the depleted proteins by AID was confirmed by PCR. We will include this PCR analysis in the supplemental data. Please note that we are working with haploid yeast cells. Therefore, all strains only carry a single copy of the genes. Unfortunately, we do not have protein-specific antibodies against the depleted proteins. However, the Mlp1-mislocalization phenotype demonstrates that depletion of Nup60 is successful, and the strain for PolII depletion was used and characterized previously (PMID: 31753862, PMID: 36220102).

      4) Figure 5B: Snapshots of images from the movie are required. There are no images, only quantifications.

      We have replaced the supplemental movie with a movie showing the detection by TrackMate as well as overlaid tracks. As requested, a snapshot of this movie was inserted in Figure 5B. We have also moved the example tracks from the supplement to the main figure. Furthermore, we will deposit the tracking dataset in the ETH Research Collection to make it available to the community.

      5) Description of figure legends is more technical than supporting/explaining the figure. For example, below my suggestions for Figure 1D. Please, consider more detailed explanation for other figures. (D) Left: Schematic of the RITE cassette. NUP of interest is tagged with V5 tag and eGFP fluorescent protein where LoxP sites flank eGFP. Before the beta-estradiol-induced recombination, the old NPCs are marked with eGFP signal, whereas new NPCs lack an eGFP signal after the recombination. ORF: open reading frame; V5: V5-tag; loxP: loxP recombination site; eGFP: enhanced green fluorescent protein. Right: doRITE assay schematic of stable or dynamic Nup behavior over cell divisions in yeast after the recombination.

      We have modified the figure legends throughout the manuscript to make them more explanatory and helpful for the reader.

      In addition, I recommend highlighting the result in the title of the figures. Please, re-consider titles for Figure S3.

      We have revised the title for Figure S3 to state a result. It now reads: “Mlp1 truncations localize preferentially to non-nucleolar NPCs.”

      Minor: i) P.1 Line 31. Extra period symbol before the "(Figure 1A)".

      Fixed

      ii) P.2 Line 10. Inconsistent writing of PML39 and MLP1. Both genes are capitalized. The same for P.4 Line 16. In some cases all letters are capitalized in other only the first one.

      We are following the official yeast gene nomenclature by spelling gene names in italicized capitals and protein names with only the first letter capitalized. We are sorry that this can be confusing for readers more familiar with other model systems but we adhere to the accepted yeast nomenclature standards.

      iii) P.2 Line 18-22. The sentence is too long and hard to read. I recommend splitting it into two sentences.

      We agree and have fixed this.

      iv) P.2-3 Line 46-47. The sentence is unclear. Suggestion: We expected that successive cell divisions would dilute the signal of labelled and stably associated with the NPC nucleoporins. By contrast, ...

      We have modified the sentence to read: “When tagging a Nup that stably associates with the NPC, we expected that successive cell divisions would dilute labelled NPCs by inheritance to both mother and daughter cells leading to a low density of labelled NPCs. By contrast,…”
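
      The dilution expectation in this sentence can be illustrated with a minimal sketch. It is not the authors' analysis code and assumes idealized conditions: labelled NPCs are neither lost nor newly made after recombination, so the labelled pool is conserved while the number of cells doubles with each division. The function name and the starting value of 100 labelled NPCs are hypothetical.

```python
def expected_labelled_npcs(n0: float, divisions: int) -> float:
    """Mean number of labelled NPCs per cell after `divisions` population doublings.

    Assumes (hypothetically) that the total pool of labelled NPCs is conserved
    and is shared among twice as many cells after each division.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return n0 / (2 ** divisions)


if __name__ == "__main__":
    # Hypothetical starting value: 100 labelled NPCs per cell at recombination.
    for n in range(5):
        print(n, expected_labelled_npcs(100.0, n))
```

      Under these assumptions the mean number of labelled NPCs per cell halves with every population doubling.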

      v) P.4 Line 17-21. Please, consider adding extra information and clarifying lines 19-21. For example, in Line 19 Figure 2B you can add that the reader needs to compare row 1 and row 4.

      Thank you, we have fixed this as suggested.

      vi) P. 5 Line 15. When a number begins a sentence, that number should always be spelled out. You can re-phrase the sentence to avoid it. Also, I recommend adding an explanation/hypothesis of why new NPCs are less frequently detected in nucleolar territory.

      We have formatted the text. Interestingly, new NPCs are more frequently detected in the nucleolar territory. We have reformulated this section to make it clearer, also in response to the next comment.

      vii) P.5 Line 17-22. I recommend re-phrasing these two sentences. Logically, it is clear that Mlp1/Mlp2 loss mimics "old NPCs" to look more like "new NPCs", and for that reason, they are more frequently included in the nucleolar territory, but it is not clear when you read these two sentences for the first time.

      We have reformulated this section to make it clearer.

      viii) P6. Line 16. No figure supporting data on graph (Figure 3B).

      We have added fluorescence images of the nup2Δ strain to Figure 4A.

      ix) P.7 Line 10-13. The sentence is unclear.

      We have shortened the sentence and moved part of the content to the discussion in the next paragraph.

      x) P.13,14 etc. If 0h timepoint has been used for normalization, why is it present on the graph?

      The 0h timepoint is shown for comparison and to illustrate the standard deviation in the data.
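
      Why a normalization reference can still carry a spread can be seen in a minimal sketch. It assumes that each measurement is divided by the mean of the 0 h replicates; the exact normalization used in the manuscript is not stated here, and the function name and numbers are hypothetical.

```python
from statistics import mean, stdev


def normalise_to_t0(values_by_time: dict[float, list[float]]) -> dict[float, list[float]]:
    """Divide every replicate by the mean of the 0 h replicates (assumed scheme)."""
    t0_mean = mean(values_by_time[0.0])
    return {t: [v / t0_mean for v in vals] for t, vals in values_by_time.items()}


# Hypothetical replicate measurements at 0 h and 2 h.
example = {0.0: [90.0, 100.0, 110.0], 2.0: [45.0, 50.0, 55.0]}
normalised = normalise_to_t0(example)

if __name__ == "__main__":
    for t, vals in normalised.items():
        print(t, round(mean(vals), 3), round(stdev(vals), 3))
```

      With this scheme the 0 h mean is 1 by construction, but the individual replicates still scatter around it, which is the variability the 0 h point displays.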

      xi) P.15. Line 32-33. There is no image here. Potentially wrong description of the figure.

      Thank you for spotting this. This was fixed.

      xii) Figures: - Inconsistent labeling of figures. For example, Fig.1, Fig.1S, Figure 2 etc.

      Thank you, this has been corrected.

      • Inconsistent labeling of figures. For example, Fig.1 G "mean number of NPCs per cell" - no capitalization of the first letter. Fig.1S "Fraction in population" is capitalized. In general, axis titles should be capitalized.

      Thank you for spotting this. This was fixed.

      Suggestions for Figure 1D and Figure 6 are attached as a separate file.

      We thank the reviewer for their suggestions to improve these figures. We have taken their recommendation and revised the figures accordingly (see also response to reviewer 1, minor point 8).

      Reviewer #3 (Significance (Required)):

      Zsok et al. used the recombination-induced tag exchange (RITE) approach, which is an interesting and powerful method to follow individual NUPs over time with respect to their localization and abundance. This approach has been used before in PMID: 36515990 to distinguish pre-existing and newly synthesized Nup2 populations and has been extended to other basket NUPs in this work. Using this method, the authors support the earlier data on basket nucleoporins and highlight new insights on how basket nucleoporins regulate NPCs distribution and mobility. Overall, the manuscript provides new details on the stability of nucleoporins in yeast and how these data align with the mass spectrometry and FRAP data performed earlier in other studies. The limitation of this study is the absence of data on Nup1. It was unclear why these data were not present. Additional data can be included on the dynamics of Pml39, for example, using the FRAP method. The dynamic of Pml39 at the pore was shown only using the doRITE method.

      As suggested, we propose to test whether Nup1 influences NPC organization (see also above). Unfortunately, we are not able to provide orthogonal data for the dynamics of Pml39. As we have discussed in the manuscript, FRAP is not suitable for the analysis of the dynamics of most nucleoporins in yeast due to the high lateral mobility of NPCs in the nuclear envelope and has previously generated misleading results for Mlp1. Furthermore, the low expression levels of Pml39 will make it difficult to obtain reliable FRAP curves for this protein. We therefore do not think that adding FRAP experiments with Pml39 will provide valuable insight.

      However, in addition to the Pml39 doRITE result itself, our observation that the Pml39-dependent pool of Mlp1 exhibits stable association with the NPC supports the interpretation of Pml39 as a stable protein as well.

      In general, this study represents a unique research study of basic research on nuclear pore proteins that will be of general interest to the nuclear transport field.

      Field of expertise: nuclear-cytoplasmic transport, nuclear pore, inducible protein degradation. I do not have sufficient expertise in ExTrack.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study advances our understanding of how past and future information is jointly considered in visual working memory by studying gaze biases in a memory task that dissociates the locations during encoding and memory tests. The evidence supporting the conclusions is convincing, with state-of-the-art gaze analyses that build on a recent series of experiments introduced by the authors. This work, with further improvements incorporating the existing literature, will be of broad interest to vision scientists interested in the interplay of vision, eye movements, and memory.

      We thank the Editors and the Reviewers for their enthusiasm and appreciation of our task, our findings, and our article. We also wish to thank the Reviewers for their constructive comments that we have embraced to improve our article. Please find below our point-by-point responses to this valuable feedback, where we also state relevant revisions that we have made to our article.

      In addition, please note that we have now also made our data and code publicly available.

      Reviewer 1, Comments:

      In this study, the authors offer a fresh perspective on how visual working memory operates. They delve into the link between anticipating future events and retaining previous visual information in memory. To achieve this, the authors build upon their recent series of experiments that investigated the interplay between gaze biases and visual working memory. In this study, they introduce an innovative twist to their fundamental task. Specifically, they disentangle the location where information is initially stored from the location where it will be tested in the future. Participants are tasked with learning a novel rule that dictates how the initial storage location relates to the eventual test location. The authors leverage participants' gaze patterns as an indicator of memory selection. Intriguingly, they observe that microsaccades are directed toward both the past encoding location and the anticipated future test location. This observation is noteworthy for several reasons. Firstly, participants' gaze is biased towards the past encoding location, even though that location lacks relevance to the memory test. Secondly, there's a simultaneous occurrence of an increased gaze bias towards both the past and future locations. To explore this temporal aspect further, the authors conduct a compelling analysis that reveals the joint consideration of past and future locations during memory maintenance. Notably, microsaccades biased towards the future test location also exhibit a bias towards the past encoding location. In summary, the authors present an innovative perspective on the adaptable nature of visual working memory. They illustrate how information relevant to the future is integrated with past information to guide behavior.

      Thank you for your enthusiasm for our article and findings as well as for your constructive suggestions for additional analyses that we respond to in detail below.

      This short manuscript presents one experiment with straightforward analyses, clear visualizations, and a convincing interpretation. For their analysis, the authors focus on a single time window in the experimental trial (i.e., 0-1000 ms after retro cue onset). While this time window is most straightforward for the purpose of their study, other time windows are similarly interesting for characterizing the joint consideration of past and future information in memory. First, assessing the gaze biases in the delay period following the cue offset would allow the authors to determine whether the gaze bias towards the future location is sustained throughout the entire interval before the memory test onset. Presumably, the gaze bias towards the past location may not resurface during this delay period, but it is unclear how the bias towards the future location develops in that time window. Also, the disappearance of the retro cue constitutes a visual transient that may leave traces on the gaze biases which speaks again for assessing gaze biases also in the delay period following the cue offset.

      Thank you for raising this important point. We initially focused on the time window during the cue given that our central focus was on gaze-biases associated with mnemonic item selection. By zooming in on this window, we could best visualize our main effects of interest: the joint selection (in time) of past and future memory attributes.

      At the same time, we fully agree that examining the gaze biases over a more extended time window yields a more comprehensive view of our data. To this end, we have now also extended our analysis to include a wider time range that includes the period between cue offset (1000 ms after cue onset) and test onset (1500 ms after cue onset). We present these data below. Because we believe our future readers are likely to be interested in this as well, we have now added this complementary visualization as Supplementary Figure 4 (while preserving the focus in our main figure on the critical mnemonic selection period of interest).

      Author response image 1.

      Supplementary Figure 4. Gaze biases in extended time window as a complement to Figure 1 and Supplementary Figure 2. This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset, the gaze bias towards the future location persists (panel a) and that while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus (panel b).

      This extended analysis reveals that while the gaze bias towards the past location disappears around 600 ms after cue onset (consistent with our prior reports of this bias), the gaze bias towards the future location persists. Moreover, as revealed by the data in panel b above, while the early (joint) future bias occurs predominantly in the microsaccade range below 1 degree visual angle, the later bias to the future location incorporates larger eye movements that likely involve preparing for optimally perceiving the anticipated test stimulus.

      We now also call out these additional findings and figure in our article:

      Page 2 (Results): “Gaze biases in both axes were driven predominantly by microsaccades (Supplementary Fig. 2) and occurred similarly in horizontal-to-vertical and vertical-to-horizontal trials (Supplementary Fig. 3). Moreover, while the past bias was relatively transient, the future bias continued to increase in anticipation of the test stimulus and increasingly incorporated eye-movements beyond the microsaccade range (see Supplementary Fig. 4 for a more extended time range)”.

      Moreover, assessing the gaze bias before retro-cue onset allows the authors to further characterize the observed gaze biases in their study. More specifically, the authors could determine whether the future location is considered already during memory encoding and the subsequent delay period (i.e., before the onset of the retro cue). In a trial, participants encode two oriented gratings presented at opposite locations. The future rule indicates the test locations relative to the encoding locations. In their example (Figure 1a), the test locations are shifted clockwise relative to the encoding location. Thus, there are two pairs of relevant locations (each pair consists of one stimulus location and one potential test location) facing each other at opposite locations and therefore forming an axis (in the illustration the axis would go from bottom left to top right). As the future rule is already known to the participants before trial onset it is possible that participants use that information already during encoding. This could be tested by assessing whether more microsaccades are directed along the relevant axis as compared to the orthogonal axis. The authors should assess whether such a gaze bias exists already before retro cue onset and discuss the theoretical consequences for their main conclusions (e.g., is the future location only jointly used if the test location is implicitly revealed by the retro cue).

      Thank you – this is another interesting point. We fully agree that additional analysis looking at the period prior to retrocue onset may also prove informative. In accordance with the suggested analysis, we have therefore now also analysed the distribution of saccade directions (including in the period from encoding to retrocue) as a function of the future rule (presented below, and now also included as Supplementary Fig. 5). Complementary recent work from our lab has shown how microsaccade directions can align to the axis of memory contents during retention (see de Vries & van Ede, eNeuro, 2024). Based on this finding, one may predict that if participants retain the items in a remapped fashion, their microsaccades may align with the axis of the future rule, and this could potentially already happen prior to cue onset.

      These complementary analyses show that saccade directions are predominantly influenced by the encoding locations rather than the test locations, as seen most clearly by the saccade distribution plots in the middle row of the figure below. To obtain time-courses, we categorized saccades as occurring along the axis of the future rule or along the orthogonal axis (bottom row of the figure below). Like the distribution plots, these time course plots also did not reveal any sign of a bias along the axis of the future rule itself.
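
      As an illustration of the categorization step described above, the following minimal sketch labels each saccade as falling along the axis of the future rule or along the orthogonal axis. It is not the authors' analysis code: the function names, the use of degrees, and the 45-degree quadrant boundary are assumptions chosen to match the quadrant layout described for Supplementary Fig. 5.

```python
from collections import Counter


def classify_saccade(direction_deg: float, rule_axis_deg: float) -> str:
    """Label a saccade 'rule' or 'orthogonal' relative to an undirected axis.

    Assumption: a saccade counts as 'along' an axis when its direction lies
    within 45 degrees of that axis, in either polarity.
    """
    # An axis has no polarity, so compare directions modulo 180 degrees.
    diff = (direction_deg - rule_axis_deg) % 180.0
    # Fold into [0, 90]: angular distance to the axis, ignoring polarity.
    distance = min(diff, 180.0 - diff)
    return "rule" if distance < 45.0 else "orthogonal"


def axis_counts(directions_deg, rule_axis_deg: float) -> Counter:
    """Count saccades per axis category for one time bin or one trial set."""
    return Counter(classify_saccade(d, rule_axis_deg) for d in directions_deg)


# Hypothetical example: a rule axis running from bottom left to top right.
counts = axis_counts([40.0, 225.0, 135.0, 310.0], rule_axis_deg=45.0)
```

      Repeating the count within successive time bins would yield time courses of the kind shown in the bottom row of the figure; a bias along the rule axis would appear as a reliable excess of "rule" over "orthogonal" saccades.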

      Importantly, note how this does not argue against our main findings of joint selection of past and future memory attributes, as for that central analysis we focused on saccade biases that were specific to the selected memory item, whereas the analyses we present below focus on biases in the axes in which both memory items are defined; not only the cued/selected memory item.

      Author response image 2.

      Supplementary Figure 5. Distribution of saccade directions relative to the future rule from encoding onset. (Top panel) The spatial layouts in the four future rules. (Middle panel) Polar distributions of saccades during 0 to 1500 ms after encoding onset (i.e., the period between encoding onset and cue onset). The purple quadrants represent the axis of the future rule and the grey quadrants the orthogonal axis. (Bottom panel) Time courses of saccades along the above two axes. We did not observe any sign of a bias along the axis of the future rule itself.

      We agree that these additional results are important to bring forward when we interpret our findings. Accordingly, we now mention these findings at the relevant section in our Discussion:

      Page 5 (Discussion): “First, memory contents could have directly been remapped (cf. 4,24–26) to their future-relevant location. However, in this case, one may have expected to exclusively find a future-directed gaze bias, unlike what we observed. Moreover, using a complementary analysis of saccade directions along the axis of the future rule (cf. 24), we found no direct evidence for remapping in the period between encoding and cue (Supplementary Fig. 5)”.

      Reviewer 2, Comments:

      The manuscript by Liu et al. reports a task, combining a retro cue with rules that indicate the location of an upcoming test probe, that is designed to examine the extent to which "past" and "future" information is encoded in working memory. An analysis of microsaccades on a fine temporal scale shows the extent to which shifts of attention track the location of the encoded item (past) and the location of the future item (test probe). The locations of the encoded grating and of the test probe were always on orthogonal axes (horizontal, vertical) so that biases in microsaccades could be used to track shifts of attention to one or the other axis (or mixtures of the two). The overall goal here was then to (1) create a methodology that could tease apart memory for the past and future, respectively, (2) to look at the time-course of attention to past/future, and (3) to test the extent to which microsaccades might jointly encode past and future memoranda. Finally, some remarks are made about the plausibility of various accounts of working memory encoding/maintenance based on the examination of these time courses.

      Strengths:

      This research has several notable strengths. It has a clear statement of its aims, is lucidly presented, and uses a clever experimental design that neatly orthogonalizes "past" and "future" as operationalized by the authors. Figure 1b-d shows fairly clearly that saccade directions have an early peak (around 300 ms) for the past and a "ramping" up of saccades moving in the forward direction. This seems to be a nice demonstration that the method can measure shifts of attention at a fine temporal resolution and differentiate past from future-oriented saccades due to the orthogonal cue approach. The second analysis, shown in Figure 2, reveals a dependency in saccade direction such that saccades toward the future probe location were more likely also to be toward the encoded location than away from it. This suggests saccades are jointly biased by both locations "in memory".

      Thank you for your overall appreciation of our work and for highlighting the above strengths. We also thank you for your constructive comments and call for clarifications that we respond to below.

      Weaknesses:

      (1) The "central contribution" (as the authors characterize it) is that "the brain simultaneously retains the copy of both past and future-relevant locations in working memory, and (re)activates each during mnemonic selection", and that: "... while it is not surprising that the future location is considered, it is far less trivial that both past and future attributes would be retained and (re)activated together. This is our central contribution." However, to succeed at the task, participants must retain the content (grating orientation, past) and probe location (future) in working memory during the delay period. It is true that the location of the grating is functionally irrelevant once the cue is shown, but if we assume that features of a visual object are bound in memory, it is not surprising that location information of the encoded object would bias processing as indicated by microsaccades. Here the authors claim that joint representation of past and future is "far less trivial"; this needs to be evaluated from the standpoint of prior empirical data on memory decay in such circumstances, or with some reference to the time-course of the "unbinding" of features in an encoded object.

      Thank you. We agree that our participants have to use the future rule – as otherwise they do not know to which test stimulus they should respond. This was a deliberate decision when designing the task. Critically, however, this does not require (nor imply) that participants have to incorporate and apply the rule to both memory items already prior to the selection cue. It is at least as conceivable that participants would initially retain the two items at their encoded (past) locations, then wait for the cue to select the target memory item, and only then consider the future location associated with the target memory item. After all, in every trial, there is only 1 relevant future location: the one associated with the cued memory item. The time-resolved nature of our gaze markers argues against such a scenario, by virtue of our observation of the joint (simultaneous) consideration of past and future memory attributes (as opposed to selection of past-before-future). These temporal dynamics are central to the insights provided by our study.

      In our view, it is thus not obvious that the rule would be applied at encoding. In this sense, we do not assume that the future location is part of both memory objects from encoding, but rather ask whether this is the case – and, if so, whether the future location takes over the role of the past location, or whether past and future locations are retained jointly.

      Our statements regarding what is “trivial” and what is “less trivial” regard exactly this point: it is trivial that the future is considered (after all, our task demanded it). However, it is less trivial that (1) the future location was already available at the time of initial item selection (as reflected in the simultaneous engagement of past and future locations), and (2) that in presence of the future location, the past location was still also present in the observed gaze biases.

      Having said that, we agree that an interesting possibility is that participants remap both memory items to their future-relevant locations ahead of the cue, but that the past location is not yet fully “unbound” by the time of the cue. This may trigger a gaze bias not only to the new future location but also to the “sticky” (unbound) past location. We now acknowledge this possibility in our discussion (also in response to comment 3 below) where we also suggest how future work may be able to tap into this:

      Page 6 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      (2) The authors refer to "future" and "past" information in working memory and this makes sense at a surface level. However, once the retrocue is revealed, the "rule" is retrieved from long-term memory, and the feature (e.g. right/left, top/bottom) is maintained in memory like any other item representation. Consider the classic test of digit span. The digits are presented and then recalled. Are the digits of the past or future? The authors might say that one cannot know, because past and future are perfectly confounded. An alternative view is that some information in working memory is relevant and some is irrelevant. In the digit span task, all the digits are relevant. Relevant information is relevant precisely because it is thought to be necessary in the future. Irrelevant information is irrelevant precisely because it is not thought to be needed in the immediate future. In the current study, the orientation of the grating is relevant, but its location is irrelevant; and the location of the test probe is also relevant.

      Thank you for this stimulating reflection. We agree that in our set-up, past location is technically “task-irrelevant” while future location is certainly “task-relevant”. At the same time, the engagement of the past location suggests to us that the brain uses past location for the selection – presumably because the brain uses spatial location to help individuate/separate the items, even if encoded locations are never asked about. Therefore, whether something is relevant or irrelevant ultimately depends on how one defines relevance (past location may be relevant/useful for the brain even if technically irrelevant from the perspective of the task). In comparison, the use of “past” and “future” may be less ambiguous.

      It is also worth noting how we interpret our findings in relation to demands on visual working memory, inspired by dynamic situations whereby visual stimuli may be last seen at one location but expected to re-appear at another, such as a bird disappearing behind a building (the example in our introduction). Thus, past for us does not refer to the memory item per se (like in the digit span analogue) but, rather, quite specifically to the past location of a dynamic visual stimulus in memory (which, in our experiment, was operationalised by the future rule, for convenience).

      (3) It is not clear how the authors interpret the "joint representation" of past and future. Put aside "future" and "past" for a moment. If there are two elements in memory, both of which are associated with spatial bindings, the attentional focus might be a spatial average of the associated spatial indices. One might also view this as an interference effect, such that the location of the encoded location attracts spatial attention since it has not been fully deleted/removed from working memory. Again, for the impact of the encoded location to be exactly zero after the retrieval cue, requires zero interference or instantaneous decay of the bound location information. It would be helpful for the authors to expand their discussion to further explain how the results fit within a broader theoretical framework and how it fits with empirical data on how quickly an irrelevant feature of an object can be deleted from working memory.

      Thank you also for this point (that is related to the two points above). As we stated in our reply to comment 1 above, we agree that one possibility is that the past location is merely “sticky” and pulls the task-relevant future bias toward the past location. If so, our time courses suggest that such “pulling” occurs only until approximately 600 ms after cue onset, as the past bias is only transient. An alternative interpretation is that the past location may not be merely a residual irrelevant trace, but actually be useful and used by the brain.

      For example, the encoded (past) item locations provide a coordinate system in which to individuate/separate the two memory items. While the future locations also provide such a coordinate system, the brain may benefit from holding onto both coordinate systems at the same time, rendering our observation of joint selection in both frames. Indeed, in a recent VR experiment in which we had participants (rather than the items) rotate, we also found evidence for the joint use of two spatial frames, even if neither was technically required for the upcoming task (see Draschkow, Nobre, van Ede, Nature Human Behaviour, 2022). Though highly speculative at this stage, such reliance on multiple spatial frames may make our memories more robust to decay and/or interference. Moreover, while past location was never explicitly probed in our task, in daily life the past location may sometimes (unexpectedly) become relevant, hence it may be useful to hold onto it, just in case. Thus, considering the past location merely as an “irrelevant feature” (that takes time to delete) may not do sufficient justice to the potential roles of retaining past locations of dynamic visual objects held in working memory.

      As also stated in response to comment 1 above, we now added these relevant considerations to our Discussion:

      Page 5 (Discussion): “In our study, the past location of the memory items was technically irrelevant for the task and could thus, in principle, be dropped after encoding. One possibility is that participants remapped the two memory items to their future locations soon after encoding, and had started – but not finished – dropping the past location by the time the cue arrived. In such a scenario, the past signal is merely a residual trace of the memory items that serves no purpose but still pulls gaze. Alternatively, however, the past locations may be utilised by the brain to help individuate/separate the two memory items. Moreover, by storing items with regard to multiple spatial frames (cf. 37) – here with regard to both past and future visual locations – it is conceivable that memories may become more robust to decay and/or interference. Also, while in our task past locations were never probed, in everyday life it may be useful to remember where you last saw something before it disappeared behind an occluder. In future work, it will prove interesting to systematically vary the delay between encoding and cue to assess whether the reliance on the past location gradually dissipates with time (consistent with dropping an irrelevant feature), or whether the past trace remains preserved despite longer delays (consistent with preserving utility for working memory).”

      Reviewer 3, Comments:

      This study utilizes saccade metrics to explore, what the authors term the "past and future" of working memory. The study features an original design: in each trial, two pairs of stimuli are presented, first a vertical pair and then a horizontal one. Between these two pairs comes the cue that points the participant to one target of the first pair and another of the second pair. The task is to compare the two cued targets. The design is novel and original but it can be split into two known tasks - the first is a classic working memory task (a post-cue informs participants which of two memorized items is the target), which the authors have used before; and the second is a classic spatial attention task (a pre-cue signal that attention should be oriented left or right), which was used by numerous other studies in the past. The combination of these two tasks in one design is novel and important, as it enables the examination of the dynamics and overlapping processes of these tasks, and this has a lot of merit. However, each task separately is not new. There are quite a few studies on working memory and microsaccades and many on spatial attention and microsaccades. I am concerned that the interpretation of "past vs. future" could mislead readers to think that this is a new field of research, when in fact it is the (nice) extension of an existing one. Since there are so many studies that examined pre-cues and post-cues relative to microsaccades, I expected the interpretation here to rely more heavily on the existing knowledge base in this field. I believe this would have provided a better context of these findings, which are not only on "past" vs. "future" but also on "working memory" vs. "spatial attention".

      Thank you for considering our findings novel and important, while at the same time reminding us of the parallels to prior tasks studying spatial attention in perception and working memory. We fully agree that our task likely engages both attention to the (past) memory item as well as spatial attention to the upcoming (future) test stimulus. At the same time, there is a critical difference in spatial attention for the future in our task compared with ample prior tasks engaging spatial cueing of attention for perception. In our task, the cue never directly cues the future location. Rather, it exclusively cues the relevant memory item. It is the memory item that is associated with the relevant future location, according to the future rule. This integration of the rule-based future location into the memory representation is distinct from classical spatial-attention tasks in which attention is cued directly to a specific location via, for example, a spatial cue such as an arrow.

      Thus, if we wish to think about our task as engaging cueing of spatial attention for perception, we have to at least also invoke the process of cueing the relevant location via the appropriate memory item. We feel it is more parsimonious to think of this as attending to both the past and future location of a dynamic visual object in working memory.

      If we return to our opening example, when we see a bird disappear behind a building, we can keep in working memory where we last saw it, while anticipating where it will re-appear to guide our external spatial attention. Here too, spatial attention is fully dependent on working-memory content (the bird itself) – mirroring the dynamic setting in our study. Thus, we believe our findings contribute a fresh perspective, while of course also extending established fields. We now contextualize our finding within the literature and clarify our unique contribution in our revised manuscript:

      Page 5 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

      Reviewer 2, Recommendations:

      It would be helpful to set up predictions based on existing working memory models. Otherwise, the claim that the joint coding of past/future is "not trivial" is simply asserted, rather than contradicting an existing model or prior empirical results. If the non-trivial aspect is simply the ability to demonstrate the joint coding empirically through a good experimental design, make it clear that this is the contribution. For example, it may be that prevailing models predict exactly this finding, but nobody has been able to demonstrate it cleanly, as the authors do here. So the non-triviality is not that the result contradicts working memory models, but rather relates to the methodological difficulty of revealing such an effect.

      Thank you for your recommendation. First, please see our point-by-point responses to the individual comments above, where we also state relevant changes that we have made to our article, and where we clarify what we meant with “non trivial”. As we currently also state in our introduction, our work took as a starting point the framework that working memory is inherently about the past while being for the future (cf. van Ede & Nobre, Annual Review of Psychology, 2023). By virtue of our unique task design, we were able to empirically demonstrate that visual contents in working memory are selected via both their past and their future-relevant locations – with past and future memory attributes being engaged together in time. With “not trivial” we merely intend to make clear that there are viable alternatives to the findings we observed. For example, past could have been replaced by the future, or it could have been that item selection (through its past location) was required before its future-relevant location could be considered (i.e. past-before-future, rather than joint selection as we reported). We outline these alternatives in the second paragraph of our Discussion:

      Page 5 (Discussion): “Our finding of joint utilisation of past and future memory attributes emerged from at least two alternative scenarios of how the brain may deal with dynamic everyday working memory demands in which memory content is encoded at one location but needed at another.

      First, [….]”

      Our work was not motivated by a particular theoretical debate and did not aim to challenge ongoing debates in the working-memory literature, such as: slot vs. resource, active vs. silent coding, decay vs. interference, and so on. To our knowledge, none of these debates makes specific claims about the retention and selection of past and future visual memory attributes – despite this being an important question for understanding working memory in dynamic everyday settings, as we hoped to make clear by our opening example.

      Reviewer 3, Recommendations:

      I recommend that the present findings be more clearly interpreted in the context of previous findings on working memory and attention. The task design includes two components - the first (post-cue) is a classic working memory task and the second (the pre-cue) is a classic spatial attention design. Both components were thoroughly studied in the past and this previous knowledge should be better integrated into the present conclusions. I specifically feel uncomfortable with the interpretation of past vs. future. I find this framework to be misleading because it reads like this paper is on a topic that is completely new and never studied before, when in fact this is a study on the interaction between working memory and spatial attention. I recommend the authors minimize this past-future framing or be more explicit in explaining how this new framework relates to the more common terminology in the field and make sure that the findings are not presented in a vacuum, as another contribution to the vibrant field that they are part of.

      Thank you for these recommendations. Please also see our point-by-point responses to the individual comments above. There, we explained our logic behind using the terminology of past vs. future (in addition, see also our response to point 2 of reviewer 2). There, we also stated relevant changes that we have made to our manuscript to explain how our findings complement – but are also distinct from – prior tasks that used pre-cues to direct spatial attention to an upcoming stimulus. As we explained above, in our task, the cue itself never contained information about the upcoming test location. Rather, the upcoming test location was a property of the memory item (given the future rule). Hence, we referred to this as a “future attribute” of the cued memory item, rather than as the “cued location” for external spatial attention. Still, we agree the future bias likely (also) reflects spatial allocation to the upcoming test array, and we explicitly acknowledge this in our discussion. For example:

      Page 5 (Discussion): “This signal may reflect either of two situations: the selection of a future-copy of the cued memory content or anticipatory attention to the anticipated location of its associated test-stimulus. Either way, by the nature of our experimental design, this future signal should be considered a content-specific memory attribute for two reasons. First, the two memory contents were always associated with opposite testing locations, hence the observed bias to the relevant future location must be attributed specifically to the cued memory content. Second, we cued which memory item would become tested based on its colour, but the to-be-tested location was dependent on the item’s encoding location, regardless of its colour. Hence, consideration of the item’s future-relevant location must have been mediated by selecting the memory item itself, as it could not have proceeded via cue colour directly.”

      Page 6 (Discussion): “Building on the above, at face value, our task may appear like a study that simply combines two established tasks: tasks using retro-cues to study attention in working memory (e.g.,2,31-33) and tasks using pre-cues to study orienting of spatial attention to an upcoming external stimulus (e.g., 31,32,34–36). A critical difference with common pre-cue studies, however, is that the cue in our task never directly informed the relevant future location. Rather, as also stressed above, the future location was a feature of the cued memory item (according to the future rule), and not of the cue itself. Note how this type of scenario may not be uncommon in everyday life, such as in our opening example of a bird flying behind a building. Here too, the future relevant location is determined by the bird – i.e. the memory content – itself.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      We are grateful for the overall positive feedback from the reviewer.

      We agree with the reviewer that our data showing cellular co-localization between PRC1 and BIN1 requires further investigation in future studies. However, we are confident that, in its current form, our manuscript already presents multiple lines of evidence for the role of BIN1 in mitotic processes. We would like to emphasize that PRC1 is not the sole BIN1 partner that connects it to mitotic processes; it is only one of more than a dozen that we identified in our study. Furthermore, the mitotic connection with BIN1 is not entirely novel, as BIN1 levels fluctuate mildly during the cell cycle, similar to other proteins involved in the regulation of the cell cycle (Santos et al., 2015), and DNM2 is also a well-accepted actor during mitosis (Thompson et al., 2002).

      The less marked co-localization between BIN1 and PRC1 compared to the strong co-localization between BIN1 and DNM2 can be a consequence of their weaker affinity and their partial binding. Yet, this does not necessarily imply that stronger interactions have more biological significance. For example, weaker affinities can be compensated by local concentrations to achieve an even higher degree of cellular complexes than of strongly binding interactions that are separated within the cell. Furthermore, even the degree of complex formation cannot be used intuitively to estimate the biological significance of a complex because complexes can trigger very important biological processes even at very low abundances, e.g. by catalyzing enzymatic reactions. Deciding what is and what is not “biologically significant” among the identified interactions remains to be answered in the future, once we are able to overview complex biological processes in a holistic manner.

      In the revised version, we implemented minor changes to further clarify the raised points.

      Reviewer #2:

      We thank the reviewer for the careful assessment and we are pleased to see the positive enthusiasm regarding our affinity interactomic strategy.

      The reviewer points out that affinities were only measured with a single technique, which is relatively unproven. While it is true that our work uses two techniques building on the same holdup concept, we rather believe that this approach is well-proven. The original holdup method was described almost 20 years ago and since then, it has been used in more than 10 publications for quantitative interactomics. Over the years, at least five distinct generations of the assay were developed, all building on the expertise of the preceding one. In the past, we extensively proved that the resulting affinities show excellent agreement with affinities measured with other methods, such as fluorescence polarization, isothermal titration calorimetry, or surface plasmon resonance (for example in Vincentelli et al. Nat. Meth. 2015; Gogl et al. 2020 Structure; Gogl et al. 2022 Nat.Com.). However, it is true that the most recent variation of this method family, called native holdup, is a fairly new approach published just a bit more than a year ago and this is only the third work that utilizes this method. Yet, in our original work describing the method, we demonstrated good agreement with the results of previous holdup experiments, as well as with orthogonal affinity measurements (Zambo et al. 2022).

      Importantly, the reviewer raises concerns regarding the number of replicates used in our study, as well as the reliability of our methodology. We are glad for such a comment as it allows us to explain our motives behind experimental design which is most often left out from scientific works to save space and keep focus on results. The reason why we use technical replicates instead of the typical biological replicates lies in the nature of the holdup assay. In a typical interactomic assay, such as immunoprecipitation, a lot of variables can perturb the outcome of the measurement, such as bait immobilization, or captured prey leakage during washing steps. The output of such an experiment is a list of statistically significant partners and to minimize these variabilities, biological replicates are used. In the case of a native holdup approach, a panel of an equal amount of resins, all saturated with different baits or controls, is mixed with an equal amount of cell extract, taken from a single tube, and after a brief incubation, the supernatant of this mixture is analyzed. The output of such an experiment is a list of relative concentrations of prey and to maximize its accuracy, we use technical replicates. Using an ideal analytical method, such as fluorescence, it is not necessary to use technical replicates to reach accurate results. For example, the general accuracy of a holdup experiment coupled with a robust analytical approach can be seen clearly in our fragmentomic holdup data shown in Figure 7C where mutant domains that do not have any impact on the interactome show extreme agreement in affinities. Unfortunately, mass spectrometry is less accurate as an analytical method, hence we use technical triplicates to compensate for this. Finally, in the case of BIN1, an independent nHU measurement was also performed using a less capable mass spectrometer. 
Not counting the 117 BIN1 partners that were detected in only one of these proteomic measurements, 29 partners were identified as common significant partners in both measurements, showing nearly identical affinities with a mean standard deviation between measured pKapp values of 0.18, meaning that the obtained dissociation constants are within a <2.5-fold range with >95% probability. There were also 61 BIN1 partners that were detected in both proteomic measurements but were identified as a significant interaction partner in only one of them. Yet many of them show binding in both assays, albeit below the significance threshold in one of them. For example, CDC20 shows 66% depletion in one assay (significant binding) and 54% depletion in the other (not significant binding), while CKAP2 shows 58% depletion in one assay (significant binding) and 41% depletion in the other (not significant binding). We hope that these examples show that statistical significance in nHU experiments signifies how certain we are in a particular affinity measurement rather than the accuracy of the affinity measurement itself. While there are true discrepancies between some of the affinity measurements in these experiments, which could be clarified with more experimental replicates, the raw data presented in our work clearly demonstrate the strength and robustness of a fully quantitative interactomic assay.
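
      The fold-range statement above follows from the logarithmic definition of pKapp. As a rough check, and assuming the deviations between the two measurements are approximately normally distributed on the pK scale (an assumption that the response does not state), a standard deviation of 0.18 pK units can be converted into a multiplicative spread in the dissociation constant. The function name `fold_range` is illustrative only.

```python
def fold_range(sd_pk: float, z: float = 1.96) -> float:
    """Multiplicative deviation in Kd that corresponds to z standard
    deviations on the pK (negative log10) scale."""
    return 10 ** (z * sd_pk)

sd = 0.18  # mean standard deviation between pKapp values, from the text

fold_95 = fold_range(sd)          # two-sided 95% band under normality
fold_2sd = fold_range(sd, z=2.0)  # slightly more conservative 2-SD band

print(f"95% band: within {fold_95:.2f}-fold")
print(f"2-SD band: within {fold_2sd:.2f}-fold")
```

      Under this reading, both bands stay below the 2.5-fold limit quoted in the response.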

      In the revised version, we clarified the number of replicates in the text, in the figure legends, and included some of this discussion in the method section.

      The reviewer had some very useful comments regarding affinity differences between short fragments and full-length proteins. In his comment, he possibly made a typo, as we find that full-length proteins typically interact with higher, not weaker, affinities compared to short PxxP motif fragments in isolation. The reviewer also comments that we explain this difference with cooperativity. In a previous preprint version, which the reviewer may have seen, this was indeed the case, but since we realized that we did not have sufficient evidence supporting this model, we did not discuss it in detail in the last version submitted to eLife. To clarify this, we included more discussion about the observed differences in the affinities between fragments and full-length proteins, but since we have limited data to make solid conclusions, we do not go into details about underlying models.

      Instead of cooperativity, the reviewer suggests that the observed differences may originate from additional residues that were not included in our peptides. Indeed, many similar experiments fail because of suboptimal peptide library design. Our peptide library was constructed as 15-mer, xxxxxxPxxPxxxxx motifs and we do not see a strong contribution of residues at the far end of these peptides. Specificity logo reconstructions are expected to identify all key residues that participate in SH3 domain binding, and based on this, all key residues of the identified motifs can be included in shorter 10-mer, xxxPxxPxxx motifs. Therefore, it is unlikely that residues outside our peptide regions will greatly contribute to the site-specific interactions of SH3 domains. It is however possible that other sites, that are sequentially far away from the studied PxxP motifs, are also capable of binding to SH3 through a different surface, but in light of the small size of an isolated SH3 domain, we believe it is very unlikely. It is also possible that BIN1 could also interact with other types of SH3 binding motifs that were not included in our peptide library. We think a more likely explanation is some sort of cooperativity. Cooperativity, or rather synergism between different sites can be easily explained in typical situations, such as in the case of a bimolecular interaction that is mediated by two independent sites. In such an event, once one site is bound, the second binding event will likely also occur because of the high effective local concentration of the binding sites. However, cooperativity can also form in atypical conditions and a molecular explanation for these events is rather elusive. As BIN1 contains a single SH3 domain, its binding to targets containing more binding sites can be challenging to interpret. 
If these sites are part of a greater Pro-rich region, such as in the case of DNM2, it is possible that the entire region adopts a fuzzy, malleable, yet PPII-like helical conformation. Once the SH3 domain is recruited to this helical region, it can freely translocate within this region via lateral diffusion and it will pause on optimal PxxP motifs. As an alternative to this sliding mechanism, a diffusion-limited cooperative binding can also occur. If the two motifs are not part of the same Pro-rich region, but are relatively close in space, such as in the case of ITCH or PRC1, once a BIN1 molecule dissociates from one site, it has a higher chance to rebind to the second site due to higher local concentrations. Such an event can more likely occur if a transient, but relatively stable encounter complex exists between the two molecules, from which complex formation can occur at both sites (A+B↔AB*; AB*↔ABsite1; AB*↔ABsite2). However, this large effective local concentration in the encounter complex is only temporary because diffusion rapidly diminishes it, although weak electrostatic interactions can increase the lifetime of such encounter complexes. In contrast, the large effective local concentration in conventional multivalent binding is time-independent and only determined by the geometry of the complex. Finally, it may also occur that our empirical bait concentration estimation for immobilized biotinylated proteins is less accurate than the concentration estimation of peptide baits because we approximate this value based on peptide baits. For this technical reason, which was discussed in detail in the original paper describing the nHU approach, we are careful to use apparent affinities for nHU experiments. Nevertheless, even without accurate bait concentrations, our nHU experiment provides precise relative affinities and thus partner ranking. 
Either of the mechanisms underlying the interactions we study would be difficult to further explore experimentally, especially at the proteomic level.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We greatly appreciate the comments from the editor and the reviewers, based on which we have made the revisions. We have responded to all the questions and summarized the revisions below. The changes are also highlighted in the manuscript.

      Additionally, we’ve noticed a few typos in the manuscript presented on the eLife website, which were not there in our originally submitted file.

      (1) In both the “Full text” presented on the eLife website and the pdf file generated after clicking “Download”: the last FC1000 in the second paragraph of the “Extensive induction curves fitting of TetR mutants” section should be FC1000WT.

      (2) In the pdf file generated after clicking “Download”: the brackets are all incorrectly formatted in the captions of Figure 4 and Figure 3—figure supplement 6.

      eLife assessment

      The fundamental study presents a two-domain thermodynamic model for TetR which accurately predicts in vivo phenotype changes brought about as a result of various mutations. The evidence provided is solid and features the first innovative observations with a computational model that captures the structural behavior, much more than the current single-domain models.

      We appreciate the supportive comments by the editor and reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors’ earlier deep mutational scanning work observed that allosteric mutations in TetR (the tetracycline repressor) and its homologous transcriptional factors are distributed across the structure instead of along the presumed allosteric pathways as commonly expected. Especially, in addition, the loss of the allosteric communications promoted by those mutations, was rescued by additional distributed mutations. Now the authors develop a two-domain thermodynamic model for TetR that explains these compelling data. The model is consistent with the in vivo phenotypes of the mutants with changes in parameters, which permits quantification. Taken together their work connects intra- and inter-domain allosteric regulation that correlate with structural features. This leads the authors to suggest broader applicability to other multidomain allosteric proteins. Here the authors follow their first innovative observations with a computational model that captures the structural behavior, aiming to make it broadly applicable to multidomain proteins. Altogether, an innovative and potentially useful contribution.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      None that I see, except that I hope that in the future, if possible, the authors would follow with additional proteins to further substantiate the model and show its broad applicability. I realize however the extensive work that this would entail.

      We thank the reviewer for the supportive comments and the suggestion to extend the model to other proteins, which we indeed plan to pursue in future studies.

      Reviewer #2 (Public Review):

      Summary:

      This combined experimental-theoretical paper introduces a novel two-domain statistical thermodynamic model (primarily Equation 1) to study allostery in generic systems but focusing here on the tetracycline repressor (TetR) family of transcription factors. This model, building on a function-centric approach, accurately captures induction data, maps mutants with precision, and reveals insights into epistasis between mutations.

      Strengths:

      The study contributes innovative modeling, successful data fitting, and valuable insights into the interconnectivity of allosteric networks, establishing a flexible and detailed framework for investigating TetR allostery. The manuscript is generally well-structured and communicates key findings effectively.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      The only minor weakness I found was that I still don’t have a better sense into (a) intuition and (b) mathematical derivation of Equation 1, which is so central to the work. I would recommend that the authors provide this early on in the main text.

      We thank the reviewer for the suggestion. The full mathematical derivation of Equation 1 is given in the first section of the supplementary file. Given the length of the derivation, we think it’s better to keep it in the supplementary file rather than the main text. In the main text, the first subsection (overview of the two-domain thermodynamic model of allostery) of the Results section and the paragraph right before Equation 1 are meant for providing intuitive understandings of the two-domain model and the derivation of Equation 1, respectively.

      We would also like to point the reviewer to Figure 2-figure supplement 2 and Equations (12) to (18) in the supplementary file for an alternative derivation. They show that the equilibria among all molecular species containing the operator are dictated by the binding free energies, the ligand concentration, and the allosteric parameters. The probability of an unbound operator (proportional to the probability that the promoter is bound by an RNA polymerase, or the gene expression level) can thus be calculated using Equation (12), which then leads to main text Equation 1 following the derivation given there.

      Additionally, we’ve added a paragraph to the main text (line 248-260) to aid an intuitive understanding of Equation 1.

      “The distinctive roles of the three biophysical parameters on the induction curve as stipulated in Equation 1 could be understood in an intuitive manner as well. First, the value of εD controls the intrinsic strength of binding of TetR to the operator, or the intrinsic difficulty for the ligand to induce their separation. Therefore, it controls how tightly the downstream gene is regulated by TetR without ligands (reflected in leakiness) and affects the performance limit of ligands (reflected in saturation). Second, the value of εL controls how favorable ligand binding is in free energy. When εL increases, the binding of ligand at low concentrations becomes unfavorable, so the ligands cannot effectively bind to TetR to induce its separation from the operator. Therefore, the fold-change as a function of ligand concentration only starts to noticeably increase at higher ligand concentrations, resulting in larger EC50. Third, as discussed above, γ controls the level of anti-cooperativity between the ligand and operator binding of TetR, which is the basis of its allosteric regulation. In other words, γ controls how strongly ligand binding is incompatible with operator binding for TetR, hence it controls the performance limit of ligand (reflected in saturation).”

      We hope that the reviewer will find this explanation helpful.

      Reviewer #3 (Public Review):

      Summary:

      Allosteric regulations are complicated in multi-domain proteins and many large-scale mutational data cannot be explained by current theoretical models, especially for those that are neither in the functional/allosteric sites nor on the allosteric pathways. This work provides a statistical thermodynamic model for a two-domain protein, in which one domain contains an effector binding site and the other domain contains a functional site. The authors build the model to explain the mutational experimental data of TetR, a transcriptional repressor protein that contains a ligand and a DNA-binding domain. They incorporate three basic parameters, the energy change of the ligand and DNA binding domains before and after binding, and the coupling between the two domains to explain the free energy landscape of TetR’s conformational and binding states. They go further to quantitatively explain the in vivo expression level of the TetR-regulated gene by fitting into the induction curves of TetR mutants. The effects of most of the mutants studied could be well explained by the model. This approach can be extended to understand the allosteric regulation of other two-domain proteins, especially to explain the effects of widespread mutants not on the allosteric pathways.

      Strengths:

      The effects of mutations that are neither in the functional or allosteric sites nor in the allosteric pathways are difficult to explain and quantify. This work develops a statistical thermodynamic model to explain these complicated effects. For simple two-domain proteins, the model is quite clean and theoretically solid. For the real TetR protein that forms a dimeric structure containing two chains with each of them composed of two domains, the model can explain many of the experimental observations. The model separates intra and inter-domain influences that provide a novel angle to analyse allosteric effects in multi-domain proteins.

      We thank the reviewer for the supportive comments.

      Weaknesses:

      As mentioned above, the TetR protein is not a simple two-domain protein, but forms a dimeric structure in which the DNA binding domain in each chain forms contacts with the ligand-binding domain in the other chain. In addition, the two ligand-binding domains have strong interactions. Without considering these interactions, especially those mutants that are on these interfaces, the model may be oversimplified for TetR.

      We thank the reviewer for this valid concern and acknowledge that TetR is a homodimer. However, we’ve deliberately chosen to simplify this complexity in our model for the following reasons.

      (1) In this work, we aim to build a minimalist model for two-domain allostery with only the most essential parameters for capturing experimental data. The simplicity of the model helps promote its mechanistic clarity and potential transferability to other allosteric systems.

      (2) Fewer parameters are needed in a simpler model. Our two-domain model currently uses only three biophysical parameters, which are all demonstrated to have distinct influences on the induction curve (see the main text section “System-level ramifications of the two-domain model”). This enables the inference of parameters with high precision for the mutants, and the quantification of the most essential mechanistic effects of their mutations, provided that the model is shown to accurately recapitulate the comprehensive dataset. Thus, we found it was unnecessary to add another parameter for explicitly describing inter-chain coupling, which would likely incur uncertainty in the inference of parameters due to the redundancy of their effects on induction data, and prevent the model from making faithful predictions.

      (3) From a more biological point of view, TetR is an obligate dimer, meaning that the two chains must synchronize for function, supporting the two-domain simplification of TetR for binding concerns.

      Additionally, as shown in the subsection “Inclusion of single-ligand-bound state of repressor” of section 1 of the supplementary file, incorporating the dimeric nature of TetR in our model by allowing partial ligand binding does not change the functional form of main text equation 1 in any practical sense. Therefore, considering all the factors stated above, we think that increasing the complexity of the two-domain model will only be necessary if additional data emerge to suggest the limitation of our model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is an excellent work. I have only one suggestion for the authors. Interestingly, the authors also note that the epistatic interactions that they obtain are consistent with the structural features of the protein, which is not surprising. Within this framework, have the authors considered rescue mutations? Please see for example PMID: 18195360 and PMID: 15683227. If I understand right, this might further extend the applicability of their model. If so, the authors may want to add a comment to that effect.

      We thank the reviewer for the supportive comments and for pointing us to the useful references. We have added some comments to the main text regarding this point in line 332-336: “The diverse mechanistic origins of the rescuing mutations revealed here provide a rational basis for the broad distributions of such mutations. Integrating such thermodynamic analysis with structural and dynamic assessment of allosteric proteins for efficient and quantitative rescuing mutation design could present an interesting avenue for future research, particularly in the context of biomedical applications (PMID: 18195360, PMID: 15683227).”

      Reviewer #3 (Recommendations For The Authors):

      The authors should try to build a more realistic dimeric model for TetR to see if it could better explain experimental data. If it were too complicated for a revision, more discussions on the weakness of the current model should be given.

      We thank the reviewer for this valid concern and for the suggestion. The reasons for refraining from increasing the complexity of the model are fully discussed in our response to the reviewer’s public review given above. Primarily, we think that the value of a simple physical model (e.g., the paradigm Ising model in statistical physics and the classic MWC model) is two-fold: first, its mechanistic clarity and potential transferability make it a useful conceptual framework for understanding complex systems and establishing universal rules by comparing seemingly unrelated phenomena; second, it provides useful insights and design principles of specific systems if it can quantitatively capture the corresponding experimental data. Thus, given the current experimental data set, we believe it is justified to keep the two-domain model in its current form, while additional experimental data could necessitate a more complex model for TetR allostery in the future. Relevant discussions are added to the main text (line 443-446) and section 8 of the supplementary file.

      “It’s noted that the homodimeric nature of TetR is ignored in the current two-domain model to minimize the number of parameters, and additional experimental data could necessitate a more complex model for TetR allostery in the future (see supplementary file section 8 for more discussions).”

      Minor issues:

      (1) There is an error in Figure 3A, the 13th and 14th subgraphs are the same and should be corrected.

      We thank the reviewer for capturing this error, which has been corrected in the revised manuscript.

      (2) The criteria for the selection of mutants for analysis should be clearly given. Apart from deleting mutants that are in direct contact with the ligand or DNA, how many mutants are left, and how far are they from the two sites? In line 257, what are the criteria for selecting these 15 mutants? Similarly, in line 332, what are the criteria for selecting these 8 mutants?

      We thank the reviewer for this comment. The data selection criteria are now added in section 7 of the supplementary file. The distances to the DNA operator and ligand of the 21 residues under mutational study are now added in Table 1 (Figure 3-figure supplement 9). The added materials are referenced in the main text where relevant.

      “7. Mutation selection for two-domain model analysis

      In this work, there are 24 mutants studied in total including the WT, and they contain mutations at 21 WT residues. We did not perform model parameter inference for the mutant G102D because of its flat induction curve (see the second subsection of section 2 and main text Figure 2—figure Supplement 3). Therefore, there are 23 mutants analyzed in main text Figure 5.

      Measuring the induction curve of a mutant involves a significant amount of experimental effort and is therefore hard to extend to a large number of mutants. Nonetheless, we aim to compose a set of comprehensive induction data here for validating our two-domain model for TetR allostery. To this end, we picked 15 individual mutants in the first round of induction curve measurements, which contain mutations spanning different regions in the sequence and structure of TetR (main text Figure 3—figure Supplement 1). Such broad distribution of mutations across LBD, DBD and the domain interface could potentially lead to diverse induction curve shapes and mutant phenotypes for validating the two-domain model. Indeed, as discussed in the main text section "Extensive induction curves fitting of TetR mutants", the diverse effects on the induction curve from mutations perturbing different allosteric parameters, as predicted by the model, are successfully observed in these 15 experimental induction curves. Additionally, 5 of the 15 mutants contain a dead-rescue mutation pair, which helps us validate the model prediction that a dead mutation could be rescued by rescuing mutations that perturb the allosteric parameters in various ways.

      Eight mutation combinations were chosen for the second round of induction curve measurement for studying epistasis, where we paired up C203V and Y132A with mutations from different regions of the TetR structure. Such choice is largely based on two considerations. 1. As both C203V and Y132A greatly enhance the allosteric response of TetR, we want to probe why they cannot rescue a range of dead mutations as observed previously (PMID: 32999067). 2. C203V and Y132A are the only two mutants that show enhanced allosteric response in the first round of analysis. Combining detrimental mutations of allostery in a combined mutant could potentially lead to near flat induction curve, which is less useful for inference (see the second subsection of section 2).”

      Since the number of hotspots identified by DMS is not very large, why not analyze them all?

      We thank the reviewer for this comment. There are 41 hotspot residues in TetR (PMID: 36226916), which have 41*19=779 possible single mutations. It’s unfeasible to perform induction curve measurements for all of these 779 mutants in our current experiment. However, we agree that it would be helpful if we can obtain such a dataset in an efficient way.

      In line 257, there are 15 mutants mentioned, while in Figure 5, there are 23 mutants mentioned, in Figure 3-figure supplement 1, there are 21 mutants mentioned, and in line 226 of the supplementary file, there are 24 mutants mentioned, which is very confusing. Therefore, the data selection criteria used in this article should be given.

      We thank the reviewer for this comment. The data selection criteria are now given in section 7 of the supplementary file, which should clarify this confusion.
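
      The different counts quoted in this exchange can be reconciled with simple bookkeeping. The sketch below assumes that the total of 24 is the sum of the 15 first-round mutants, the 8 second-round combinations, and the WT; the response does not spell out this decomposition, so it should be read as one consistent interpretation rather than the authors' own tally. The count of 21 refers to mutated WT residues rather than to mutants and is therefore not part of the sum.

```python
# Counts quoted in the response and in supplementary section 7.
first_round_mutants = 15   # first round of induction curve measurements
second_round_combos = 8    # combinations chosen for the epistasis study
wild_type = 1

# Assumed decomposition of the total (not stated explicitly in the text).
total_studied = first_round_mutants + second_round_combos + wild_type

excluded = 1               # G102D, dropped because of its flat induction curve
analyzed = total_studied - excluded

print(total_studied)  # mutants studied in total, including the WT
print(analyzed)       # mutants analyzed in main text Figure 5
```

      With these assumptions the totals come to 24 studied and 23 analyzed, matching the numbers given in section 7 of the supplementary file.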

      (3) In Figure 4 of the Exploring epistasis between mutations section, the 6 weights of the additive models corresponding to each mutation combination are different. On one hand, it seems that there are no universal laws in these experimental data. On the other hand, unique parameters of a single mutation combination were not validated in other mutation combinations, which somewhat weakened the conclusions about the potential physical significance of these additive weights.

      We thank the reviewer for this comment. We admit that a quantitative universal law for tuning the 6 weights of the additive model does not manifest in our data, which indicates the mutation-specific nature of epistatic interactions in TetR as hinted by the different rescuing mutation distributions of different dead mutations (PMCID: PMC7568325). However, clear common trends in the weight tuning of combined mutants that contain common mutations do emerge, which comply with the structural features of the protein and provide explanations as to why C203V and Y132A don’t rescue a range of dead mutations (main text section “Exploring epistasis between mutations”). Additionally, the lack of a quantitative universal rule for tuning the 6 weights in our simple model doesn’t exclude the possibility that a universal law for epistasis in TetR exists in another functional form, a point that could be explored in the future with more extensive joint experimental and computational investigations.

      In Eq. (27) of the supplementary file, the prior distribution of inter-domain coupling γ is given as a Gaussian distribution centered at 5 kBT. Since the absolute value of γ is important, can the authors explain why the prior distribution of γ is set to this value and what happens if other values are used?

      We thank the reviewer for the question. As explained in the corresponding discussions of Eq. (27) in the supplementary file, the prior of γ is chosen to serve as a soft constraint on its possible values based on the consideration that 1. inter-domain energetics for a TetR-like protein should be on the order of a few kBT; and 2. the prior distribution should reflect the experimental observation in the literature that γ has a small probability of adopting negative values upon mutations. Given our thorough validation of the statistical model and computational algorithm (see section 3 of the supplementary file), and the high precision in the parameter fitting results using experimental data (Figure 3 and Figure 4-figure supplement 2), we conclude that 1. the physical range of parameters encoded in their chosen prior distributions agrees well with the value reflected in the experimental data; 2. the inference results are predominantly informed by the data. Thus, changing the mean of the prior distribution of γ should not affect the inference results significantly given that it remains in the physical range.
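
      The statement that the prior gives γ only a small probability of being negative can be checked numerically. The sketch below assumes an untruncated Gaussian prior with mean 5 kBT and the standard deviations of 2.5 and 5 kBT mentioned in this response; whether the actual prior in Eq. (27) is truncated or otherwise modified is not stated here. The helper `normal_cdf` is an illustrative name.

```python
import math

def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

prior_mean = 5.0  # kBT, centre of the Gaussian prior on gamma

# Prior probability that gamma is negative for the two standard deviations.
p_negative_std_2_5 = normal_cdf(0.0, prior_mean, 2.5)  # zero lies 2 SD below the mean
p_negative_std_5_0 = normal_cdf(0.0, prior_mean, 5.0)  # zero lies 1 SD below the mean

print(f"P(gamma < 0), std 2.5 kBT: {p_negative_std_2_5:.3f}")
print(f"P(gamma < 0), std 5.0 kBT: {p_negative_std_5_0:.3f}")
```

      Under these assumptions the narrower prior assigns roughly 2% of its mass to negative γ, while the wider one assigns roughly 16%.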

      This point is explicitly shown in the added Table 2 (Figure 3-figure supplement 10), where we compare the current Bayesian inference results with those obtained after increasing the standard deviation of the Gaussian prior of γ from 2.5 to 5 kBT. As shown in the table, most inference results stay virtually unchanged at the use of this less informative prior, which confirms that they are predominantly informed by the data. The only exceptions are the slight increase of the inferred γ values for C203V, C203V-Y132A and C203V-G102D-L146A, reflecting the intrinsic difficulty of precise inference of large γ values with our model, as is already discussed in the second subsection of section 3 of the supplementary file. However, such observations comply with the common trend of epistatic interactions involving C203V presented in the main text and don’t compromise the ability of our model to accurately capture the induction curves of mutants. Relevant discussions are now added to the second subsection of section 3 of the supplementary file (line 368-385).

      “In our experimental dataset, such inference difficulty is only observed in the case of C203V, Y132A-C203V and C203V-G102D-L146A due to their large γ and γ + εL values (see main text Figure 3, Figure 3—figure Supplement 10 and Figure 4). As shown in main text Figure 3—figure Supplement 10, the inference results for the other 20 mutants stay highly precise and virtually unchanged after increasing the standard deviation of the Gaussian prior of γ (gstdγ ) from 2.5 to 5 kBT. This demonstrates that the inference results for these mutants are strongly informed by the induction data and there is no difficulty in the precise inference of the parameter values. On the other hand, the inferred γ values (especially the upper bound of the 95% credible region) for C203V, Y132A-C203V and C203V-G102D-L146A increased with gstdγ . This is because the induction curves in these cases are not sensitive to the value of γ given that it’s large enough as discussed above. Hence, when unphysically large γ values are permitted by the prior distribution, they could enter the posterior distribution as well. Such difficulty in the precise inference of γ values for these three mutants however, doesn’t compromise the ability of our model in accurately capturing the comprehensive set of induction data (see part iv below). Additionally, the increase of the inferred γ value of C203V at the use of larger gstdγ complies with the results presented in main text Figure 4, which show that the effect of C203V on γ tends to be compromised when combined with mutations closer to the domain interface."
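
      The prior-sensitivity argument above can be illustrated with a deliberately simplified toy calculation. The sketch uses a conjugate Gaussian update, which is not the authors' inference procedure, and the likelihood summaries (`well_constrained`, `poorly_constrained`) are hypothetical numbers chosen only to contrast a parameter that is tightly constrained by data with one that is not. The real likelihood for large γ is described as nearly flat rather than Gaussian, so this only captures the qualitative behaviour.

```python
def gaussian_posterior(prior_mean, prior_std, data_mean, data_std):
    """Posterior mean and std for a Gaussian prior times a Gaussian likelihood."""
    prior_prec = 1.0 / prior_std ** 2
    data_prec = 1.0 / data_std ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
    return post_mean, post_var ** 0.5

# Hypothetical likelihood summaries (centre, width) in kBT; not from the paper.
well_constrained = (4.0, 0.3)    # data pin the parameter down tightly
poorly_constrained = (9.0, 4.0)  # data only weakly constrain the parameter

shifts = {}
for label, (centre, width) in [("well", well_constrained),
                               ("poor", poorly_constrained)]:
    narrow_mean, _ = gaussian_posterior(5.0, 2.5, centre, width)  # prior std 2.5
    wide_mean, _ = gaussian_posterior(5.0, 5.0, centre, width)    # prior std 5.0
    shifts[label] = wide_mean - narrow_mean
    print(f"{label}: posterior mean {narrow_mean:.2f} -> {wide_mean:.2f}")
```

      In this toy setting the well-constrained estimate barely moves when the prior is widened, whereas the poorly constrained one drifts toward larger values, mirroring the behaviour reported for the mutants with large γ.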

    1. We can’t master knowledge. It’s what we live in. This requires a radical shift of worldview from colonialist to ecological. The colonial approach to knowledge is to capture it in order to profit from it. The ecological approach is to live within it as within a garden to be tended. The two worldviews may well be mutually incompatible, though this matter is hardly resolved yet.

      Cf. [[Netwerkleren Connectivism 20100421081941]] / [[Context is netwerk van betekenis 20210418104314]] [[Observator geeft betekenis 20210417124703]] . I think K as stock is prone to collector's fallacy. My working def of K is agency along lines of Sveiby. Such K is always situated in the interaction with the world, networks of meaning as context. This as K isn't merely purified I (DIKW pyramid is bogus), it's weaving I, experience, context, skills into a meaningful whole, and it needs an agent to decide on what's meaningful.

    1. Author Response

      The following is the authors’ response to the current reviews.

      At this stage the referees had only minor comments. Referee #1 asked whether archerfish indeed generalize in egocentric rather than allocentric coordinates. The current results may not rule out the idea that archerfish, unaware of changes in body position, simply continue with previously successful actions, which would appear as egocentric generalization. We agree with referee #1 and updated lines 255-260 in the results and added lines 329-336 in the discussion text that mention this possibility. Referee #2 mentioned that a portion of fish did not make it to the final test, which raises the question of whether all individuals are able to solve the task. We agree with referee #2 and added a paragraph to the discussion section to mention this point (lines 384-388). We also added the salinity of the water in the water tanks (line 98), as suggested by Referee #2. Referee #2 suggested using a different term than “washout” in the behavioral experiments. Since the term “washout” is standard in the field, we keep the term in the text.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful study explores how archerfish adapt their shooting behavior to environmental changes, particularly airflow perturbations. It will be of interest to experts interested in mechanisms for motor learning. While the evidence for an internal model for adaptation is solid, evidence for adaptation to light refraction, as initially hypothesized, is inconclusive. As such, the evidence supporting an egocentric representation might be caused by alternative mechanisms to airflow perturbations.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors examined whether archerfish have the capacity for motor adaptation in response to airflow perturbations. Through two experiments, they demonstrated that archerfish could adapt. Moreover, when the fish flipped its body position with the perturbation remaining constant, it did not instantaneously counteract the error. Instead, the archerfish initially persisted in correcting for the original perturbation before eventually adapting, consistent with the notion that the archerfish's internal model has been adapted in egocentric coordinates.

      Evaluation:

      The results of both experiments were convincing, given the observable learning curve and the clear aftereffect. The ability of these fish to correct their errors is also remarkable. Nonetheless, certain aspects of the experiment's motivation and conclusions temper my enthusiasm.

      (1) The authors motivated their experiments with two hypotheses, asking whether archerfish can adapt to light refractions using an innate look-up table as opposed to possessing a capacity to adapt. However, the present experiments are not designed to arbitrate between these ideas. That is, the current experiments do not rule out the look-up table hypothesis, which predicts, for example, that motor adaptation may not generalize to de novo situations with arbitrary action-outcome associations. Such look-up table operations may also show set-size effects, whereas other mechanisms might not. Whether their capacity to adapt is innate or learned was also not directly tested, as noted by the authors in the discussion. Could the authors clarify how they see their results positioned in light of the two hypotheses noted in the Introduction?

      We agree with the referee that look up tables only confuse the issue. The question we tested is whether or not the fish uses adaptation mechanisms to correct its shooting. We have now changed the introduction both to eliminate the entire question of look up tables and also to clarify that both innate mechanisms and learning mechanisms can contribute to fish shooting, and that our research focuses on the question of whether the fish can adapt to a perturbation in its shooting caused by a change in its physical environment.

      (2) The authors claim that archerfish use egocentric coordinates rather than allocentric coordinates. However, the current experiments do not make clear whether the archerfish are "aware" that their position was flipped (as the authors noted, no visual cues were provided). As such, for example, if the fish were "unaware" of the switch, can the authors still assert that generalization occurs in egocentric coordinates? Or simply that, when archerfish are ostensibly unaware of changes in body position, they continue with previously successful actions.

      The fish has access to the body position switch: there are cues in the water tank that can help the fish orient inside it. Additionally, there are no cues to the presence or direction of the air flow above the water tank. Moreover, previous experience has shown that the fish is sensitive to visual cues and uses them to achieve consistent orientation within the tank when possible. These points have been added to the main text [lines 143-144, 254-257].

      (3) The experiments offer an opportunity to examine whether archerfish demonstrate any savings from one session to another. Savings are often attributed to a faster look-up table operation. As such, if archerfish do not exhibit savings, it might indicate a scenario where they do not possess a refined look-up table and must rely on implicit mechanisms to relearn each time.

      This is an important question. Indeed, we looked for the ‘savings’ effect in the data, but the noisy nature of the data prevented us from drawing a concrete conclusion. We now mention this in lines 247-249.

      We have also eliminated the discussion of look up tables from the article.

      (4) The authors suggest that motor adaptation in response to wind may hint at mechanisms used to adapt to light refraction. However, how strong of a parallel can one draw between adapting to wind versus adapting to light refraction? This seems important given the claims in this paper regarding shared mechanisms between these processes. As a thought experiment, what would the authors predict if they provided a perturbation more akin to light refraction (e.g., a film that distorts light in a new direction, rather than airflow)?

      This is an important point. Indeed, our project started by looking for options to distort the refraction index or distort the light in a new direction. However, given the available ways of distorting the light in a new direction, this is hard to achieve on a technical level. Initially, we tried using prism goggles; however, the archerfish found it hard to shoot with the heavy load on the head. We also explored a film of oil on the water surface. However, given the available oils and the thickness of the film above the water, it is hard to achieve a considerable perturbation.

      The fish's response to the perturbation matches what would be expected for a change in light refraction. A light refraction perturbation does not change with a change in the fish's body position relative to the target. However, in response to (and in agreement with) the referees, we have generalized the context in which we see our results and discuss the results in terms of adaptation of the fish shooting behavior to changes in physical factors including light refraction, wind, fatigue, and others.

      (5) The number of fish excluded was greater than those included. This raises the question as to whether these fish are merely elite specimens or representative of the species in general.

      The filtering of the fish was in the training stage. The requirements were quite strict: the fish had to produce enough shots each day in the experimental setup. Very few fish succeeded. But all fish that got to the stage of perturbation exhibited the adaptation effect. We do not see a reason to think that the motivation to shoot will have a strong interaction with the shooting adaptation mechanisms.

      Reviewer #2 (Public Review):

      Summary:

      The work of Volotsky et al presented here shows that adult archerfish are able to adjust their shooting in response to their own visual feedback, taking consistent alterations of their shot, here by an air flow, into account. The evidence provided points to an internal mechanism of shooting adaptation that is independent of external cues, such as wind. The authors provide evidence for this by forcing the fish to shoot from 2 different orientations to the external alteration of their shots (the airflow). This paper thus provides behavioral evidence of an internal correction mechanism that underlies adaptive motor control of this behavior. It does not provide direct evidence of refractive index-associated shot adjustment.

      Strengths:

      The authors have used a high number of trials and strong statistical analysis to analyze their behavioral data.

      Weaknesses:

      While the introduction, the title, and the discussion are associated with the refraction index, the latter was not altered, and neither was the position of the target. The "shot" was altered; this is a simple motor adaptation task and not a question related to the refractive index. The title, abstract, and the introduction are thus misleading. The authors appear to deduce from their data that the wind is not taken into account and thus conclude that the fish perceive a different refractive index. This might be based on the assumption that fish always hit their target, which is not the case. The airflow does not alter the position of the target, thus the airflow does not alter the refractive index. The fish likely does not perceive the airflow, thus alteration of its shooting abilities is likely assumed to be an "internal problem" of shooting. I am sorry but I am not able to understand the conclusion they draw from their data.

      This is an important point. Indeed, our project started by looking for options to distort the refraction index or distort the light in a new direction. However, given the available ways of distorting the light in a new direction, this is hard to achieve on a technical level. Initially, we tried using prism goggles; however, the archerfish found it hard to shoot with the heavy load on the head. We also explored a film of oil on the water surface. However, given the available oils and the thickness of the film above the water, it is hard to achieve a considerable perturbation.

      The fish's response to the perturbation matches what would be expected for a change in light refraction. A light refraction perturbation does not change with a change in the fish's body position relative to the target. However, in response to (and in agreement with) the referees, we have generalized the context in which we see our results and discuss the results in terms of adaptation of the fish shooting behavior to changes in physical factors including light refraction, wind, fatigue, and others.

      Reviewer #2 (Recommendations For The Authors):

      I have had a hard time trying to understand how the authors concluded that the RI is important here as it is not altered. Thus I did not understand the conclusions drawn from this paper. The experiments are well described, but the conclusions are not to me. Maybe schematics would help to clarify. I am from outside the field and represent a naïve reader with an average intellect. The authors need to do a better job of explaining their results if they want others to understand their conclusions.

      See response to the public comments.

      Minor comments:

      Line 9: omit the "an".

      Done.

      Line 11: this sentence would fit way better if it followed the next one.

      Done.

      Line 15: and all the rest of the paper: washout is a strange term and for me associated with pharmacological manipulations - might only be me. I suggest using recovery instead throughout the manuscript.

      The term ‘washout’ is often used in the field of motor adaptation to describe the return to original condition. For example:

      Kluzik J, Diedrichsen J, Shadmehr R, Bastian AJ (2008) Reach adaptation: what determines whether we learn an internal model of the tool or adapt the model of our arm? J Neurophysiol 100:1455-64. doi: 10.1152/jn.90334.2008

      Donchin O, Rabe K, Diedrichsen J, Lally N, Schoch B, Gizewski ER, Timmann D (2012) Cerebellar regions involved in adaptation to force field and visuomotor perturbation. J Neurophysiol 107:134-47

      Line 19: the fish does not expect the flow, it expects that it shoots too short- no?

      Done.

      Line 35: fix the citation - in your reference manager.

      Done.

      Line 52: provide some examples of the mechanisms you think of or papers of it for naive readers. Otherwise, this sentence is not helpful for the reader.

      Done.

      Line 183: it's unclear which parameter you mean. Rephrase.

      Done.

      Line 197: should read to test "the" - same sentence: you repeat yourself- rephrase the sentence.

      Done.

      Figure 4: it was unclear to me why the figure was differentiating between fishes until I read the legend. Why not include direct information in the figure? A schematic maybe? Legend: you have a double "that" in C.

      We added the title for each column with the information about the direction of air.

      Figures: in all figures, perturbation is wrongly spelled! Change the term washout to recovery.

      Done. We kept the term ‘washout’.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I have only a few comments that I think will improve the manuscript and help readers better appreciate the context of the reported results.

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible.

      One paradox, that the authors point out, is that the drastic effects of TALK-1 L114P on plasma membrane potential do not result in a complete loss of insulin secretion. One important consideration is the role of intracellular stores in insulin secretion at physiological levels of hyperglycemia. This needs to be discussed more thoroughly, especially in the light of recent papers like Postic et al 2023 AJP and others. The authors do show an upregulation of IP3-induced Ca release. It is not clear whether they think this is a direct or indirect effect on the ER. Is there more IP3? More IP3R? Are the stores more full?

      The reviewer brings up an important point. Although we see a significant reduction in glucose-stimulated depolarization in most islets from TALK-1 L114P mice, some glucose-stimulated calcium influx is still present (especially from female islets); this suggests that a subset of islet β-cells are still capable of depolarization. Because our original membrane potential recordings were done in whole islets without identification of the cell type being recorded, we have now repeated these electrical recordings in confirmed β-cells (see Supplemental figure 6). The new data shows that 33% of TALK-1 L114P β-cells show action potential firing in 11 mM glucose, which would be predicted to stimulate insulin secretion from a third of all TALK-1 L114P β-cells; this could be responsible for the remaining glucose-stimulated insulin secretion observed from TALK-1 L114P islets. However, ER calcium store release could also allow for some of the calcium response in the TALK-1 L114P islets. We have now detailed this in the discussion, which now describes the Postic et al. study showing that glucose-stimulated beta-cell calcium increases involve ER calcium release, as they occur in the presence of voltage-dependent calcium channel inhibition. Future studies can assess this using SERCA inhibitors and determining if glucose-stimulated calcium influx in TALK-1 L114P islets is lost. We also find that muscarinic-stimulated calcium influx from ER stores is greater in TALK-1 L114P mice. We currently do not have data to support the mechanism for this enhancement of muscarinic-induced islet calcium responses from islets expressing TALK-1 L114P. Our hypothesis is that greater TALK-1 current on the ER membrane is enhancing ER calcium release in response to IP3R activation. There is equivalent IP3R expression in control and TALK-1 L114P islets based on transcriptome analysis, which is now included in the manuscript. However, whether there is greater IP3 production, greater ER calcium storage, and/or greater ER calcium release requires further analysis. Because this finding was not directly related to the metabolic characterization of this TALK-1 L114P MODY mutation, we are planning to examine the ER functions of TALK-1 L114P thoroughly in a future manuscript.

      The authors point to the possible roles of TALK-1 in alpha and delta cells. A limitation of the global knock-in approach is that the cell type specificity of the effects can't easily be determined. This should be more explicitly described as a limitation.

      We thank the reviewer for this suggestion and have added this to the discussion. This is now included in a paragraph at the end of the discussion detailing the limitations of this manuscript.

      The official gene name for TALK-1 is KCNK16. This reviewer wonders whether it wouldn't be better for this official name to be used throughout, instead of switching back and forth. The official name is used for Abcc8 for example.

      We thank the reviewer for this suggestion and have revised the manuscript to include Kcnk16 L114P. The instances of TALK-1 L114P that remain in the manuscript are in cases where the text specifically discusses TALK-1 channel function.

      There are several typos and mistakes in editing. For example, on page 5 it looks like "PMID:11263999" has not been inserted. I suggest an additional careful proofreading.

      We have revised this reference, thoroughly proofread the revised manuscript, and corrected typos.

      The difference in lethality between the strains is fascinating. Might be good to mention other examples of ion channel genes where strain alters the severe phenotypes? Additional speculation on the mechanism could be warranted. It also offers the opportunity to search for genetic modifiers. This could be discussed.

      We thank the reviewer for this suggestion and have added details on mutations where strain alters lethality.

      The sex differences are interesting. Of course, estrogen plays a role as mentioned at the bottom of page 16, but there have been more involved analyses of islet sex differences, including a recent paper from the Rideout group. Is there a sex difference in the islet expression of KCNK16 mRNA or protein, in mice or humans?

      We thank the reviewer for the important comments on the TALK-1 L114P sex differences. We have revised the manuscript to include greater discussion about female β cell resilience to stress, which may allow greater insulin secretion in the presence of the TALK-1 L114P channels; this is based on the Brownrigg et al. study pointed out by the reviewer (PMID: 36690328). Because these sex differences in islet function were examined in mice, we looked at KCNK16 expression in mouse beta-cells. There was a trend toward greater KCNK16 expression in sorted male beta-cells (average RPKM 6296.25 +/- 953.84) compared to sorted female beta-cells (5148.25 +/- 1013.22). Similarly, there was a trend toward greater KCNK16 expression in male HFD treated mouse beta-cells (average RPKM 8020.75 +/- 1944.41) compared to female HFD treated mouse beta-cells (average RPKM 7551 +/- 2952.70). We have now added this to the text.

      Page 15-16 "Indeed, it has been well established that insulin signaling is required for neonatal survival; for example, a similar neonatal lethality phenotype was observed in mice without insulin receptors (Insr-/-) where death results from hyperglycemia and diabetic ketoacidosis by P3 (40)." Formally, the authors are not examining insulin signaling. A better comparison is that of the Ins1/Ins2 double knockout model of complete hypoinsulinemia.

      We thank the reviewer for suggesting this as the appropriate comparison model and have now revised the manuscript to detail the 48-hour average life expectancy of Ins1/Ins2 double knockout mice (PMID: 9144203).

      There are probably too many abbreviations in the paper, making it harder to read by nonspecialists. I recommend writing out GOF, GSIS, WT, K2P, etc.

      We thank the reviewer for this suggestion and have revised the manuscript to reduce the use of most abbreviations.

      Reviewer #2:

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible. We have thoroughly addressed all the reviewer’s comments and revised the manuscript accordingly. These changes have strengthened the manuscript and are summarized below.

      (1) The authors perform an RNA-sequencing showing that the cAMP amplifying pathway is upregulated. Is this also true in humans with this mutation? Other follow-up comments and questions from this observation:

      a) Will this mean that the treatment with incretins will improve glucose-stimulated insulin secretion and Ca2+ signalling and lower blood glucose? The authors should at least present data on glucose-stimulated insulin secretion and/or Ca2+ signalling in the presence of a compound increasing intracellular cAMP.

      b) Will an OGTT give different results than the IPGTT performed due to the fact that the cAMP pathway is upregulated?

      c) Is the increased glucagon area and glucagon secretion a compensatory mechanism that increases cAMP? What happens if glucagon receptors are blocked?

      We thank the reviewer for the suggestions. Although cAMP pathways were upregulated in the TALK-1 L114P islets, the changes in expression were only modest as examined by qRT-PCR. Thus, we are not sure if this plays a role in secretion. For humans with this mutation, there have been only a small number of patients and no islets have been isolated from these patients. Therefore, we are unaware if the cAMP amplifying pathway is upregulated in humans with the MODY-associated TALK-1 L114P mutation. We have performed the suggested experiment assessing calcium from TALK-1 L114P islets in response to liraglutide (see Supplemental figure 10); there was no liraglutide response in TALK-1 L114P islets. We have also performed the OGTT experiments as suggested and these have now been added to the manuscript (see Supplemental figure 3). We do not believe that the increased glucagon is a compensatory response, because: 1. TALK-1 deficient islets have less glucagon secretion due to reduced SST secretion (see PMID: 29402588); 2. There is no change in insulin secretion at 7 mM glucose, however, glucagon secretion is significantly elevated from islets isolated from TALK-1 L114P mice; 3. TALK-1 is highly expressed in delta-cells, and in these cells TALK-1 L114P would be predicted to cause significant hyperpolarization and significant reductions in calcium entry as well as SST secretion. Thus, reduced SST secretion may be responsible for the elevation of glucagon secretion. We plan to investigate delta-cells within islets from TALK-1 L114P mice in future studies to determine if changes in SST secretion are responsible for the elevated glucagon secretion from TALK-1 L114P islets.

      (2) The performance of measurements in both male and female mice is praiseworthy. However, despite differences in the response, the authors do not investigate the potential reason for this. Are hormonal differences of importance?

      We thank the reviewer for this important point. It is indeed becoming clear that there are many differences between male and female islet function and responses to stress. Thus, we have revised the manuscript to include greater discussion about these differences such as female β cell resilience to stress, which may allow greater insulin secretion in the presence of the TALK-1 L114P channels; this is based on the Brownrigg et al. study pointed out by reviewer 1 (PMID: 36690328). While the differences in islet function and GTT between male and female L114P mice are clear, they both show diminished islet calcium handling, defective hormone secretion, and development of glucose intolerance. This manuscript was intended to demonstrate how the MODY-causing TALK-1 L114P mutation causes glucose dyshomeostasis, which we have determined in both male and female mice. The differences between male and female mice and islets with TALK-1 L114P could be due to multiple potential causes (as detailed in PMID: 36690328); thus, we believe that comprehensive studies are required to thoroughly determine how the TALK-1 L114P mutation differently impacts male and female mice and islets, which we plan to complete in a future manuscript.

      (3) MINOR: Page 5 .." channels would be active at resting Vm PMID:11263999.." The actual reference has not been added using the reference system.

      We thank the reviewer for noticing this mistake, which has now been corrected.

      Reviewer #3:

      The manuscript is overall clearly presented and the experimental data largely support the conclusions. However, there are a number of issues that need to be addressed to improve the clarity of the paper.

      We would like to thank the Reviewer for their time in reviewing our manuscript. We appreciate the helpful feedback and assistance in ensuring the highest quality publication possible. We have thoroughly addressed all the reviewer’s comments and revised the manuscript accordingly. These changes have strengthened and improved the clarity of the manuscript.

      Specific comments:

      (1) Title: The terms "transient neonatal diabetes" and "glucose dyshomeostasis in adults" are used to describe the TALK-1 L114P mutant mice. Transient neonatal diabetes gives the impression that diabetes is resolved during the neonatal period. The authors should clarify the criteria used for transient neonatal diabetes, and the difference between glucose dyshomeostasis and MODY. Longitudinal plasma glucose and insulin data would be very informative and help readers to follow the authors' narrative.

      We appreciate the helpful comment and have added longitudinal plasma glucose from neonatal mice to address this (see Supplemental figure 2). The new data now shows that the TALK-1 L114P mutant mice undergo transient hyperglycemia that resolves by P10 and then occurs again at week 15. Insulin secretion from P4 islets is also included, showing that male animals homozygous for the TALK-1 L114P mutation have the largest impairment in glucose-stimulated insulin secretion, followed by male heterozygous TALK-1 L114P P4 islets that also have impaired insulin secretion (see Figure 1). The amount of hyperglycemia correlates with the defects in neonatal islet insulin secretion.

      (2) Another concern for the title is the term "α-cell overactivity." This could be taken to mean that individual α-cells are more active and/or that there are more α-cells to secrete glucagon. The study does not provide direct evidence that individual α-cells are more active. This should be clarified.

      We appreciate the helpful comment and have revised the manuscript title accordingly.

      (3) In the Introduction, it is stated that because TALK-1 activity is voltage-dependent, the GOF mutation is less likely to cause neonatal diabetes, yet the study shows the L114P TALK-1 mutation actually causes neonatal diabetes by completely abolishing glucose-stimulated Ca2+ entry. This seems to imply TALK-1 activity (either in the plasma membrane or ER membrane) has more impact on Vm or cytosolic Ca2+ in neonates than initially predicted. Some discussion on this point is warranted.

      These are important points and we have added details to the discussion about this. For example, the discussion now states that, “This suggests a greater impact of TALK-1 L114P in neonatal islets compared to adult islets. Future studies during β-cell maturation are required to determine if TALK-1 activity is greater on the plasma membrane and/or ER membrane compared with adult β-cells.” The introduction has also been revised to clarify the voltage dependence of TALK-1.

      (4) What is the relative contribution of defects in plasma membrane depolarization versus ER Ca2+ handling on defective insulin secretion response?

      We thank the reviewer for bringing up this important point. TALK-1 L114P islets show blunted glucose-stimulated depolarization and glucose-stimulated calcium entry; however, the L114P islets show Ca2+ entry equivalent to that of control islets in response to high KCl (Figure 5GH). As the KCl-stimulated Ca2+ influx is similar between control and TALK-1 L114P islets, this indicates that plasma membrane TALK-1 L114P has a hyperpolarizing role that significantly blunts glucose-stimulated depolarization and reduces activation of voltage-dependent calcium channels. We have further tested this by looking at glucose-stimulated β-cell membrane potential depolarization in TALK-1 L114P islets, which is significantly blunted (Figure 4A and B; Supplemental figure 6). However, 33% of TALK-1 L114P β-cells showed glucose-stimulated electrical excitability (Supplemental figure 6), which likely accounts for the modest GSIS from TALK-1 L114P islets. New data has also been included showing that KCl stimulation causes a significant depolarization of β-cells from TALK-1 L114P islets (Supplemental figure 6). Because plasma membrane TALK-1 L114P is largely responsible for the hyperpolarized membrane potential and blunted glucose-stimulated Ca2+ entry, this suggests that TALK-1 L114P on the plasma membrane is primarily responsible for the altered insulin secretion. The discussion has been revised to reflect this.

      (5) The Jacobson group has previously shown that another K2P channel TASK-1 is also involved in ER Ca2+ homeostasis and that TASK inhibitors restored ER Ca2+ in TASK-1 expressing cells. Is TASK-1 expressed in β-cell ER membrane? Can the mishandling of Ca2+ caused by TALK-1 L114P be reversed by TASK-1 inhibitors?

      We thank the reviewer for bringing up this important point in relation to ER calcium handling by K2P channels. We have found that TASK-1 channels expressed in alpha-cells enhance ER calcium release and that inhibitors of TASK-1 channels elevate alpha-cell ER calcium storage. We did not observe any significant changes in the gene (Kcnk3) encoding TASK-1 between islets from control and TALK-1 L114P mice, which has now been added to the manuscript. However, the TALK-1 L114P-mediated reduction of glucose-stimulated depolarization and inhibition of calcium entry are both prevented in the presence of high KCl (see Figure X); this strongly suggests that TALK-1 L114P K+ flux at the membrane is hyperpolarizing the membrane potential and limiting depolarization and calcium entry. This suggests that TALK-1 L114P control of ER calcium handling is not the primary contributor to the blunted glucose-stimulated calcium handling. Furthermore, acetylcholine stimulation of both control and TALK-1 L114P islets elicited ER calcium release, which indicates that for the most part ER calcium release is still responsive to cues that control release, but they are altered. Taken together this suggests that the TALK-1 L114P impact on ER calcium is not the primary mediator of blunted glucose-stimulated islet calcium entry and insulin secretion.

      (6) The electrical recording experiments were conducted using whole islets. The authors should comment on how the cells were identified as β-cells, especially in mutant islets in which there is an increased number of α-cells.

      The reviewer brings up an important point. As indicated, the original membrane potential recordings were conducted using whole islets. While the recorded cells could mostly be β-cells based on mouse islets typically containing >80% β-cells, there is a possibility that some of the cells included in these recordings were α-cells or δ-cells (especially because of the noted α-cell hyperplasia in TALK-1 L114P islets). Thus, we have now included data from β-cells that were identified with an adenoviral construct containing a rat insulin promoter driving a fluorescent reporter. This allowed the fluorescent β-cells to be monitored with electrophysiological membrane potential recordings. The new data (see Supplemental figure 6) shows a significant reduction in glucose-stimulated depolarization in 67% of β-cells with the L114P mutation compared to controls.

      Minor:

      (1) Some references need formatting.

      The references have been revised accordingly.

      (2) Please define glucose-stimulated phase 0 Ca2+ response for non-expert readers.

      This has been defined accordingly.

      (3) Page 14 bottom: The sentence "Unlike the only other MODY-associated.........., TALK-1 is not inhibited by sulfonylureas" seems out of place and lacks context.

      We thank the reviewer for this suggestion and have deleted this sentence.

      (4) Figure 6: It would be helpful to provide a protein name for the genes shown in panel D.

      The protein names for the genes have now been included in the discussion of these genes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate the thoughtful review of our manuscript by the reviewers, along with their valuable suggestions for enhancing our work. In response to these suggestions, we conducted additional experiments and made significant revisions to both the text and figures. In the following sections, we first highlight the major changes made to the manuscript, and thereafter address each reviewer's comments point-by-point. We hope these additional data and revisions have improved the robustness and clarity of the study and manuscript. Please note that as part of a suggested revision we have changed the manuscript title to be: Bacterial vampirism mediated through taxis to serum.

      Major revisions and new data:

      (1) We conducted additional experiments testing taxis to serum using a swine ex vivo enterohemorrhagic lesion model in which we competed wildtype versus chemotaxis-deficient strains (Fig. 8). We selected swine for these experiments due to their similarity in gastrointestinal physiology to humans. In these experiments we see that chemotaxis, and the chemoreceptor Tsr, mediate localization to, and migration into, the lesion. We also tested, and confirmed, taxis to serum from swine and serum from horse, supporting that serum attraction is relevant in other host-pathogen systems.

      (2) We present additional experimental data and quantification of chemotaxis responses to human serum treated with serine-racemase (Fig. S3). This treatment reduces wildtype chemoattraction and the wildtype no longer possesses an advantage over the tsr strain, providing further evidence that L-serine is the specific chemoattractant responsible for Tsr-mediated attraction to serum.

      (3) We present additional data in the form of 17 videos of chemotaxis experiments with norepinephrine and DHMA showing null-responses under various conditions. These data provide additional support to the conclusion that these chemicals are not responsible for bacterial attraction to serum. We have included these raw data as a new supplementary file (Data S1) for those in the field that are interested in these chemicals.

      (4) Based on comments from Reviewer 2 regarding whether the position of the ligand and ligand-binding site residues in the previously-reported EcTsr LBD structure are incorrect, or whether these differences are due to the proteins being from different organisms, we performed paired crystallographic refinements to determine which positions result in model improvement (Fig. 7J). Altering the EcTsr structure to have the ligand and ligand-binding site positions from our new higher resolution and better-resolved structure of Salmonella Typhimurium Tsr results in a demonstrably better model, with both Rwork and Rfree lower by about 1% (Fig. 7J). These data support our conclusion that the correct positions for both structures are as we have modeled them in the S. Typhimurium Tsr structure. We also solved an additional crystal structure of SeTsr LBD captured at neutral pH (7-7.5) that confirms our structure captured with elevated pH (7.5-9.7) has no major changes in structure or ligand-binding interactions (Fig. S6, Table S2).

      (5) Based on comments from Reviewer 2 on the accuracy of the diffusion calculations, we present a new analysis (Fig. S2) comparing the experimentally-determined diffusion of A488 compared to its calculated diffusion. We found that:

      [line 111]: “As a test case of the accuracy of the microgradient modeling, we compared our calculated values for A488 diffusion to the normalized fluorescence intensity at time 120 s. We determined the concentration to be accurate within 5% over the distance range 70-270 µm (Fig. S2). At smaller distances (<70 µm) the measured concentration is approximately 10% lower than that predicted by the computation. This could be due to advection effects near the injection site that would tend to enhance the effective local diffusion rate.”
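
      The kind of check described in this passage, comparing a computed diffusion profile against a normalized measurement, can be sketched as follows. This is only an illustration and not the authors' microgradient model: it assumes a continuous point source in an unbounded 3D medium, for which C(r, t) = q / (4 pi D r) * erfc(r / (2 sqrt(D t))). The diffusion coefficient, the source rate, and the "measured" values are all hypothetical placeholders.

```python
import math

def point_source_concentration(r_um, t_s, d_um2_s, q=1.0):
    """Concentration at radius r (µm) and time t (s) for a continuous
    point source in an unbounded 3D medium (arbitrary units)."""
    return (q / (4 * math.pi * d_um2_s * r_um)
            * math.erfc(r_um / (2 * math.sqrt(d_um2_s * t_s))))

def normalized_profile(radii_um, t_s, d_um2_s):
    """Predicted profile scaled so that the first radius equals 1."""
    values = [point_source_concentration(r, t_s, d_um2_s) for r in radii_um]
    return [v / values[0] for v in values]

def percent_deviation(measured, predicted):
    """Signed deviation of each measurement from the prediction, in percent."""
    return [100.0 * (m - p) / p for m, p in zip(measured, predicted)]

D_ASSUMED = 400.0                   # µm^2/s, hypothetical value for the dye
radii = [70, 120, 170, 220, 270]    # µm, the range quoted in the text
predicted = normalized_profile(radii, t_s=120, d_um2_s=D_ASSUMED)

# Hypothetical "measurement": the prediction scaled down by 3%.
measured = [0.97 * p for p in predicted]

deviation = percent_deviation(measured, predicted)
within_5 = all(abs(d) <= 5 for d in deviation)
print(within_5)
```

      With real data, `measured` would be replaced by the normalized fluorescence intensities at each radius, and the same deviation check could be repeated for radii below 70 µm.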

      (6) Both reviewers asked us to better justify why we focused on the chemoreceptor Tsr, and had questions about why we did not investigate Tar. The low concentration of Asp in serum suggests Tar could have some effect, but less so than Trg or Tsr (see Fig. 4A). We have revised the text throughout to better convey that we agree multiple chemoreceptors are involved in the response and clarify our rationale for studying the role of Tsr:

      [line 178]: “We modeled the local concentration profile of these effectors based on their typical concentrations in human serum (Fig. 4B). Of these, by far the two most prevalent chemoattractants in serum are glucose (5 mM) and L-serine (100-300 µM) (Fig. 4B-F). This suggested to us that the chemoreceptors Trg and/or Tsr could play important roles in serum attraction.”

      [line 186]: “Since tsr mutation diminishes serum attraction but does not eliminate it, we conclude that multiple chemoattractant signals and chemoreceptors mediate taxis to serum. To further understand the mechanism of this behavior we chose to focus on Tsr as a representative chemoreceptor involved in the response, presuming that serum taxis involves one, or more, of the chemoattractants recognized by Tsr that is present in serum: L-serine, NE, or DHMA.”

      [line 468] “Serum taxis occurs through the cooperative action of multiple bacterial chemoreceptors that perceive several chemoattractant stimuli within serum, one of these being the chemoreceptor Tsr through recognition of L-serine (Fig. 4).”

      Point-by-point responses to reviewer comments:

      Reviewer #1:

      (1) Presumably in the stomach, any escaping serum will be removed/diluted/washed away quite promptly? This effect is not captured by the CIRA assay but perhaps it might be worth commenting on how this might influence the response in vivo. Perhaps this could explain why, even though the chemotaxis appears rapid and robust, cases of sepsis are thankfully relatively rare.

      To clarify, the Enterobacteriaceae species we have tested here are colonizers of the intestines, not the stomach, and cases of bacteremia from these species are presumably due to bloodstream entry through intestinal lesions. Whether or not intestinal flow acts as a barrier to bloodstream entry is not something we test here, and so we have not commented on this idea in the manuscript. We do demonstrate that attraction to serum occurs within seconds-to-minutes of exposure. We expect that the major protective effects against sepsis are the host antibacterial factors in serum, which are well-described in other work. We have been careful to state throughout the text that we see attraction responses, and growth benefits, to serum that is diluted in an aqueous media, which is different than bacterial growth in 100% serum or in the bloodstream.

      (2) The authors refer to human serum as a chemoattractant numerous times throughout the study (including in the title). As the authors acknowledge, human serum is a complex mixture and different components of it may act as chemoattractants, chemo-repellents (particularly those with bactericidal activities) or may elicit other changes in motility (e.g. chemokinesis). The authors present convincing evidence that cells are attracted to serine within human serum - which is already a well-known bacterial chemoattractant. Indeed, their ability to elucidate specific elements of serum that influence bacterial motility is a real strength of the study. However, human serum itself is not a chemoattractant and this claim should be re-phrased - bacteria migrate towards human serum, driven at least in part by chemotaxis towards serine.

      Throughout the text we have changed these statements, including in the title, to either be ‘taxis to serum’ or ‘serum attraction.’ On the timescales we tested, our data support that chemotaxis, not chemokinesis or other forms of directed motility, is what drives rapid serum attraction, since a motile but non-chemotactic cheY mutant cannot localize to serum (Fig. 4). We present evidence of one of these chemotactic interactions (L-Ser).

      (3) Linked to the previous point, several bacterial species (including E. coli - one of the bacterial species investigated here) are capable of osmotaxis (moving up or down gradients in osmolality). Whilst chemotaxis to serine is important here, could movement up the osmotic gradient generated by serum injection play a more general role? It could be interesting to measure the osmolality of the injected serum and test whether other solutions with similar osmolality elicit a similar migratory response. Another important control here would be to treat human serum with serine racemase and observe how this impacts bacterial migration.

      As addressed above, we have added additional experiments of serum taxis treated with serine racemase showing competition between WT and cheY, and WT and tsr (Fig. S3). These data support a role for L-serine as a chemoattractant driving attraction to serum. The idea of osmotaxis is interesting, but outside the scope of this work since we focus on chemoattraction to L-serine as one of the mechanisms driving serum attraction, and have multiple lines of evidence to support that.

      (4) The migratory response of E. coli looks striking when quantified (Fig. 6C) but is really unclear from looking at Panel B - it would be more convincing if an explanation was offered for why these images look so much less striking than analogous images for other species (E.g. Fig. 6A).

      We agree that the E. coli taxis to serum response is less obvious. We have brightened those panels to hopefully make it clearer to interpret (more cells in field of view over time). Also, as stated in the y-axes of these plots, this quantification was performed by enumerating the number of cells in the field of view, and the Citrobacter and Escherichia responses are shown on separate y-axes (now Fig. 8C). As indicated, the experiments have different numbers of starting motile cells, which we presume accounts for the difference in attraction magnitude. When investigating diverse bacterial systems we found there to be differences in motility under the culturing and experimental conditions we employed, for multiple reasons, and so for these data we thought it best to report raw cell numbers rather than data normalized to the starting number of bacteria, as we do elsewhere. In the specific case of these E. coli responding to serum, please view Supplementary Movie S3, which clearly shows both the attraction response and that the bacteria grew in a longer, semi-filamentous form that seems to impair their swimming speed.

      (5) It is unclear why the fold-change in bacterial distribution shows an approximately Gaussian shape with a peak at a radial distance of between 50 -100 um from the source (see for example Fig. 2H). Initially, I thought that maybe this was due to the presence of the microcapillary needle at the source, but the CheY distribution looks completely flat (Fig. 3I). Is this an artifact of how the fold-change is being calculated? Certainly, it doesn't seem to support the authors' claim that cells increase in density to a point of saturation at the source. Furthermore, it also seems inappropriate to apply a linear fit to these non-linear distributions (as is done in Fig. 2H and in the many analogous figures throughout the manuscript).

      We have revised the text to address this point, and removed the comment about cells increasing in density to a point of saturation: [Line 138] “We noted that in some experiments the population peak is 50-75 µm from the source, possibly due to a compromise between achieving proximity to nutrients in the serum and avoidance of bactericidal serum elements, but this behavior was not consistent across all experiments. Overall, our data show S. enterica serovars that cause disease in humans are exquisitely sensitive to human serum, responding to femtoliter quantities as an attractant, and that distinct reorganization at the population level occurs within minutes of exposure (Fig. 3, Movie 2).”

      We can confirm that this is not an artifact of quantification. Please refer to the videos of these responses, which demonstrate this point (Movies 1-5).

      (6) The authors present several experiments where strains/ serovars competed against each other in these chemotaxis assays. As mentioned, these are a real strength of the study - however, their utility is not always clear. These experiments are useful for studying the effects of competition between bacteria with different abilities to climb gradients.

      However, to meaningfully interpret these effects, it is first necessary to understand how the different bacteria climb gradients in monoculture. As such, it would be instructive to provide monoculture data alongside these co-culture competition experiments.

      Thank you for this suggestion. We agree that the coculture experiments showing strains competing for the same source of effector give a different perspective than monoculture. These experiments allow us to confirm taxis deficiencies or advantages with greater sensitivity, and ensure that the bacteria in competition have experienced the same gradient. This type of competition experiment is often used in in vivo experimentation for the same advantages. We note that in the gut the bacteria are not in monoculture and chemotactic bacteria do have to compete against each other for access to nutrients. Repeating all of the experiments we present to show both the taxis responses in coculture and monoculture would be an extraordinary amount of work that we do not believe would meaningfully change the conclusions of this study.

      (7) Linked to the above point, it would be especially instructive to test a tsr mutant's response in monoculture. Comparing the bottom row of Fig. 3G to Fig. 3I suggests that when in co-culture with a cheY mutant, the tsr mutant shows a higher fold-change in radial distribution than the WT strain. Fig. 4G shows that a tsr mutant can chemotaxis towards aspartate at a similar, but reduced rate to WT. This could imply that (like the trg mutant), a tsr mutant has a more general motility defect (e.g. a speed defect), which could explain why it loses out when in competition with the WT in gradients of human serum, but actually seems to migrate strongly to human serum when in co-culture with a cheY mutant. This should be resolved by studying the response of a tsr mutant in monoculture.

      Addressed above.

      (8) In Fig. 4, the response of the three clinical serovars to serine gradients appears stronger than the lab serovar, whilst in Fig. 1, the response to human serum gradients shows the opposite trend with the lab serovar apparently showing the strongest response. Can the authors offer a possible explanation for these slightly confusing trends?

      We suspect this relates to the fact that pure L-serine is a chemoattractant, whereas treatment with serum exposes the bacteria both to chemoattractants and, likely, chemorepellents. Strains may navigate the landscape of these stimuli differently for a variety of reasons that are not simple to tease apart. The final magnitude of change in bacterial localization depends on multiple factors including swimming speed, adaptation, sensitivity of chemoattraction, and cooperative signaling of the chemoreceptor nanoarray. Thus, we cannot state with certainty how and why these strains are different across all experiments, but we can state that they are attracted to both serum and L-serine.

      (9) In Fig. S2, it seems important to present quantification of the effect of serine racemase and the reported lack of response to NE and DHMA - the single time-point images shown here are not easy to interpret.

      As suggested, we present quantification of the serine racemase-treated samples (now Fig. S3). To assist in the interpretation of the max projections, Fig. S3 now notes the chemotactic response (chemoattraction for L-serine, null-response for NE/DHMA). Further, we revised the text to state: [line 209]: “We observed robust chemoattraction responses to L-serine, evident by the accumulation of cells toward the treatment source (Fig. S3E, Movie 4), but no response to NE or DHMA, with the cells remaining randomly distributed even after 5 minutes of exposure (Fig. S3F-I, Movie 5, Movie S1).”

      (10) Importantly, the authors detail how they controlled for the effects of pH and fluid flow (Line 133-136). Did the authors carry out similar controls for the dual-species experiments where fluorescent imaging could have significantly heated the fluid droplet driving stronger flow forces?

      Most of our microfluidics experiments were performed in a temperature-controlled chamber (see Methods). Since the strains in the coculture experiments experienced the same experimental conditions, we have no evidence of fluorescence-imaging-induced temperature changes that have impacted whether or not the bacteria are attracted to serum or the effectors we investigated.

      (11) The inference of the authors' genetic analysis combined with the migratory response of E. coli and C. koseri to human serum shown in Fig. 6 is that Tsr drives movement towards human serum across a range of Enterobacteriaceae species. The evidence for the importance of Tsr here is currently correlative - more causal evidence could be presented by either studying the response of tsr mutants in these two species (certainly these should be readily available for E. coli) or by studying the response of these two species to serine gradients.

      We have revised the text to state: [line 402] “Without further genetic analyses in these strain backgrounds, the evidence for Tsr mediating serum taxis for these bacteria remains circumstantial. Nevertheless, taxis to serum appears to be a behavior shared by diverse Enterobacteriaceae species and perhaps also Gammaproteobacteria priority pathogen genera that possess Tsr such as Serratia, Providencia, Morganella, and Proteus (Fig. 8B).”

      We note that other work has thoroughly investigated E. coli serine taxis.

      Figure Suggestions

      (1) Fig. 2 - The inset bar charts in panels H-J and the font size in their axes labels are too small - this suggestion also applies to all analogous figures throughout the manuscript.

      We have increased the size of the text for these inset plots. We have also broken up some of the larger figures.

      (2) Panel 2F - the cartoon bacterial cell and 'number of bacteria' are confusing and seem to contradict the y-axis label. This also applies to several other figures throughout the manuscript where the significance of this cartoon cell is quite hard to interpret.

      As suggested, we have removed this cartoon.

      (3) Panels G-I in Fig. 3 are currently tricky to interpret - it would be easier if the authors were to use three different colours for the three different strains shown across these panels.

      We have broken up Figure 2 (which also had these types of plots) so that hopefully these labels are more clear. For the Figure in question (now Fig. 4), due to the many figures and different types of data and comparisons it was difficult to find a color scheme for these strains that would be consistent across the manuscript. These colors also reflect the fluorescence markers. We note that not only do we use color to indicate the strain but also text labels.

      (4) Panels 3B-F would be best moved to a supplementary figure as this figure is currently very busy. Similarly, I would potentially consider presenting only the bottom row of panels in Panels G-I in the main figure (which would then be consistent with analogous data presented elsewhere).

      We have opted to keep these panels in the main text (now Fig. 4) as they are relevant to understanding (1) our justification for why to pursue certain chemoeffector-chemoreceptor interactions and not others, and (2) how the chemoattraction response can be understood both in terms of bacterial population distribution and relevant cells over time.

      (5) Fig. 4 and possibly elsewhere - perhaps best not to use Ser as an abbreviation for Serine here because it could potentially be confused with an abbreviation for serum.

      It is unfortunate that these two words are so similar. However, Ser is the canonical abbreviation for the amino acid serine. Serum does not have a canonical abbreviation.

      (6) Fig. 4 - I would move panels H - K to a separate supplementary figure - currently, they are too squished together and it is hard to make out the x-axis labels. I would also consider moving panels E-G to supplementary as well so that the microscopy images presented elsewhere in the figure can be presented at an appropriate size.

      Since we are allowed more figures, we could also break some of these figures up into multiple ones.

      (7) Similarly, I would move some panels from Fig. 5 to supplementary as the figure is currently quite busy.

      We have rearranged the figure (now Fig. 7) to move the bioinformatics data to Fig. 8 to allow more space for the panels.

      Other suggestions

      (8) Line 179 - how do the concentrations quoted for serine and glucose compare to aspartate? This would be helpful to justify the authors' decision not to investigate Tar as a potential chemoreceptor.

      This is addressed in our comments above and in Fig. 4A and Fig. 4B-F. Human serum L-Asp is at a much lower concentration (about 20-fold).

      (9) Line 282 - Serine levels in serum are quantified at 241 uM, but this is only discussed in the context of serum growth effects. Could this information be better used to design/ inform the serine gradients that were tested in chemotaxis assays?

      We tested a wide range of serine concentrations and show that sources of serine much lower than that present in serum are sufficient for chemoattraction. Also, the K1/2 for serine is 105 uM (Fig. S4), which is surpassed by the concentration in serum (Fig. S5).

      (10) The word 'potent' in the title might be too vague, especially as the strength of the response varies between strains/species. It may perhaps be more useful to focus on the rapidity/sensitivity of the response. However, presumably the sensitivity of the response will be driven by the sensitivity of the response to serine (which is already known for E. coli at least). Also, as noted in the public review, human serum itself is not a chemoattractant so I would consider re-phrasing this in the title and elsewhere.

      As suggested, and discussed above, we have implemented this change.

      (11) Typo line 59 'context of colonizing of a healthy gut'.

      Addressed.

      (12) Typo line 538 - there is an extra full stop here.

      Addressed.

      Reviewer #2:

      (1) This study is well executed and the experiments are clearly presented. These novel chemotaxis assays provide advantages in terms of temporal resolution and the ability to detect responses from small concentrations. That said, it is perhaps not surprising these bacteria respond to serum as it is known to contain high levels of known chemoattractants, serine certainly, but also aspartate. In fact, the bacteria are shown to respond to aspartate and the tsr mutant is still chemotactic. The authors do not adequately support their decision to focus exclusively on the Tsr receptor. Tsr is one of the chemoreceptors responsible for observed attraction to serum, but perhaps, not the receptor. Furthermore, the verification of chemotaxis to serum is a useful finding, but the work does not establish the physiological relevance of the behavior or associate it with any type of disease progression. I would expect that a majority of chemotactic bacteria would be attracted to it under some conditions. Hence the impact of this finding on the chemotaxis or medical fields is uncertain.

      We agree that the data we show are mostly mechanistic and further work is required to learn whether this bacterial behavior is relevant in vivo and during infections. We present new data using an ex vivo intestinal model which supports the feasibility of serum taxis mediating invasion of enterohemorrhagic lesions (Fig. 8).

      (2) The authors also state that "Our inability to substantiate a structure-function relationship for NE/DHMA signaling indicates these neurotransmitters are not ligands of Tsr." Both norepinephrine (NE) and DHMA have been shown previously by other groups to be strong chemoattractants for E. coli (Ec), and this behavior was mediated by Tsr (e.g. single residue changes in the Tsr binding pocket block the response). Given the 82% sequence identity between the Se and Ec Tsr, this finding is unexpected (and potentially quite interesting). To validate this contradictory result the authors should test E. coli chemotaxis to DHMA in their assay. It may be possible that Ec responds to NE and DHMA and Se doesn't. However, currently, the data is not strong enough to rule out Tsr as a receptor to these ligands in all cases. At the very least the supporting data for Tsr being a receptor for NE/DHMA needs to be discussed.

      Addressed above. The focus of this study is serum attraction and the mechanisms thereof. We never saw any evidence to support the idea that NE/DHMA drive attraction to serum, or that they are chemoeffectors for Salmonella, and we provide these null-results in Data S2.

      (3) The authors also determine a crystal structure of the Se Tsr periplasmic ligand binding domain bound to L-Ser and note that the orientation of the ligand is different than that modeled in a previously determined structure of lower resolution. I agree that the SeTsr ligand binding mode in the new structure is well-defined and unambiguous, but I think it is too strong to imply that the pose of the ligand in the previous structure is wrong. The two conformations are in fact quite similar to one another and the resolution of the older structure, is, in my view, insufficient to distinguish them. It is possible that there are real differences between the two structures. The domains do have different sequences and, moreover, the crystal forms and cryo-cooling conditions are different in each case. It's become increasingly apparent that temperature, as manifested in differential cooling conditions here, can affect ligand binding modes. It's also notable that full-length MCPs show negative cooperativity in binding ligands, which is typically lost in the isolated periplasmic domains. Hence ligand binding is sensitive to the environment of a given domain. In short, the current data is not convincing enough to say that a previous "misconception" is being corrected.

      Thank you for this comment, which spurred us to investigate this idea more rigorously. As described above we performed new refinements of the E. coli structure edited to have the positions of the ligand and ligand-binding site as modeled in our new Tsr structure from Salmonella (Fig. 7J). The best model is obtained with these poses. Along with the poor fit of the E. coli model to the density, the best interpretations for these positions, for both structures, are as we have modeled them in the Salmonella Tsr structures.

      Figure suggestions

      (1) Figure 2 looks busy and unorganized. Fig 2C could be condensed into one image where there are different colored rings coming from the source point that represent different time points.

      Addressed above. Fig. 2 has been broken apart to help improve clarity.

      (2) What is the second (bottom) graph of 2D? I think only the top graph is necessary.

      We have added an explanation to the figure legend that the top graph shows the means and the bottom shows SEM. The plots cannot easily be overlaid.

      (3) Similarly, Fig 2E doesn't need to have so many time points. Perhaps 4 at maximum.

      As the development of the response over time is a key take-home of the study, we do not wish to reduce the timepoints shown.

      (4) The legend for Figure 2F uses the unit 'µM' to mean micrometers but should use 'µm'.

      Corrected.

      (5) In Figures 2H-J, the lime green text is difficult to read. The word "serum" does not need to be at the top of each panel. I recommend shortening the y-axis titles on the graphs so you can make the graphs themselves larger.

      Addressed above.

      (6) In Figures 2H-J, I am confused about what is being shown in the inset graph. The legend says it's the AUC for the data shown. However, in the third panel (S. Typhimurium vs. S. Enteritidis) the data appears to be much more disparate than the inset indicates. I don't think that this inset is necessary either.

      The point of this inset graph is to quantify the response through integration of the curve, i.e., area under the curve, which is a common way to quantify complex curves and compare responses as single values. We are using this method to calculate statistical significance of the response compared to a null response. We have added further clarification to the figure legend regarding these plots: Inset plots show fold-change AUC of strains in the same experiment relative to an expected baseline of 1 (no change). p-values shown are calculated with an unpaired two-sided t-test comparing the means of the two strains, or one-sided t-test to assess statistical significance in terms of change from 1-fold (stars).
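
      The AUC-based quantification described above can be illustrated with a small sketch. It is not the authors' analysis code: the fold-change curves are hypothetical, the AUC is divided by the distance range so that a flat curve at 1 gives exactly 1 (an assumed normalization), and the unpaired comparison uses Welch's form of the t statistic as one possible choice. Only t statistics are computed here; turning them into p-values requires the t distribution, for example via a statistics library.

```python
import math
from statistics import mean, stdev

def trapezoid_auc(x, y):
    """Area under y(x) by the trapezoid rule."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2
               for i in range(len(x) - 1))

def normalized_auc(x, y):
    """AUC divided by the x-range, so a flat fold-change of 1 yields 1."""
    return trapezoid_auc(x, y) / (x[-1] - x[0])

def one_sample_t(values, mu0=1.0):
    """t statistic for the mean of `values` against the baseline mu0."""
    return (mean(values) - mu0) / (stdev(values) / math.sqrt(len(values)))

def welch_t(a, b):
    """Unpaired t statistic that does not assume equal variances."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(var_a + var_b)

# Hypothetical fold-change curves versus radial distance (µm), 3 replicates each.
distance = [0, 50, 100, 150]
wt_curves = [[2.0, 1.8, 1.4, 1.0], [2.2, 1.9, 1.5, 1.1], [1.9, 1.7, 1.3, 1.0]]
mutant_curves = [[1.1, 1.0, 1.0, 1.0], [1.0, 1.1, 0.9, 1.0], [1.2, 1.0, 1.0, 0.9]]

wt_auc = [normalized_auc(distance, c) for c in wt_curves]
mutant_auc = [normalized_auc(distance, c) for c in mutant_curves]

t_vs_baseline = one_sample_t(wt_auc, mu0=1.0)   # change from 1-fold
t_between = welch_t(wt_auc, mutant_auc)         # strain versus strain
print(t_vs_baseline, t_between)
```

      Reducing each replicate curve to a single value in this way is what allows two strains from the same experiment to be compared with a standard two-sample test.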

      (7) Line 154, change "relevant for" to "observed in".

      Changed.

      (8) Line 171, according to the Mist4 database, Salmonella enterica has seven chemoreceptors. Why are only Tar, Tsr, and Trg mentioned? Why were only Tsr and Trg tested?

      Addressed above.

      (9) Line 192, be clear that you are referring to genes and not proteins, as italics are used.

      Revised to make this distinction clear.

      (10) Line 193, have other studies found a Trg deletion strain to be non-chemotactic? If so, cite this source here.

      We state that the Trg deletion strain had deficiencies in motility, and also have revised the text to include the clarification that this was not noted in earlier work with this strain: [line 173]: We were surprised to find that the trg strain had deficiencies in swimming motility (data not shown). This was not noted in earlier work but could explain the severe infection disadvantage of this mutant 34. Because motility is a prerequisite for chemotaxis, we chose not to study the trg mutant further, and instead focused our investigations on Tsr.

      (11) Why wasn't a Tar deletion mutant also analyzed? The authors say that based on the known composition of serum, serine and glucose are the most abundant. However, the serum does have aspartate at 10s of micromolar concentrations.

      Addressed above.

      (12) “The Tsr deletion strain still exhibits an obvious chemoattraction to serum. There are other protein(s) involved in chemoattraction to serum but the text does not discuss this.”

      Addressed above.

      (13) “In Figure 3B-F, the text is very difficult to read even when zoomed in on.”

      We have increased the font size of these panels.

      (14) “All of the text in Figure 5 is extremely small and difficult to read.”

      Addressed above. We split this figure in two to help improve clarity.

      (15) “I wonder about the accuracy of the concentration modeling. It seems like there are a lot of variables that could affect the diffusion rates, including the accuracy of the delivery system. Could the concentrations be verified by the dye experiments?”

      Addressed above. We provide a new analysis comparing the experimental diffusion of A488 dye with calculations (Fig. S2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) It is nice that the authors compared their model to the one "without lookahead" in Figure 4, but this comparison requires more evidence in my opinion, as I explain in this comment. The model without lookahead is closely related or possibly equivalent to the standard predictive coding. In predictive coding, one can make the network follow the stimulus rapidly by reducing the time constant tau. However, as the time constant decreases, the network would become unstable both in simulations (due to limited integration time step) and physical implementation (due to noise). Therefore I wonder if the proposed model has an advantage over standard predictive coding with an optimized time constant. Hence I suggest to also add a comparison between the proposed model, and the predictive coding with parameters (such as tau) optimized independently for each model. Of course, we know that the time-constant of biological neurons is fixed, but biological neurons might have had different time constants (by changing leak conductance) and such analysis could shed light on the question of why the neurons are organized the way they are.

      The comparison with a predictive network for which the neuronal time constants shrink towards 0 is in fact helpful. We added two new subsections in the SI that formally compare the NLA with other approaches, Equilibrium Propagation and the Latent Equilibrium, with a version of Equilibrium Propagation also covering the standard predictive coding you describe (SI, Sect. C and D). Subsection C concludes: “In the Equilibrium propagation we cannot simply take the limit τ → 0 since then the dynamics either disappears (when τ remains on the left, τ Δu → 0) or explodes (when τ is moved to the right, dt/τ → ∞), leading to either too small or too big jumps.”

      We have also expanded the passage on predictive coding in the main text, comparing our instantaneous network processing (up to a remaining time constant τ_in) with experimental data from humans (see page 10 of the revised ms). The new paragraph ends with:

      “Notice that, from a technical perspective, making the time constants of individual cortical neurons arbitrarily short leads to network instabilities and is unlikely the option chosen by the brain (see SI Sect. C, Comparison to the Equilibrium Propagation).”

      A new formal definition of the moving equilibrium in the Methods (Sect. F) helps to understand this notion of being in a balanced equilibrium state during the dynamics. This formal definition directly leads to the contraction analysis in the SI, Sect. D, showing why the Latent Equilibrium is always contractive, while the current form of the NLA may show jumps at the corner of a ReLU (since a second-order derivative of the transfer function enters the error propagation).

      The reviewer perhaps has additional simulations in mind that compare the robustness of the different models. However, as this paper is more about presenting a novel concept with a comprehensive theory (summing up to 45 pages), we prefer to not add more than the simulations necessary to check the statements of the theorems.
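
      The reviewer's remark that shrinking the time constant destabilizes simulations with a limited integration step can be illustrated with a minimal sketch. It is not taken from the manuscript or its SI: it assumes a single leaky integrator, tau * du/dt = -u + x, integrated with forward Euler, and all parameter values are hypothetical. In this toy case the distance to the fixed point is multiplied by (1 - dt/tau) in every step, so the iteration diverges once dt/tau exceeds 2.

```python
def euler_leaky_integrator(tau, dt, n_steps, x=1.0, u0=0.0):
    """Forward-Euler integration of tau * du/dt = -u + x for a constant input x.

    Toy model for illustration only; it is not the network from the manuscript.
    """
    u = u0
    trajectory = [u]
    for _ in range(n_steps):
        # Each step scales the distance to the fixed point u = x by (1 - dt/tau).
        u = u + (dt / tau) * (x - u)
        trajectory.append(u)
    return trajectory


dt = 1.0  # fixed integration step (arbitrary units, hypothetical value)
stable = euler_leaky_integrator(tau=10.0, dt=dt, n_steps=200)   # dt/tau = 0.1
unstable = euler_leaky_integrator(tau=0.4, dt=dt, n_steps=200)  # dt/tau = 2.5
```

      With the slower time constant the voltage settles at the input value, whereas with the short time constant it oscillates with growing amplitude. This covers only the numerical side of the reviewer's argument, not the noise argument or the formal comparison in the SI.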

      (2) I found this paper difficult to follow, because the Results sections went straight into details, and various elements of the model were introduced without explaining why they are necessary. Furthermore, the neural implementation was introduced after the model simulations. I suggest reorganizing the manuscript, to describe the model following Marr's levels of description and then presenting the results of simulations. In particular, I suggest starting the Results section by explaining what computation the network is trying to achieve (describe the setup, function L, define its integral over time, and explain that the goal is to find a model minimizing this integral). Then, I suggest presenting the algorithm the neurons need to employ to minimize this integral, i.e. their dynamics and plasticity (I wonder if r=rho(u) + tau rho(u)' is a consequence of action minimization or a necessary assumption - please clarify it). Next please explain how the algorithms could be implemented in biological neurons. Afterward please present the results of the simulation.

      We are sorry to realize that we could not convey the main message clearly enough. After rewriting the paper and straightening the narrative, we hope it is simpler to understand now.

      The paper does not suggest a new model to solve a task, and writing down the function to be minimized is not enough. The point of the NLA is that the time integral of our Lagrangian is minimized with respect to the prospective coordinates, i.e. the discounted future voltage. It is about the question of how dynamic equations in biology are derived. Of course, we also solve these equations, prove theorems and perform simulations. But the main point is that biology seems to deal with time differently than physics does. Biology “thinks” in terms of future quantities, physics “thinks” in terms of current quantities. We have tried to explain this better in the Introduction, the Results (e.g. after Eq. 5) and the Methods.

      (3) Understanding the paper requires background knowledge that most readers of eLife are unlikely to have, even if they are mathematically minded. For example, I am from the field of computational neuroscience, and I have never heard about Least Action principle from physics or the Euler-Lagrange equation. I felt lost after reading this paper, and to be able to write this review I needed to watch videos on the Euler-Lagrange equation. To help other readers, I have two suggestions: First, I feel that Eq 4-6 could be moved to the methods, because I found the concept of u~ difficult to understand, and it does not appear in the algorithm. Second, I advise to write in the Introduction, what knowledge is required to follow this paper, and point the readers to resources where they can find the required information. The authors may specify what background is required to follow the main text, and what is required to understand the methods.

      We hope that after explaining the rationale better, it becomes clear that we cannot skip the equations for the prospective coordinates. Likewise, the Euler-Lagrange equations need to be presented in the abstract form, since these are the equations that are eventually transformed into the “model”. We tried to give the basic intuition for this in the main text. As we explained above, the equations we were asked to skip represent the essence of the proposal: it is about how to derive the model equations.

      Moreover, we give more explanations in the Methods to help understand the derivations, and we refer to the specific sections in the SI for further details. We are aware that a full understanding of the theory requires some basic knowledge of the calculus of variations.

      We hesitate to prescribe in the Introduction what knowledge is required to understand the paper. Understanding can occur on various levels, and which materials are helpful depends on the reader's background: for some it is a YouTube video, for others Wikipedia, and for others a textbook from which specific ingredients can be extracted. We do, however, cite two textbooks in the Results, and more in the SI, Sect. F, when referring to the principle of least action in physics and mathematics, including weblinks.

      Minor comments

      Eq.3: The Authors refer to this equation as a Lagrangian. Could you please clarify why? Is the logic to minimize the energy subject to a constraint that Cost = 0?

      Thanks for asking. The cost is not really a constraint, it is globally minimized, in parallel steps. We are explaining this right after Eq. 3. “We `prospectively' minimize L locally across a voltage trajectory, so that, as a consequence, the local synaptic plasticity for W will globally reduce the cost along the trajectory (Theorem 1 below).”

      We added two sentences that explain why this function in Eq. 3 is called a Lagrangian: “While in classical energy-based approaches L is called the total energy, we call it the `Lagrangian' because it will be integrated along real and virtual voltage trajectories as done in variational calculus (leading to the Euler-Lagrange equations, see below and SI, Sect. F)”

      p.4, below Eq. 5 - Please explain the rationale behind NLA, i.e. why is it beneficial that "the trajectory u˜(t) keeps the action A stationary with respect to small variations δu˜"? I guess you wish to minimize L integrated over time, but this is not evident from the text.

      Hmm, yes and no. We wish to minimize the cost, and on the way there minimize the action. Since the global minimization of C is technically difficult, one looks for a stationary trajectory as defined in the cited sentence, while minimizing L with respect to W, to eventually minimize the cost.

      In the text we now explain after Eq. 5:

      “The motivation to search for a trajectory that keeps the action stationary is borrowed from physics. The motivation to search for a stationary trajectory by varying the near-future voltages ũ instead of u is assigned to the evolutionary pressure in biology to 'think ahead of time'. To not react too late, internal delays involved in the integration of external feedback need to be considered and eventually need to be overcome. In fact, only for the 'prospective coordinates' defined by looking ahead into the future, even when only virtually, will a real-time learning from feedback errors become possible (as expressed by our Theorems below).”

      Bottom of page 8. The authors say that in the case of single equilibrium and strong nudging the model reduced to the Least Control Principle. Does it also reduce to Predictive coding for supervised learning? If so, it would be helpful to state so.

      Yes, in this case the prediction error in the apical dendrite becomes the one of predictive coding. We are stating this now right at the end of the cited sentence:

      “In the case of strong nudging and a single steady-state equilibrium, the NLA principle reduces to the Least-Control Principle (Meulemans et al., 2022) that minimizes the mismatch energy E^M for a constant input and a constant target, with the apical prediction error becoming the prediction error from standard predictive coding (Rao & Ballard, 1999).”

      In the Discussion we also added a further point (iv) to compare the NLA principle with predictive coding. Both “improve” the sensory representation, but the NLA does so in favor of an output, and predictive coding in favor of the sensory prediction itself (see Discussion).

      Whenever you refer to supplementary materials, please specify the section, so it is easier for the reader to find it.

      Done. Sorry not to have done it earlier. We now also indicate specific sections when referring to the Methods.

      Reviewer #2 (Recommendations For The Authors):

      There are no major issues with this article, but I have several considerations that I think would greatly improve the impact, clarity, and validity of the claims.

      (1) Unifying the narrative. There are many many ideas put forward in what feels like a deluge. While I appreciate the enthusiasm, as a reader I found it hard to understand what it was that the authors thought was the main breakthrough. For instance, the abstract, results, introduction, and discussion all seem to provide different answers to that question. The abstract seems to focus on the motor error idea. The introduction seems to focus on the novel prospective+predictive setup of the energy function. The discussion lists the different perks of the theory (delay compensation, moving equilibrium, microcircuit) without referring to the prospective+predictive setup of the energy function.

      Thanks much for these helpful hints. Yes, the paper became an agglomerate of many ideas, also owing to the fact that we wish to show how the NLA principle can be applied to explain various phenomenology in neuroscience. We have now simplified the narrative to this one point of providing a novel theoretical framework for neuroscience, and explaining why this is novel and why it “suddenly works” (the prospective minimization of the energy).

      As you can see from the dominating red in the revised pdf, we did fully rewrite Abstract, Introduction and Discussion under the narrative of the NLA and prospective coding.

      (2) Laying out the organization of the notation clearly. There are quite a few subtle distinctions of what is meant by the different weight matrices (omnibus matrix then input vs recurrent then layered architecture), different temporal horizon formalisms (bar, not bar, tilde), different operators (L, curly L, derivative version, integral version). These different levels are introduced on the fly, which makes it harder to grasp. The fact that there are many duplicate notations for the same quantities does not help the reader. For instance u_0 becomes equal to u_N at one point (above Eq 25). Another example is the constant flipping between integrated and 'current input' pictures. So laying out the multiple layers early, making a table or a figure for the notation, or sticking with one level would help convey the idea to a wide readership.

      Thanks for the hints. We included the table you suggested, but put it to the SI as it became a full page itself. We banned the curly L abbreviating the look-ahead operator.

      The “change of notation” you are alluding to is tricky, though. In a recurrent layer, the index of the output neuron is called o. In a forward network with N layers, the index of the output neurons becomes the last layer N. One has to introduce the layer index l anyway for the deeper layers l < N, and we found it more consistent to explain that, when switching from the recurrent to the forward network, the voltage of the output layer becomes u_o = u_N. There are more such examples, like the weight matrix W splitting into an intrinsic network part W_net, across which errors backpropagate, and a part conveying the input, W_in, that has to be excluded when writing the backpropagation formula for general networks. Again, in the case of feedforward networks, the notation reduces to W_l, with index l coding for the layer. Presenting the general approach and a specific example may make it appear as if we duplicated notation; we have not found a solution here.

      (3) Separate the algorithm from the implementation level. I particularly struggled with separating the ideas that belonged to the algorithm level (cost function, optimization objectives) and the biophysics. The two are interwoven in a way that does not have to be. Particularly, some of the normative elements may be implemented by other types of biophysics than the authors have in mind. It is for this reason that I think that separating more clearly what belongs to the implementation and algorithm levels would help make the ideas more widely understood. On this point, a trigger point for me was the definition of the 'prospective input rates' e_i, which comes in the second paragraph.

      We are very sorry to have made you think that the 'prospective input rates' would be e_i. The prospective input rates are r_i. The misunderstanding was likely caused by an unclear formulation on our side that is now corrected (see the first and second paragraph of the Results, where we introduce r_i and e_i).

      From a biophysical perspective, it is quite arbitrary to define the input to be the difference between the basal input and the somatic (prospective) potential. It sounds like it comes from some unclear normative picture at this point. But the authors seem to have in mind to use the fact that the somatic potential is the sum of apical and basal input, that's the biophysical picture.

      We hope to have disentangled the normative and biophysical view in the 2nd and 3rd paragraph of the Results, respectively. We introduce the prospective error e_i as an abstract notion in the first paragraph, while explaining that it will be interpreted as the somato-dendritic mismatch error in neuron i in the next paragraph. The second paragraph contains the biophysical details with the apical and basal morphology.

      (4) Experts and non-expert would appreciate an explanation of why/how the choice of state variables matters in the NLA. The prospective coding state variables cannot be said to be the naïve guess. Why does the simple u, dot{u} not work as state variables applied on the same energy function, as would be a naïve application of the Lagrangian ideas?

      We are very glad for this hint to present an intuition behind the variation of the action with respect to a prospective state, instead of the state itself. The simple L(u, dot{u}) does not work because one does not obtain the first-order voltage dynamics compatible with the biophysics. We made an effort to explain the intuition to non-experts and experts in an additional paragraph right after presenting the voltage and error dynamics (Eq. 7 on page 4).

      Here is how the paragraph starts (not displaying the formulas here):

      “From the point of view of theoretical physics, where the laws of motion derived from the least-action principle contain an acceleration term (as in Newton's law of motion, like … for a harmonic oscillator), one may wonder why no second-order time derivative appears in the NLA dynamics. As an intuitive example, consider driving into a bend. Looking ahead in time helps us to reduce the lateral acceleration by braking early enough, as opposed to braking only when the lateral acceleration is already present. This intuition is captured by minimizing the neuronal action A with respect to the discounted future voltages ũ_i instead of the instantaneous voltages u_i.

      Keeping up an internal equilibrium in the presence of a changing environment requires to look ahead and compensate early for the predicted perturbations.

      Technically, …”

      More details are given in the Methods after Eq. 20. Moreover, in the last part of the SI, Sect. F, we have made the link to the least-action principle in physics more explicitly. There we show how the voltage dynamics can be derived from the physical least-action principle by including the Rayleigh dissipation (Eq. 92 and 95).
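
      The intuition that looking ahead can compensate for an internal delay can be illustrated with a toy example. It is a sketch under our own assumptions and is not taken from the manuscript: a single voltage u low-pass filters an input x with tau * du/dt = -u + x, the transfer function is taken to be the identity, and the parameter values are hypothetical. The voltage u lags behind the input, whereas the look-ahead combination u + tau * du/dt equals the current input by construction.

```python
import math


def simulate_lookahead(tau=0.05, dt=0.001, n_steps=1000, freq_hz=5.0):
    """Low-pass filter a sine input and compare it with a look-ahead readout.

    Assumed toy dynamics (not from the manuscript): tau * du/dt = -u + x(t).
    Returns the largest deviation from the input for the plain voltage u and
    for the look-ahead readout u + tau * du/dt.
    """
    u = 0.0
    lag_error = 0.0        # largest |u - x|: error of the delayed voltage
    lookahead_error = 0.0  # largest |(u + tau * du/dt) - x|
    for n in range(n_steps):
        x = math.sin(2.0 * math.pi * freq_hz * n * dt)
        du_dt = (x - u) / tau            # leaky-integrator dynamics
        prospective = u + tau * du_dt    # look-ahead readout
        lag_error = max(lag_error, abs(u - x))
        lookahead_error = max(lookahead_error, abs(prospective - x))
        u = u + dt * du_dt               # forward-Euler step
    return lag_error, lookahead_error


lag_error, lookahead_error = simulate_lookahead()
```

      In this sketch the delayed voltage deviates substantially from the input, while the look-ahead readout matches it up to floating-point rounding. The example only conveys why a first-order look-ahead can undo a first-order delay; it does not reproduce the variational derivation described in the Methods and the SI.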

      (5) Specify that the learning rules have not been observed. Though the learning rules are Hebbian, the details of the rules have not to my knowledge been observed. Would be worth mentioning as this is a sticking point of most related theories.

      We agree, and we now explicitly write in the Discussion that the learning rule still awaits experimental testing.

      (6) Some relevant literature. Chalk et al. PNAS (2018) have explored the relationship between temporal predictive coding and Rao & Ballard predictive coding based on the parameters of the cost function. Harkin et al. eLife (2023) have shown that 'prospective coding' also takes place in the serotonergic system, while Kim ... Ma (2021) have put forward similar ideas for dopamine, both may participate in setting the cost function. Instantaneous voltage propagation is also a focus of Greedy et al. (2023). The authors cite Zenke et al. for spiking error propagation, but there are biological references to that end.

      Thanks much for these hints. We now cite the book of Gerstner & Kistler on spiking neurons, and more specifically the spike-based approach for learning to represent signals (Brendel, .., Machens, Denève, PLoS CB, 2020). Otherwise, we had difficulties incorporating the other literature, which seems to us not directly related to our approach, even when related notions come up (like predictive coding and temporal processing in Chalk et al. (2018), where the coding efficiency of various temporal coding schemes is studied as a function of the signal-to-noise ratio, or the apical activities in Greedy et al. (2022), where bursting, multiplexing and synaptic facilitation arise). We found that citing these papers too would confuse more than help (we already cite 95 papers).

      (7) In the main text, theorem two is presented as proof without assumptions on the level of nudging, but the actual proof uses strong assumptions in that respect, relying on numerical ad hoc observations for the general case.

      Thanks for pointing this out. We agree it is better style to state all the critical assumptions in the Theorem itself, rather than deferring them to the Methods. We now state: “Then, for suitable top-down nudging, learning rates, and initial conditions, the ….weights …evolve such that…”.

      (8) In the discussion regarding error-backpropagation, it seems to me that it could be clarified that the current algorithm asks for a weight alignment between FF and FB matrices as well as between FB and interneuron circuit matrices. Whether all of these matrices can be learned together remains to be shown; neither Akrout, Kunin nor Max et al. have shown this explicitly. Particularly when there are other inputs to the apical dendrites from other areas.

      Yes, it is difficult to learn to align all of them in parallel. Nevertheless, our simulations do in fact align the lateral and vertical circuits, as is also claimed in Theorem 2. Yet, as specified in the theorem, this holds “for suitable learning rates” (these were all the same, but were commonly reduced after some training time, as previously explained in the Methods, Details for Fig. 5).

      In the Discussion we now emphasize that, in general, simulating all the circuitries jointly from scratch in a single phase is tricky. We write:

      “A fundamental difficulty arises when the neuronal implementation of the Euler-Lagrange equations requires an additional microcircuit with its own dynamics. This is the case for the suggested microcircuit extracting the local errors. Formally, the representation of the apical feedback errors first needs to be learned before the errors can teach the feedforward synapses on the basal dendrites. We showed that this error learning can itself be formulated as minimizing an apical mismatch energy. What the lateral feedback through interneurons cannot explain away from the top-down feedback remains as apical prediction error.

      Ideally, while the network synapses targeting the basal tree are performing gradient descent on the global cost, the microcircuit synapses involved in the lateral feedback are performing gradient descent on local error functions, both at any moment in time.

      The simulations show that this intertwined system can in fact learn simultaneously with a common learning rate that is properly tuned. The cortical model network of inter- and pyramidal neurons learned to classify handwritten digits on the fly, with 10 digit samples presented per second. Yet, the overall learning is more robust if the error learning in the apical dendrites operates in phases without output teaching but with corresponding sensory activity, as may arise during sleep (see e.g. Deperrois et al., 2022 and 2023).”

      (9) The short-term depression model is assuming a slow type of short-term depression, not the fast types that are the focus of much recent experimental literature (like Campagnola et al. Science 2022).

      This assumption should be specified.

      Thanks for pointing us to this literature, which we were not aware of. We now cite the release-independent plasticity (Campagnola et al. 2022) in the context of our synaptic depression model.

      (10) There seems to be a small notation issue: Eq 21 combines vectors of the size of the full network (bar{e}) and the size of the readout network (bar{e}star).

      Well, for notational convenience we set the target error to e*=0 for non-output neurons. This way we can write the total error for an arbitrary network neuron as the sum of the backpropagated error plus the putative target error (if the neuron is an output neuron). Otherwise we would always have to distinguish between network neurons that may be output neurons and those that are not. We did say this in the main text, but now repeat it right after Eq. 21. -- Notations are often the result of a trade-off.

    1. What are the ways in which a parasocial relationship can be authentic or inauthentic?

      I think parasocial relationships can be authentic when such a figure or celebrity has a positive influence on the follower, encouraging them to grow and become a better person. I also believe that having a sense of mutual respect for one another is crucial. Though the celebrity might not know them, they can still have some level of respect, and for the follower, they can create respect by also not forming an unhealthy attachment to the figure. However, I also believe there are times when the relationship might not be authentic, as followers might misunderstand their relationship and assume that they have a deeper connection. As we see with Jessica, she believed that Mr. Rogers knew her and liked her. Although in Jessica's case it is quite an innocent misunderstanding, in some cases, it can lead to having unrealistic expectations of the figure as well as a lack of boundaries. Followers can presume the figure genuinely has a connection with them, and be devastatingly disappointed. Other times, followers may become obsessed with said figure and behave irrationally. As such, I think parasocial relationships can be authentic to a limit. I think it is important for the follower and even the figure to clarify the extent of their relationship.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a new and valuable theoretical account of spatial representational drift in the hippocampus. The evidence supporting the claims is convincing, with a clear and accessible explanation of the phenomenon. Overall, this study will likely attract researchers exploring learning and representation in both biological and artificial neural networks.

      We would like to ask the reviewers to consider elevating the assessment due to the following arguments. As noted in the original review, the study bridges two different fields (machine learning and neuroscience), and does not only touch a single subfield (representational drift in neuroscience). In the revision, we also analysed data from four different labs, strengthening the evidence and the generality of the conclusions.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors start from the premise that neural circuits exhibit "representational drift" -- i.e., slow and spontaneous changes in neural tuning despite constant network performance. While the extent to which biological systems exhibit drift is an active area of study and debate (as the authors acknowledge), there is enough interest in this topic to justify the development of theoretical models of drift.

      The contribution of this paper is to claim that drift can reflect a mixture of "directed random motion" as well as "steady state null drift." Thus far, most work within the computational neuroscience literature has focused on the latter. That is, drift is often viewed to be a harmless byproduct of continual learning under noise. In this view, drift does not affect the performance of the circuit nor does it change the nature of the network's solution or representation of the environment. The authors aim to challenge the latter viewpoint by showing that the statistics of neural representations can change (e.g. increase in sparsity) during early stages of drift. Further, they interpret this directed form of drift as "implicit regularization" on the network.

      The evidence presented in favor of these claims is concise. Nevertheless, on balance, I find their evidence persuasive on a theoretical level -- i.e., I am convinced that implicit regularization of noisy learning rules is a feature of most artificial network models. This paper does not seem to make strong claims about real biological systems. The authors do cite circumstantial experimental evidence in line with the expectations of their model (Khatib et al. 2022), but those experimental data are not carefully and quantitatively related to the authors' model.

      We thank the reviewer for pushing us to present stronger experimental evidence. We have now analysed data from four different labs. Two of those are novel analyses of existing data (Karlsson et al, Jercog et al). All datasets show the same trend: increasing sparsity and increasing information per cell. We think that the results, presented in the new Figure 3, allow us to make a stronger claim about real biological systems.

      To establish the possibility of implicit regularization in artificial networks, the authors cite convincing work from the machine-learning community (Blanc et al. 2020, Li et al., 2021). Here the authors make an important contribution by translating these findings into more biologically plausible models and showing that their core assumptions remain plausible. The authors also develop helpful intuition in Figure 4 by showing a minimal model that captures the essence of their result.

      We are glad that these translation efforts are appreciated.

      In Figure 2, the authors show a convincing example of the gradual sparsification of tuning curves during the early stages of drift in a model of 1D navigation. However, the evidence presented in Figure 3 could be improved. In particular, 3A shows a histogram displaying the fraction of active units over 1117 simulations. Although there is a spike near zero, a sizeable portion of simulations have greater than 60% active units at the end of the training, and critically the authors do not characterize the time course of the active fraction for every network, so it is difficult to evaluate their claim that "all [networks] demonstrated... [a] phase of directed random motion with the low-loss space." It would be useful to revise the manuscript to unpack these results more carefully. For example, a histogram of log(tau) computed in panel B on a subset of simulations may be more informative than the current histogram in panel A.

      The previous figure 3A was indeed confusing. In particular, it lumped together many simulations without proper curation. We redid this figure (now Figure 4), and added supplementary figures (Figures S1, S2) to better explain our results. It is now clear that the simulations with a large number of active units were either due to non-convergence, slow timescale of sparsification or simulations featuring label noise in which the fraction of active units is less affected. Regarding the log(tau) calculation, while it could indeed be an informative plot, it could not be calculated in a simple manner for all simulations. This is because learning curves are not always exponential, but sometimes feature initial plateaus (see also Saxe et al 2013, Schuessler et al 2020). We added a more detailed explanation of this limitation in the methods section, and we believe the current figure exemplifies the effect in a satisfactory manner.
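
      To make concrete what a log(tau) estimate involves, and why it presupposes an exponential curve, here is a minimal sketch. It is not the analysis used in the manuscript: the function name, the synthetic curve and its parameters are hypothetical, and the fit is a plain least-squares line through the logarithm of a decaying quantity (for example a fraction of active units relative to its asymptote).

```python
import math


def fit_log_tau(times, values, floor=0.0):
    """Estimate log(tau) for values ~ floor + A * exp(-t / tau).

    A straight line is fitted to log(values - floor) versus time by ordinary
    least squares; the slope of that line is -1 / tau. Hypothetical helper for
    illustration only.
    """
    ys = [math.log(v - floor) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    variance = sum((t - t_mean) ** 2 for t in times)
    slope = covariance / variance
    if slope >= 0:
        raise ValueError("curve is not decaying; no time constant is defined")
    return math.log(-1.0 / slope)


# Synthetic, purely exponential curve with tau = 20 (arbitrary time units).
times = list(range(100))
curve = [0.8 * math.exp(-t / 20.0) for t in times]
log_tau = fit_log_tau(times, curve)
```

      On a purely exponential curve the fit recovers the time constant. A curve with an initial plateau is no longer a straight line in log space, so a single fitted slope would mix the plateau with the decay, which is the limitation described above.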

      Reviewer #2 (Public Review):

      Summary:

      In the manuscript "Representational drift as a result of implicit regularization" the authors study the phenomenon of representational drift (RD) in the context of an artificial network that is trained in a predictive coding framework. When trained on a task for spatial navigation on a linear track, they found that a stochastic gradient descent algorithm led to a fast initial convergence to spatially tuned units, but then to a second very slow, yet directed drift which sparsified the representation while increasing the spatial information. They finally show that this separation of timescales is a robust phenomenon and occurs for a number of distinct learning rules.

      Strengths:

      This is a very clearly written and insightful paper, and I think people in the community will benefit from understanding how RD can emerge in such artificial networks. The mechanism underlying RD in these models is clearly laid out and the explanation given is convincing.

      We thank the reviewer for the support.

      Weaknesses:

      It is unclear how this mechanism may account for the learning of multiple environments.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      The process of RD through this mechanism also appears highly non-stationary, in contrast to what is seen in familiar environments in the hippocampus, for example.

      The non-stationarity noted by the reviewer is indeed a major feature of our observations, and is linked to familiarity. We divide learning into three phases (now more clearly stated in Table 1 and Figure 4C). The first, rapid phase consists of improvement of performance - corresponding to initial familiarity with the environment. The third phase, often reported in the literature of representational drift, is indeed stationary and obtained after prolonged familiarity. Our work focuses on the second phase, which is not as immediate as the first one, and can take several days. We note in the discussion that experiments which include a long familiarization process can miss this phase (see also Table 3). Furthermore, we speculate that real life is less stationary than a lab environment, and this second phase might actually be more relevant there.

      Reviewer #3 (Public Review):

      Summary:

      Single-unit neural activity tuned to environmental or behavioral variables gradually changes over time. This phenomenon, called representational drift, occurs even when all external variables remain constant, and challenges the idea that stable neural activity supports the performance of well-learned behaviors. While a number of studies have described representational drift across multiple brain regions, our understanding of the underlying mechanism driving drift is limited. Ratzon et al. propose that implicit regularization - which occurs when machine learning networks continue to reconfigure after reaching an optimal solution - could provide insights into why and how drift occurs in neurons. To test this theory, Ratzon et al. trained a Feedforward Network to perform the oft-utilized linear track behavioral paradigm and compared the changes in hidden layer units to those observed in hippocampal place cells recorded in awake, behaving animals.

      Ratzon et al. clearly demonstrate that hidden layer units in their model undergo consistent changes even after the task is well-learned, mirroring representational drift observed in real hippocampal neurons. They show that the drift occurs across three separate measures: the active proportion of units (referred to as sparsification), spatial information of units, and correlation of spatial activity. They continue to address the conditions and parameters under which drift occurs in their model to assess the generalizability of their findings.

      However, the generalizability results are presented primarily in written form: additional figures are warranted to aid in reproducibility.

      We added figures, and a GitHub repository with all the code to allow full reproducibility.

      Last, they investigate the mechanism through which sparsification occurs, showing that the flatness of the manifold near the solution can influence how the network reconfigures. The authors suggest that their findings indicate a three-stage learning process: 1) fast initial learning followed by 2) directed motion along a manifold which transitions to 3) undirected motion along a manifold.

      Overall, the authors' results support the main conclusion that implicit regularization in machine learning networks mirrors representational drift observed in hippocampal place cells.

      We thank the reviewer for this summary.

      However, additional figures/analyses are needed to clearly demonstrate how different parameters used in their model qualitatively and quantitatively influence drift.

      We now provide additional figures regarding parameters (Figures S1, S2).

      Finally, the authors need to clearly identify how their data supports the three-stage learning model they suggest.

      Their findings promise to open new fields of inquiry into the connection between machine learning and representational drift and generate testable predictions for neural data.

      Strengths:

      (1) Ratzon et al. make an insightful connection between well-known phenomena in two separate fields: implicit regularization in machine learning and representational drift in the brain. They demonstrate that changes in a recurrent neural network mirror those observed in the brain, which opens a number of interesting questions for future investigation.

      (2) The authors do an admirable job of writing to a large audience and make efforts to provide examples to make machine learning ideas accessible to a neuroscience audience and vice versa. This is no small feat and aids in broadening the impact of their work.

      (3) This paper promises to generate testable hypotheses to examine in real neural data, e.g., that drift rate should plateau over long timescales (now testable with the ability to track single-unit neural activity across long time scales with calcium imaging and flexible silicon probes). Additionally, it provides another set of tools for the neuroscience community at large to use when analyzing the increasingly high-dimensional data sets collected today.

      We thank the reviewer for these comments. Regarding the hypotheses, these are partially confirmed in the new analyses we provide of data from multiple labs (new Figure 3 and Table 3) - indicating that prolonged exposure to the environment leads to more stationarity.

      Weaknesses:

      (1) Neural representational drift and directed/undirected random walks along a manifold in ML are well described. However, outside of the first section of the main text, the analysis focuses primarily on the connection between manifold exploration and sparsification without addressing the other two drift metrics: spatial information and place field correlations. It is therefore unclear if the results from Figures 3 and 4 are specific to sparseness or extend to the other two metrics. For example, are these other metrics of drift also insensitive to most of the Feedforward Network parameters as shown in Figure 3 and the related text? These concerns could be addressed with panels analogous to Figures 3a-c and 4b for the other metrics and will increase the reproducibility of this work.

      We note that the results from figures 3 and 4 (original manuscript) are based on abstract tasks, while in figure 2 there is a contextual notion of spatial position. Spatial position metrics are not applicable to the abstract tasks as they are simple random mappings of inputs, and there isn’t necessarily an underlying latent variable such as position. This transition between task types is better explained in the text now. In essence, the spatial information and place field correlation changes are simply signatures of the movements in parameter space. In the abstract tasks their change becomes trivial, as the spatial information becomes strongly correlated with sparsity and place fields are simply the activity vectors of units. These are guaranteed to change as long as there are changes in the activity statistics. We present here the calculation of these metrics averaged over simulations for completeness.

      Author response image 1.

      (A) PV correlation between training time points averaged over 362 simulations. (B) Mean SI of units normalized to first time step, averaged over 362 simulations. The red line shows the average time point of loss convergence; the shaded area represents one standard deviation.
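
      For readers unfamiliar with the metric, the following is a minimal sketch of a population vector (PV) correlation between two training time points. It assumes that the PV correlation is the Pearson correlation between the vectors of unit activities at the same input, averaged over inputs; the function name and the random activity matrices are hypothetical and may differ from the computation behind the image.

```python
import numpy as np

def pv_correlation(act_a, act_b):
    """Mean Pearson correlation between population vectors at matching inputs.

    act_a, act_b: arrays of shape (n_units, n_inputs) holding unit activity
    at two training time points.
    """
    act_a = np.asarray(act_a, dtype=float)
    act_b = np.asarray(act_b, dtype=float)
    corrs = []
    for i in range(act_a.shape[1]):
        va, vb = act_a[:, i], act_b[:, i]
        if va.std() == 0 or vb.std() == 0:
            continue  # correlation is undefined for a constant vector
        corrs.append(np.corrcoef(va, vb)[0, 1])
    return float(np.mean(corrs)) if corrs else float("nan")

rng = np.random.default_rng(0)
# Hypothetical rectified activity of 50 units for 20 inputs.
early = np.maximum(rng.normal(size=(50, 20)), 0.0)
# The same units after a perturbation standing in for drift.
late = np.maximum(early + rng.normal(scale=0.5, size=early.shape), 0.0)

same = pv_correlation(early, early)    # identical representations
drifted = pv_correlation(early, late)  # lower than `same`
```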

      (2) Many caveats/exceptions to the generality of findings are mentioned only in the main text without any supporting figures, e.g., "For label noise, the dynamics were qualitatively different, the fraction of active units did not reduce, but the activity of the units did sparsify" (lines 116-117). Supporting figures are warranted to illustrate which findings are "qualitatively different" from the main model, which are not different from the main model, and which of the many parameters mentioned are important for reproducing the findings.

      We have now added figures (S1, S2) that show exactly this. We also added a GitHub repository to allow full reproduction.

      (3) Key details of the model used by the authors are not listed in the methods. While they are mentioned in reference 30 (Recanatesi et al., 2021), they need to be explicitly defined in the methods section to ensure future reproducibility.

      The details of the simulation are given in the Methods section. We also added a GitHub repository to allow full reproducibility.

      (4) How different states of drift correspond to the three learning stages outlined by the authors is unclear. Specifically, it is not clear where the second stage ends, and the third stage begins, either in real neural data or in the figures. This is compounded by the fact that the third stage - of undirected, random manifold exploration - is only discussed in relation to the introductory Figure 1 and is never connected to the neural network data or actual brain data presented by the authors. Are both stages meant to represent drift? Or is only the second stage meant to mirror drift, while undirected random motion along a manifold is a prediction that could be tested in real neural data? Identifying where each stage occurs in Figures 2C and E, for example, would clearly illustrate which attributes of drift in hidden layer neurons and real hippocampal neurons correspond to each stage.

      Thanks for this comment, which urged us to better explain these concepts.

      The different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      Recommendations for the authors:

      The reviewers have raised several concerns. They concur that the authors should address the specific points below to enhance the manuscript.

      (1) The three different phases of learning should be clearly delineated, along with how they are determined. It remains unclear in which exact phase the drift is observed.

      This is now clearly explained in the new Table 1 and Figure 4C. Note that the different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      (2) The term "sparsification" of unit activity is not fully clear. Its meaning should be more explicitly explained, especially since, in the simulations, a significant number of units appear to remain active (Fig. 3A).

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.
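
      The distinction between the two measures can be sketched as follows. The exact definitions are those given in the Methods; this sketch assumes that Fraction Active Units counts units that respond to at least one input, and that Active Fraction counts (unit, input) pairs with non-zero activity. The function names, the tolerance parameter and the toy matrices are hypothetical.

```python
import numpy as np

def fraction_active_units(activity, tol=0.0):
    """Fraction of units with activity > tol for at least one input."""
    return float(np.mean((np.asarray(activity) > tol).any(axis=1)))

def active_fraction(activity, tol=0.0):
    """Fraction of (unit, input) pairs with activity > tol."""
    return float(np.mean(np.asarray(activity) > tol))

# Hypothetical activity matrices of shape (n_units, n_inputs).
dense = np.array([[1.0, 2.0, 1.0, 3.0],
                  [2.0, 1.0, 1.0, 1.0]])

# Every unit still responds to some input, but to fewer inputs.
sharpened = np.array([[0.0, 2.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])

# One unit has become silent for all inputs.
pruned = np.array([[1.0, 2.0, 1.0, 3.0],
                   [0.0, 0.0, 0.0, 0.0]])
```

      Under these assumed definitions, `sharpened` keeps Fraction Active Units at 1.0 while Active Fraction drops to 0.25, whereas `pruned` lowers both measures to 0.5. The first case corresponds to the situation described for label noise, where the fraction of active units does not decrease although the activity becomes sparser.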

      (3) While the study primarily focuses on one aspect of representational drift-the proportion of active units-it should also explore other features traditionally associated with representational drift, such as spatial information and the correlation between place fields.

      This absence of features is related to the abstract nature of some of the tasks simulated in our paper. In our original submission the transition from a predictive coding task to more abstract tasks was not clearly explained, creating some confusion regarding the measured metrics. We have now clarified the motivation for this transition.

      Both the initial simulation and the new experimental data analysis include spatial information (Figures 2,3). The following simulations (Figure 4) with many parameter choices use more abstract tasks, for which the notion of correlation between place cells and spatial information loses its meaning as there is no spatial ordering of the inputs, and every input is encountered only once. Spatial information becomes strongly correlated with the inverse of the active fraction metric. The correlation between place cells is also directly linked to increase in sparseness for these tasks.
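
      The link between spatial information and the inverse of the active fraction can be checked in a minimal sketch. It assumes the standard Skaggs-style definition of spatial information (in bits) with uniform occupancy, and binary tuning curves; the manuscript's own computation may differ, and all names below are hypothetical.

```python
import numpy as np

def spatial_information(rate, occupancy=None):
    """Skaggs-style information (bits) of a single unit's tuning curve."""
    rate = np.asarray(rate, dtype=float)
    if occupancy is None:
        occupancy = np.full(rate.size, 1.0 / rate.size)  # uniform occupancy
    occupancy = np.asarray(occupancy, dtype=float)
    mean_rate = np.sum(occupancy * rate)
    if mean_rate == 0:
        return 0.0  # silent unit: information is taken to be zero here
    ratio = rate / mean_rate
    active = ratio > 0  # terms with zero rate contribute nothing
    return float(np.sum(occupancy[active] * ratio[active] * np.log2(ratio[active])))

def binary_tuning(n_inputs, n_active):
    """Hypothetical unit that is equally active for n_active of n_inputs."""
    rate = np.zeros(n_inputs)
    rate[:n_active] = 1.0
    return rate

n_inputs = 64
# Map from the active fraction f of a unit to its spatial information.
si_by_fraction = {
    n_active / n_inputs: spatial_information(binary_tuning(n_inputs, n_active))
    for n_active in (32, 16, 8, 2)
}
```

      For such a binary unit that is active for a fraction f of the inputs, the definition reduces to log2(1/f), so the information grows as the active fraction shrinks.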

      (4) There should be a clearer illustration of how labeling noise influences learning dynamics and sparsification.

      This was indeed confusing in the original submission. We removed the simulations with label noise from Figure 4, and added a supplementary figure (S2) illustrating the different effects of label noise.

      (5) The representational drift observed in this study's simulations appears to be nonstationary, which differs from in vivo reports. The reasons for this discrepancy should be clarified.

      We added experimental results from three additional labs demonstrating a change in activity statistics (i.e. increase in spatial information and increase in sparseness) over a long period of time. We suggest that such a change long after the environment is already familiar is an indication of the second phase, and stress that this change seems to saturate at some point, and that most drift papers start collecting data after this saturation, hence this effect was missed in previous in vivo reports. Furthermore, these effects are becoming more abundant with the advent of new calcium imaging methods, as the older electrophysiological recording methods did not usually allow recording of large numbers of cells for long periods of time. The new Table 3 surveys several experimental papers, emphasizing the degree of familiarity with the environment.

      (6) A distinctive feature of the hippocampus is its ability to learn different spatial representations for various environments. The study does not test representational drift in this context, a topic of significant interest to the community. Whether the authors choose to delve into this is up to them, but it should at least be discussed more comprehensively, as it's only briefly touched upon in the current manuscript version.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      (7) The methods section should offer more details about the neural nets employed in the study. The manuscript should be explicit about the terms "hidden layer", "units", and "neurons", ensuring they are defined clearly and not used interchangeably.

      We changed the usage of these terms to be more coherent and made our code publicly available. Specifically, “units” refer to artificial networks and “neurons” to biological ones.

      In addition, each reviewer has raised both major and minor concerns. These are listed below and should be addressed where possible.

      Reviewer #1 (Recommendations For The Authors):

      I recommend that the authors edit the text to soften their claims. For example:

      In the abstract "To uncover the underlying mechanism, we..." could be changed to "To investigate, we..."

      Agree. Done

      On line 21, "Specifically, recent studies showed that..." could be changed to "Specifically, recent studies suggest that..."

      Agree. Done

      On line 100, "All cases" should probably be softened to "Most cases" or more details should be added to Figure 3 to support the claim that every simulation truly had a phase of directed random motion.

      The text was changed in accordance with the reviewer’s suggestion. In addition, the figure was changed and only includes simulations in which we expected unit sparsity to arise (without label noise). We also added explanations and supplementary figures for label noise.

      Unless I missed something obvious, there is no new experimental data analysis reported in the paper. Thus, line 159 of the discussion, "a phenomenon we also observed in experimental data" should be changed to "a phenomenon that was recently reported in experimental data."

      We thank the reviewer for drawing our attention to this. We now analyzed data from three other labs, two of which are novel analyses on existing data. All four datasets show the same trends of sparseness with increasing spatial information. The new Figure 3 and text now describe this.

      On line 179 of the Discussion, "a family of network configurations that have identical performance..." could be softened to "nearly identical performance." It would be possible for networks to have minuscule differences in performance that are not detected due to stochastic batch effects or limits on machine precision.

      The text was changed in accordance with the reviewer’s suggestion.

      Other minor comments:

      Citation 44 is missing the conference venue, please check all citations are formatted properly.

      Corrected.

      In the discussion on line 184, the connection to remapping was confusing to me, particularly because the cited reference (Sanders et al. 2020) is more of a conceptual model than an artificial network model that could be adapted to the setting of noisy learning considered in this paper. How would an RNN model of remapping (e.g. Low et al. 2023; Remapping in a recurrent neural network model of navigation and context inference) be expected to behave during the sparsifying portion of drift?

      We now clarified this section. The conceptual model of Sanders et al includes a specific prediction (Figure 7 there) which is very similar to ours - a systematic change in robustness depending on duration of training. Regarding the Low et al model, using such mechanistic models is an exciting avenue for future research.

      Reviewer #2 (Recommendations For The Authors):

      I only have two major questions.

      (1) Learning multiple representations: Memory systems in the brain typically must store many distinct memories. Certainly, the hippocampus, where RD is prominent, is involved in the ongoing storage of episodic memories. But even in the idealized case of just two spatial memories, for example, two distinct linear tracks, how would this learning process look? Would there be any interference between the two learning processes or would they be largely independent? Is the separation of time scales robust to the number of representations stored? I understand that to answer this question fully probably requires a research effort that goes well beyond the current study, but perhaps an example could be shown with two environments. At the very least the authors could express their thoughts on the matter.

      There are two facets to the topic of multiple environments. First, are the results of the current paper relevant when there are multiple environments? Second, what is the interaction between brain mechanisms of dealing with multiple environments and the results of the current paper?

      We believe the answer to the first question is positive. The near-orthogonality of representations between environments implies that changes in one can happen without changes in the other. This is evident, for instance, in Khatib et al and Geva et al - in both cases, drift seems to happen independently in two environments, even though they are visited intermittently and are visually similar.

      The second question is a fascinating one, and we are planning to pursue it in future work. While the exact way in which the brain achieves this near-independence is an open question, remapping is one possible window into this process.

      We extended the discussion to make these points clear.

      (2) Directed drift versus stationarity: I could not help but notice that the RD illustrated in Fig.2D is not stationary in nature, i.e. the upper right and lower left panels are quite different. This appears to contrast with findings in the hippocampus, for example, Fig.3e-g in (Ziv et al, 2013). Perhaps it is obvious that a directed process will not be stationary, but the authors note that there is a third phase of steady-state null drift. Is the RD seen there stationary? Basically, I wonder if the process the authors are studying is relevant only as a novel environment becomes familiar, or if it is also applicable to RD in an already familiar environment. Please discuss the issue of stationarity in this context.

      The non-stationarity noted by the reviewer is indeed a major feature of our observations, and is linked to familiarity. We divide learning into three phases (now more clearly stated in Table 1 and Figure 4C). The first, rapid phase consists of improvement of performance - corresponding to initial familiarity with the environment. The third phase, often reported in the literature of representational drift, is indeed stationary and obtained after prolonged familiarity. Our work focuses on the second phase, which is not as immediate as the first one, and can take several days. We note in the discussion that experiments which include a long familiarization process can miss this phase (see also Table 3). Furthermore, we speculate that real life is less stationary than a lab environment, and this second phase might actually be more relevant there.

      Reviewer #3 (Recommendations For The Authors):

      Most of my general recommendations are outlined in the public review. A large portion of my comments regards increasing clarity and explicitly defining many of the terms used which may require generating more figures (to better illustrate the generality of findings) or modifying existing figures (e.g., to show how/where the three stages of learning map onto the authors' data).

      Sparsification is not clearly defined in the main text. As I read it, sparsification is meant to refer to the activity of neurons, but this needs to be clearly defined. For example, lines 262-263 in the methods define "sparseness" by the number of active units, but lines 116-117 state: "For label noise, the dynamics were qualitatively different, the fraction of active units did not reduce, but the activity of the units did sparsify." If the fraction of active units (defined as "sparseness") did not change, what does it mean that the activity of the units "sparsified"? If the authors mean that the spatial activity patterns of hidden units became more sharply tuned, this should be clearly stated.

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.

      Likewise, it is unclear which of the features the authors outlined - spatial information, active proportion of units, and spatial correlation - are meant to represent drift. The authors should clearly delineate which of these three metrics they mean to delineate drift in the main text rather than leave it to the reader to infer. While all three are mentioned early on in the text (Figure 2), the authors focus more on sparseness in the last half of the text, making it unclear if it is just sparseness that the authors mean to represent drift or the other metrics as well.

      The main focus of our paper is on the non-stationarity of drift, namely that features (such as these three) systematically change in a directed manner as part of the drift process. The new analyses of experimental data show sparseness and spatial information.

      The focus on sparseness in the second half of the paper is because we move to more abstract tasks, in which the sparseness measures are also easy to study. In our original submission the transition from a predictive coding task to more abstract tasks was not clearly explained, creating some confusion regarding the measured metrics. We have now clarified the motivation for this transition.

      It is not clear if a change in the number of active units alone constitutes "drift", especially since Geva et al. (2023) recently showed that both changes in firing rate AND place field location drive drift, and that the passage of time drives changes in activity rate (or # cells active).

      Our work did not deal with purely time-dependent drift, but rather focused on experience-dependence. Furthermore, Geva et al study the stationary phase of drift, where we do not expect a systematic change in the total number of cells active. They report changes in the average firing rate of active cells in this phase, as a function of time - which does not contradict our findings.

      "hidden layer", "units", and "neurons" seem to be used interchangeably in the text (e.g., line 81-85). However, this is confusing in several places, in particular in lines 83-85 where "neurons" is used twice. The first usage appears to refer to the rate maps of the hidden layer units simulated by the authors, while the second "neurons" appears to refer to real data from Ziv 2013 (ref 5). The authors should make it explicit whether they are referring to hidden layer units or actual neurons to avoid reader confusion.

      We changed the usage of these terms to be more coherent. Specifically, “units” refer to artificial networks and “neurons” to biological ones.

      The authors should clearly illustrate which parts of their findings support their three-phase learning theory. For example, does 2E illustrate these phases, with the first tenth of training time points illustrating the early phase, time 0.1-0.4 illustrating the intermediate phase, and 0.4-1 illustrating the last phase? Additionally, they should clarify whether the second and third stages are meant to represent drift, or is it only the second stage of directed manifold exploration that is considered to represent drift? This is unclear from the main text.

      The different processes (reduction in loss, reduction in Hessian) happen in parallel with different timescales. Thus, there are no sharp transitions between the phases. This is now explained in the text in relation to figure 4C, where the approximate boundaries are depicted.

      The term drift is often used to denote a change in representation without a change in behavior. In this sense, both the second and third phases correspond to drift. Only the third stage is stationary. This is now emphasized in the text and in the new Table 1. Regarding experimental data, apart from the new figure 3 with four datasets, we also summarize in Table 3 the relation between duration of familiarity and stationarity of the data.

      Line 45 - It appears that the acronym ML is not defined above here anywhere.

      Added.

      Line 71: the ReLU function should be defined in the text, e.g., sigma(x) = x if x > 0 else 0.

      Added.

      106-107: Figures (or supplemental figures) to demonstrate how most parameters do not influence sparsification dynamics are warranted. As written, it is unclear what "most parameters" mean - all but noise scale. What about the learning rule? Are there any interactions between parameters?

      We have now removed the label noise from Figure 4, and added two supplementary figures to clearly explain the effect of parameters. Figure 4 itself was also redone to clarify this issue.

      2F middle: should "change" be omitted for SI?

      The panel was replaced by a new one in Figure 3.

      116-119: A figure showing how results differ for label noise is warranted.

      This is now done in Figure S1, S2.

      124: typo, The -> the

      Corrected.

      127-129: This conclusion statement is the first place in the text where the three stages are explicitly outlined. There does not appear to be any support or further explanation of these stages in the text above.

      We now explain this earlier at the end of the Introduction section, along with the new Table 1 and marking on Figure 4C.

      132-133 seems to be more of a statement and less of a prediction or conclusion - do the authors mean "the flatness of the loss landscape in the vicinity of the solution predicts the rate of sparsification?"

      We thank the reviewer for this observation. The sentence was rephrased:

      Old: As illustrated in Fig. 1, different solutions in the zero-loss manifold might vary in some of their properties. The specific property suggested from theory is the flatness of the loss landscape in the vicinity of the solution.

      New: As illustrated in Fig. 1, solutions in the zero-loss manifold have identical loss, but might vary in some of their properties. The authors of [26] suggest that noisy learning will slowly increase the flatness of the loss landscape in the vicinity of the solution.
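
      The idea in the rephrased sentence can be illustrated with a toy example that is not the model used in the manuscript. The sketch assumes a two-parameter loss (w1 * w2 - y)^2 with target y = 1, whose zero-loss manifold is the curve w1 * w2 = 1 and whose Hessian trace is 2 * (w1^2 + w2^2), so the flattest solution lies at |w1| = |w2| = 1. Label noise is added to y at every step; the learning rate, noise level, number of steps and starting point are arbitrary choices.

```python
import random

def noisy_sgd(w1, w2, lr=0.01, noise_std=0.5, steps=100_000, seed=0):
    """SGD on (w1 * w2 - y)^2 with a noisy label y = 1 + noise."""
    rng = random.Random(seed)
    sharpness = []
    for step in range(steps):
        y = 1.0 + rng.gauss(0.0, noise_std)
        residual = w1 * w2 - y
        # Simultaneous gradient step on both weights.
        w1, w2 = w1 - 2 * lr * residual * w2, w2 - 2 * lr * residual * w1
        if step % 10_000 == 0:
            # Half the Hessian trace of this toy loss.
            sharpness.append(w1 ** 2 + w2 ** 2)
    return w1, w2, sharpness

# Start close to the zero-loss manifold, but at a sharp solution.
w1, w2, sharpness = noisy_sgd(4.0, 0.25)
final_sharpness = w1 ** 2 + w2 ** 2
```

      In this toy problem each update multiplies w1^2 - w2^2 by (1 - 4 * lr^2 * residual^2), a factor slightly below one, so the weights stay close to the zero-loss manifold while slowly moving along it toward the flattest point. The recorded `sharpness` values show this slow, directed decrease long after the loss itself has stopped improving.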

      135: typo, it's -> its

      Corrected.

      Line 135-136 "Crucially, the loss on the entire manifold is exactly zero..." This appears to contradict the Figure 4A legend - the loss appears to be very high near the top and bottom edges of the manifold in 4A. Do the authors mean that the loss along the horizontal axis of the manifold is zero?

      The reviewer is correct. The manifold mentioned in the sentence is indeed the horizontal axis. We changed the text and the figure to make it clearer.

      Equation 6: This does not appear to agree with equation 2 - should there be an E_t term for an expectation function?

      Corrected.

      Line 262-263: "Sparseness means that a unit has become inactive for all inputs." This should also be stated explicitly as the definition of sparseness/sparsification in the main text.

      We now define precisely the two measures we use - Active Fraction, and Fraction Active Units. There is a new section with an accompanying figure in the Methods section. As Figure S2 shows, the noise statistics (label noise vs. update noise) differentially affect these two measures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General comments

      All three experts have raised excellent ideas and made important suggestions to extend the scope of our study and provide additional information. While we fully acknowledge that these points are valid and would provide exciting new knowledge, we also should not lose track of the fact that a single study cannot cover all bases. Sulfated steroids, for example, are clearly essential components of mouse urine. Unfortunately, however, all chemical analysis approaches are limited and the one we opted for is not suitable for analysis of such signaling molecules. Future studies should certainly focus on these aspects. The same holds true for the fact that we do not know which of the identified compounds are actually VSN ligands. These are inherent limitations of the approach, and we are not claiming otherwise.

      Reviewer #1 (Public Review):

      (1) In this manuscript, Nagel et al. sought to comprehensively characterize the composition of urinary compounds, some of which are putative chemosignals. They used urines from adult males and females in three different strains, including one wild-derived strain. By performing mass spectrometry of two classes of compounds: volatile organic compounds and proteins, they found that urines from inbred strains are qualitatively similar to those of a wild strain. This finding is significant because there is a high degree of genetic diversity in wild mice, with chemosensory receptor genes harboring many polymorphisms.

      We agree and thank the Reviewer for his / her positive assessment.

      (2) In the second part of this work, the authors used calcium imaging to monitor the pattern of vomeronasal neuron responses to these urines. By performing pairwise comparisons, the authors found a large degree of strain-specific response and a relatively minor response to sex-specific urinary stimuli. This is a finding generally in agreement with previous calcium imaging work by Ron Yu and colleagues in 2008. The authors extend the previous work by using urines from wild mice. They further report that the concentration diversity of urinary compounds in different urine batches is largely uncorrelated with the activity profiles of these urines. In addition, the authors found that the patterns of vomeronasal neuron response to urinary cues are not identical when measured using different recipient strains. This fascinating finding, however, requires an additional control to exclude the possibility that this is not due to sampling error.

      We thank Reviewer 1 for pointing this out. We agree that this is truly a “fascinating finding.” Reviewer 1 emphasizes that we need to add an “additional control to exclude […] that this is not due to sampling error”, and he / she elaborates on the required control in his / her Recommendations For The Authors (see below). Reviewer 1 states that “for Fig. 5, in order to conclude that the same urine activates a different population of VSNs in two different strains, a critical control is needed to demonstrate that this is not due to the sampling variability - as compositions of V1Rs and V2Rs could vary between different slices, one preferred control is to use VNO slices from the same strain and compare the selectivity used here across the A-P axis.” Importantly, we believe that this is already controlled for. In fact, for each experiment, we routinely prepare VNO slices along the organ’s entire anterior-to-posterior axis (not including the most anterior tip, where the VNO lumen tapers into the vomeronasal duct, and the most posterior part, where the lumen “twists” toward the ventral aspect and its volume decreases (see Figs. 7 & S7 in Hamacher et al., 2024, Current Biology)). This usually yields ~7 slices per individual experiment / session. Therefore, we routinely sample and average across the entire VNO anterior-to-posterior axis for each experiment. In Fig. 5, in which we analyzed whether the “same urine activates a different population of VSNs in two different strains”, individual independent experiments from each strain (C57BL/6 versus BALB/c) amounted to (a) n = 6 versus n = 8; (b) n = 10 versus n = 10; (c) n = 7 versus n = 9; (d) n = 9 versus n = 10; (e) n = 10 versus n = 9; and (f) n = 12 versus n = 10. Together, we conclude that it is very unlikely that the considerably different response profiles measured in different recipient strains result from a “sampling error.”

      To clarify this point in the revised manuscript, we now explain our sampling routine in more detail in the Materials and Methods. Moreover, we now also refer to this point in the Results.

      (3) There are several weaknesses in this manuscript, including the lack of analysis of the compositions of sulfated steroids and other steroids, which have been proposed to be the major constituents of vomeronasal ligands in urines and the indirect (correlational) nature of their mass spectrometry data and activity data.

      Reviewer 1 is correct to point out that our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) as the reviewer points out, (sulfated) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a feature that we consider a strength of the current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      (4) Overall, the major contribution of this work is the identification of specific molecules in mouse urines. This work is likely to be of significant interest to researchers in chemosensory signaling in mammals and provides a systematic avenue to exhaustively identify vomeronasal ligands in the future.

      We thank the Reviewer for his / her generally positive assessment.

      Reviewer #2 (Public Review):

      (1) This manuscript by Nagel et al provides a comprehensive examination of the chemical composition of mouse urine (an important source of semiochemicals) across strain and sex, and correlates these differences with functional responses of vomeronasal sensory neurons (an important sensory population for detecting chemical social cues). The strength of the work lies in the careful and comprehensive imaging and chemical analyses, the rigor of quantification of functional responses, and the insight into the relevance of olfactory work on lab-derived vs wild-derived mice.

      We thank the Reviewer for his / her generally positive assessment.

      (2) With regard to the chemical analysis, the reader should keep in mind that a difference in the concentration of a chemical across strain or sex does not necessarily mean that that chemical is used for chemical communication. In the most extreme case, the animals may be completely insensitive to the chemical. Thus, although the repertoire of proteins and volatiles could potentially allow sex and/or strain discrimination, it is unclear to what degree both are used in different situations.

      Reviewer 2 is correct to point out that sex- and/or strain-dependent differences in urine molecular composition do not automatically attribute a signaling function to those molecules. We concur and, in fact, stress this point many times throughout the manuscript. In the Results, for example, we point out (i) that “in female urine, BALB/c-specific proteins are substantially underrepresented, a fact not reflected by VSN response profiles”, (ii) that “as observed in C57BL/6 neurons, the skewed distributions of protein concentration indices were not reflected by BALB/c generalist VSN profiles”, and (iii) that “VSN population response profiles do not reflect the global molecular content of urine, suggesting that the VNO functions as a rather selective molecular detector.” Moreover, in the Discussion, we state (i) that “caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone”; (ii) that, for several sex- and/or strain-specific molecules, none “has previously been attributed a chemosensory function. Challenging the mouse VNO with purified recombinant protein(s) will help elucidate whether such functions exist”; (iii) that “generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations”; and (iv) that “to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”

      In the revised manuscript, we now aim to even more strongly emphasize the point made by Reviewer 2. In the Discussion, we have deleted a sentence that read: “Sex- and strain-specific chemical profiles give rise to unique VSN activity patterns.” Moreover, we have added the following statement: “In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all.”

      Reviewer #3 (Public Review):

      (1) One of the primary objectives in this study is to ascertain the extent to which the response profiles of VSNs are specific to sex and strain. The design of these Ca2+ imaging experiments uses a simple stimulus design, using two interleaved bouts of stimulation with pairs of urine (e.g. male versus female C57BL/6, male C57BL/6 versus male BALB/c) at a single dilution factor (1:100). This introduces two significant limitations: (1) the "generalist" versus "specialist" descriptors pertain only to the specific pairwise comparisons made and (2) there is no information about the sensitivity/concentration-dependence of the responses.

      Reviewer 3 points to two limitations of our VSN activity assay. He / she is correct to mention that characterizing a VSN as generalist or specialist based on a “pairwise comparison” should not be the basis of attributing such a “generalist” or “specialist” label in general (i.e., regarding the global stimulus space). We acknowledge this point, but we do not regard this as a limitation of our study since we are not investigating rather broad (i.e., multidimensional) questions of selectivity. All we are asking in the context of this study is whether VSNs - when being challenged with pairs of sex- or strain-specific urine samples - act as rather selective semiochemical detectors. Of course, one can always think of a study design that provides more information. However, we here opted for an assay that - in our hands - is robust, “low noise” (i.e., displays low intrinsic signal variability as evident from reliability index calculations), ensures recovery from VSN adaptation (Wong et al., 2018), and, importantly, answers the specific question we are asking.

      Regarding the second point (“there is no information about the sensitivity/concentration-dependence of the responses”), we would like to emphasize that this was not a focus of our study either. In fact, concentration-dependence of VSN activity has been a major focus of several previous studies referenced in our manuscript (e.g., Leinders-Zufall et al., 2000; He et al., 2008), albeit with contradictory results. In our study, we ask whether a pair of stimuli that we have shown to display, in part, strikingly different chemical composition (both absolute and relative) preferentially activates the same or different VSNs. With this question in mind, we believe that our assay (and its results) are highly informative.

      (2) The functional measurements of VSN tuning to various pairs of urine stimuli are consistently presented alongside mass spectrometry-based comparisons. Although it is clear from the manuscript text that the mass spectrometry-based analysis was separated from the VSN tuning experiments/analysis, the juxtaposition of VSN tuning measurements with independent molecular diversity measurements gives the appearance to readers that these experiments were integrated (i.e., that the diversity of ligands was underlying the diversity of physiological responses). This is a hypothesis raised by the parallel studies, not a supported conclusion of the work. This data presentation style risks confusing readers.

      As Reviewer 3 points out correctly, “it is clear from the manuscript text that the mass spectrometry-based analysis was separated from the VSN tuning experiments/analysis.” In the figures, we try to make the distinction between VSN response statistics and chemical profiling more obvious by gray shadows that link the plots depicting VSN response characteristics to the general pie charts.

      We now also made an extra effort to avoid “confusing readers” by stating in the Discussion (i) that “caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone”; (ii) that, for several sex- and/or strain-specific molecules, none “has previously been attributed a chemosensory function. Challenging the mouse VNO with purified recombinant protein(s) will help elucidate whether such functions exist”; (iii) that “generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations”; and (iv) that “to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.” Moreover, we have deleted a sentence that read: “sex- and strain-specific chemical profiles give rise to unique VSN activity patterns”, and we have added the following statement: “In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all.”

      However, we believe that there is value in presenting “VSN tuning measurements” next to “independent molecular diversity measurements.” While these are independent measurements, their similarity or, quite frequently, lack thereof are informative. We are sure that by taking the above “precautions” we have now mitigated the risk of “confusing readers.”

      (3) The impact of mass spectrometry findings is limited by the fact that none of these molecules (in bulk, fractions, or monomolecular candidate ligands) were tested on VSNs. It is possible that only a very small number of these ligands activate the VNO. The list of variably expressed proteins - especially several proteins that are preferentially found in female urine - is compelling, but, again, there is no evidence presented that indicates whether or not these candidate ligands drive VSN activity. It is noteworthy that the largest class of known natural ligands for VSNs are small nonvolatiles that are found at high levels in mouse urine. These molecules were almost certainly involved in driving VSN activity in the physiology assays (both "generalist" and "specialist"), but they are absent from the molecular analysis.

      Reviewer 3 is right, of course, that at this point we have not tested the identified molecules on VSNs. This is clearly beyond the scope of the present study. We believe that the data we present will be the basis of (several full-length) future studies that aim to identify specific ligands and - best case scenario - receptor-ligand pairs. We find it hard to concur that our study, which provides the necessary basis for those future endeavors, is regarded as “incomplete”. By design, all studies are somewhat incomplete, i.e., there are always remaining questions and we are not contesting that.

      It is true, of course, that a class of “known natural ligands for VSNs are small nonvolatiles.” As we replied above, our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a fact that we consider a key strength of our current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. In fact, in the most extreme scenario, several compounds that do display substantial strain- and/or sex-specific differences in concentration might not act as chemosignals at all. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      Reviewer #1 (Recommendations For The Authors):

      (1) I find that the study is highly valuable for researchers in this field. With the finding that wild mouse urines do not elicit significantly more variable responses than urines from inbred strains, researchers can now be reassured to use inbred strains to gain general insights on pheromone signaling.

      A major omission of this study is non-volatile small organic molecules such as steroids. These compounds are the only molecular class in urine that have been identified to stimulate specific vomeronasal receptors to date. It is unclear to me that the specificity of VOC and proteins can alone fully explain the response specificity of the VSNs that have been monitored in this study. The discussion of this topic is highly beneficial for the readers.

      Reviewer 1 is correct to point out that our chemical profiling approach omits (sulfated) steroids. We are aware of this weakness. We deliberately decided to omit steroids as well as other non-volatile small organic molecules for three main reasons: (i) as the reviewer points out, (sulfated) steroid composition has been the focus of analysis in several previous studies and there is ample published information available on their role as VSN stimuli; (ii) the analytical tools available to us do not allow comprehensive profiling of non-volatile small organic molecules; employing two-dimensional head-space GC-MS as well as LC-MS/MS is not suitable for steroid detection; and (iii) the relatively small sample volumes forced us to prioritize and focus on specific chemical classes (in our case, VOCs and proteins). We made an effort to use the exact same stimuli as previously employed to investigate sensory representations in the accessory olfactory bulb (AOB) (Bansal et al., 2021), a fact that we consider a key strength of our current study. However, this entailed that we had to effectively split our samples, further reducing the available sample volume.

      We acknowledge that we did not sufficiently describe our rationale for focusing on VOCs and proteins in the previous version of the manuscript (nor did we discuss the known role of (sulfated) steroids in VSN signaling in adequate detail). We have now made an effort to address these shortcomings in the revised manuscript. Specifically, we have added new text to the Introduction (“Prominent molecularly identified VSN stimuli include various sulfated steroids (Celsi et al., 2012; Fu et al., 2015; Haga-Yamanaka et al., 2015, 2014; Isogai et al., 2011; Nodari et al., 2008; Turaga and Holy, 2012), which could reflect the dynamic endocrine state of an individual.”) and the Discussion (“Notably, our chemical profiling approach omits (sulfated) steroids other non-volatile small organic molecules, which have previously been identified in mouse urine as VSN stimuli (Nodari et al., 2008). Caution should thus be exerted to not attempt to fully explain VSN response specificity based on VOC and protein content alone.” & “In line with the notion of highly selective vomeronasal sampling is our observation that the concentration differences between compounds shared among strains, which are often substantial, are not reflected by similarly pronounced differences in response strength among generalist VSNs. There are several, not necessarily mutually exclusive explanations for this finding: First, concentration could simply not be a read-out parameter for VSNs, which would support previous ideas of concentration-invariant VSN activity (Leinders-Zufall et al., 2000). Second, the concentrations in freshly released urine could just exceed the dynamic tuning range of VSNs since, particularly for VOCs, natural signals (e.g., in scent marks) must be accessible to a recipient for a prolonged amount of time (sometimes days).
A similar rationale could explain the increased protein concentrations in male urine, since male mice use scent marking to establish and maintain their territories and urinary lipocalins serve as long-lasting reservoirs of VOCs (Hurst et al., 1998). Third, generalist VSNs might sample information only from a select subset of urinary compounds, which, given their role as biologically relevant chemosignals, might be released at tightly controlled (and thus similar) concentrations. Fourth, to some extent, different response profiles could be attributed to non-volatile small organic molecules such as steroids (Nodari et al., 2008), which were beyond the focus of our chemical analysis.”).

      (2) How many different wild mouse urines were tested in this study? Is this sufficient to capture the diversity of wild M. musculus in local (Prague) habitats?

      We thank the reviewer for pointing this out. For the present study, 20 male (M) and 27 female (F) wild mice were caught at six different sites in the broader Prague area (i.e., Bohnice (50.13415N, 14.41421E; 2M+4F), Dolni Brezany (49.96321N, 14.4585E; 3M+4F), Hodkovice (49.97227N, 14.48039E; 5M+6F), Písnice (49.98988N, 14.46625E; 3M+6F), Lhota (49.95369N, 14.43087E; 1M+2F), and Zalepy (49.9532N, 14.40829E; 6M+5F)). 18 of the 27 wild females were caught pregnant. The remaining 9 females were mated with males caught at the same site and produced offspring within a month. When selecting 10 male and 10 female individuals from first-generation offspring for urine collection, we ensured that all six capture sites were represented and that age-matched animals displayed similar weight (~17g). We believe that this capture / breeding strategy sufficiently represents “the diversity of wild M. musculus in local (Prague) habitats.” In the revised manuscript, we have now included these details in the Materials and Methods.
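
      As a quick consistency check, the per-site counts quoted in this response can be tallied against the stated totals of 20 males and 27 females. The sketch below is illustrative only: the name `capture_sites` and the (males, females) tuple layout are assumptions made for this example, while the counts themselves are taken from the text above.

```python
# Per-site (males, females) capture counts as quoted in the response.
# The variable name and data layout are illustrative assumptions.
capture_sites = {
    "Bohnice": (2, 4),
    "Dolni Brezany": (3, 4),
    "Hodkovice": (5, 6),
    "Písnice": (3, 6),
    "Lhota": (1, 2),
    "Zalepy": (6, 5),
}

# Sum males and females across all six capture sites.
total_males = sum(males for males, _ in capture_sites.values())
total_females = sum(females for _, females in capture_sites.values())

print(f"males: {total_males}, females: {total_females}")
```

      The per-site numbers add up to the stated totals, and the 18 pregnant plus 9 subsequently mated females likewise account for all 27 females.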

      (3) I found Figure 1e and figures in a similar format confusing - one panel describes the response statistics of VSNs, and other panels show the number of compounds found in different MS profiling, which is not immediately obvious from the figures. Is the y-axis legend correct (%)?

      We now try to make the distinction between VSN “response statistics” and chemical profiling more obvious by gray shadows that link the plots depicting VSN response characteristics to the general pie charts. Moreover, we thank the Reviewer for pointing out the mislabeling of the y-axis. Accordingly, we have deleted “%” in all corresponding figures.

      (4) For Figure 5, in order to conclude that the same urine activates a different population of VSNs in two different strains, a critical control is needed to demonstrate that this is not due to the sampling variability - as compositions of V1Rs and V2Rs could vary between different slices, one preferred control is to use VNO slices from the same strain and compare the selectivity used here across the A-P axis.

      We thank Reviewer 1 for pointing this out. Importantly, we believe that this is already controlled for (see our response to the Public Review). In fact, for each experiment, we routinely prepare VNO slices along the entire anterior-to-posterior axis (not including the most anterior tip, where the VNO lumen tapers into the vomeronasal duct, and the most posterior part, where the lumen “twists” toward the ventral aspect and its volume decreases (see Figs. 7 & S7 in Hamacher et al., 2024, Current Biology)). This usually yields ~7 slices per individual experiment / session. Therefore, we routinely sample and average across the entire VNO anterior-to-posterior axis for each experiment. In Fig. 5, individual independent experiments from each strain (C57BL/6 versus BALB/c) amounted to (a) n = 6 versus n = 8; (b) n = 10 versus n = 10; (c) n = 7 versus n = 9; (d) n = 9 versus n = 10; (e) n = 10 versus n = 9; and (f) n = 12 versus n = 10. Together, we can thus exclude that the considerably different response profiles that we measured using different recipient strains result from a “sampling error.”

      To clarify this point in the revised manuscript, we now explain our sampling routine in more detail in the Materials and Methods. Moreover, we now also mention this point in the Results.

      Reviewer #2 (Recommendations For The Authors):

      (1) Pg 5 Lines 3-16: This summary paragraph contains too much detail given that the reader has not read the paper yet, which makes it bewildering. This should be condensed.

      We agree and have substantially condensed this paragraph.

      (2) Pg 6 Line 5-8: This summary of the experimental design is obtuse and should be edited for clarity.

      We have edited the relevant passage for clarity.

      (3) Pg 6 Line 11: "VSNs were categorized..." Specialist vs generalist is defined as responding to one or both stimuli. This definition is placed right after saying that the cells were also tested with KCl. The reader might think that specialist vs generalist was defined in relation to KCl.

      We have edited this sentence, which now reads: “Dependent on their individual urine response profiles, VSNs were categorized as either specialists (selective response to one stimulus) or generalists (responsive to both stimuli).”
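
      As an illustration only, the categorization rule quoted above can be sketched in a few lines of Python. The function name, the boolean inputs, and the "non-responsive" label are assumptions made for this sketch; they are not taken from the manuscript or its analysis code.

```python
def categorize_vsn(responds_to_a: bool, responds_to_b: bool) -> str:
    """Categorize a neuron by its responses to the two urine stimuli of a pair.

    Hypothetical helper: label names follow the wording of the response above;
    "non-responsive" is an assumed label for neurons that respond to neither.
    """
    if responds_to_a and responds_to_b:
        return "generalist"      # responsive to both stimuli
    if responds_to_a or responds_to_b:
        return "specialist"      # selective response to one stimulus
    return "non-responsive"      # urine-insensitive (assumed label)


if __name__ == "__main__":
    for pair in [(True, True), (True, False), (False, True), (False, False)]:
        print(pair, categorize_vsn(*pair))
```

      Note that this rule depends only on the two urine responses, which is the point of the clarification: the KCl response is not part of the definition.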

      (4) Pg 6 Line 13: "we recorded urine-dependent Ca2+ signals from a total of 16,715 VSNs". Is a "signal" a response? Did all 16,715 VSNs respond to urine? What was the total of KCl responsive cells recorded?

      We edited the corresponding passage for clarification. The text now reads: “Overall, we recorded >43,000 K+-sensitive neurons, of which a total of 16,715 VSNs (38.4%) responded to urine stimulation. Of these urine-sensitive neurons, 61.4% displayed generalist profiles, whereas 38.6% were categorized as specialists (Figure 1c,d).”

      (5) Pg 7 Line 6: The repeated use of the word "pooled" is confusing as it suggests a variation in the experiment. The authors should establish once in the Methods and maybe in the Results that stimuli were pooled across animals. Then they should just refer to the stimulus as male or female or BALB/c rather than "pooled" male etc.

      We acknowledge the reviewer’s argument. Accordingly, we now introduce the experimental use of pooled urine once in the Methods and in the introductory paragraph of the Results. All other references to “pooled” urine in the Results and Captions have been deleted.

      (6) Pg 7 Line 10: "...detected in >=3 out of 10 male..." For the chemical analysis, were these samples not pooled?

      Correct. We deliberately did not pool samples for chemical analysis, but instead analyzed all individual samples separately (i.e., 60 samples were subjected to both proteomic and metabolomic analyses). Thus, the criterion that a VOC or protein must be detected in at least 3 of the 10 individual samples from a given sex/strain combination for a ‘present’ call (and in at least 6 of the 10 samples to be called ‘enriched’) ensures that the molecular signatures we identify are not “contaminated” by unusual aberrations within single samples. For clarification, we now explicitly outline this procedure in the Methods (Experimental Design and Statistical Analysis – Proteomics and metabolomics).
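
      The detection thresholds described here can be written down as a minimal sketch. The function name, argument names, and the "absent" label are hypothetical and serve illustration only; the thresholds (at least 3 of 10 for 'present', at least 6 of 10 for 'enriched') are the ones stated in the response.

```python
def detection_call(n_detected: int, n_samples: int = 10,
                   present_min: int = 3, enriched_min: int = 6) -> str:
    """Return a call for one compound in one sex/strain group.

    n_detected: number of individual samples in which the compound was found.
    The "absent" label for compounds below the 'present' threshold is an
    assumption of this sketch.
    """
    if not 0 <= n_detected <= n_samples:
        raise ValueError("n_detected must lie between 0 and n_samples")
    if n_detected >= enriched_min:
        return "enriched"   # detected in >= 6 of 10 individual samples
    if n_detected >= present_min:
        return "present"    # detected in >= 3 of 10 individual samples
    return "absent"         # assumed label: too rare to be called


if __name__ == "__main__":
    for n in (1, 3, 5, 6, 10):
        print(n, detection_call(n))
```

      In this reading, every 'enriched' compound also satisfies the 'present' criterion; the sketch simply reports the stricter call.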

      (7) Pg 7 Line 23: In line 7, the specialist rate was defined as 5% in reference to the total KCl responsive cells. Here the specialist rate is defined from responsive cells. This is confusing.

      We apologize for the confusion. In both cases, the numbers (%) refer to all K+-sensitive neurons. We have added this information to both relevant sentences (l. 7 as well as ll. 23-24). Note that the rate in ll. 23-24 refers to generalists.

      (8) Pg 7 Line 25: Concentration index should be defined before its use here.

      We have revised the corresponding sentence, which now reads: “By contrast, analogously calculated concentration indices (see Materials and Methods) that can reflect potential disparities are distributed more broadly and non-normally (Figure 1h).”

      (9) Pg 7 Line 29: change "trivially" to "simply".

      Done

      (10) Pg 7 Line 30: What is meant by a "generalist" ligand? The neurons are generalists. Probably should read "common ligands"

      We have changed the text accordingly.

      (11) Pg 7 Line 31: What is meant by "global observed concentration disparities" ?

      We have changed the text to “…represented by the observed general concentration disparities.”

      (12) Pg 8 Lines 7-11: This section needs to be edited for clarity as it is very difficult to follow. For example, the definition of "enriched" is buried in a parenthetical. Also, it is very difficult to figure out what a "sample" is in this paper. Is it a pooled stimulus, or is it urine from an individual animal?

      We apologize for the confusion. Throughout the paper a “sample” is a pooled stimulus (from all 10 individuals of a given sex/strain combination) for all physiological experiments. For chemical analysis a “sample” refers to urine from an individual animal.

      (13) Pg 8 Line 11: "abundant proteins" Does this mean absolute concentration or enriched in one sample vs another?

      We changed the term “abundant” to “enriched” as this descriptor has been defined (present in ≥6 of 10 individual samples) in the previous sentence.

      (14) Pg 8 Line 18: "While 32.9% of all..." Please edit for clarity. What is the point?

      The main point here is that, for VOCs, the vast majority of compounds (91.3%) are either generic mouse urinary molecules or are sex/strain-specific.

      (15) Pg 10 Line 18: "Increased VSN selectivity..." This title is misleading as it suggests a change in sensitivity with animal exposure. I think the authors are trying to say "VSNs are more selective for strain than for sex". The authors should avoid the term "exposure to" when they mean "stimulation with" as the former suggests chronic exposure prior to testing.

      We thank the reviewer for the advice and have changed the title accordingly. We also edited the text to avoid the term "exposure to" throughout the manuscript.

      (16) Pg 12 Line 10: "we recorded hardly any..." Hardly any in comparison to what? BALB/c?

      We apologize for the confusion. We have edited the text for clarity, which now reads: “In fact, (i) compared to an average specialist rate of 11.2% ± 6.6% (mean ± SD) calculated over all 13 binary stimulus pairs (n = 26 specialist types), we observed only few specialist responses upon stimulation with urine from wild females (2% and 3%, respectively), and…”

      Reviewer #3 (Recommendations For The Authors):

      (1) Related to the pairwise stimulus-response experimental design and analysis: there is precedent in the field for studies that explore the same topic (sex- and strain-selectivity), but measure VSN sensitivity across many urine stimuli, not just two at a time. This has been done both in the VNO (He et al, Science, 2008; Fu, et al, Cell, 2015) and in the AOB (Tolokh, et al, Journal of Neuroscience, 2013). The current manuscript does not cite these studies.

      Reviewer 3 is correct and we apologize for this oversight. We now cite the two VSN-related studies by He et al. and Fu et al. in the Introduction.

      (2) The findings of the mass spectrometry-based profiling of mouse urine - especially for volatiles - are only accessible through repositories, making it difficult for readers to understand which molecules were found to be highly divergent between sexes/strains. There is value in the list of ligands to further investigate, but this information should be made more accessible to readers without having to comb through the repositories.

      We agree that there “is value in the list of ligands to further investigate” and, accordingly, we now provide a table (Table 1) that lists the top-5 VOCs that – according to sPLS-DA – display the most discriminative power to classify samples by sex (related to Figure 2c) or strain (related to Figure 2d). For ease of identification, all entries list internal mass spectrometry identifiers, identifiers extracted from MS analysis database, the sex or strain that drives separation, which two-dimensional component / x-variate represents the most discriminative variable, PubChem chemical formula, PubChem common or alternative names, Chemical Entities of Biological Interest or PubChem Compound Identification, and the VOC’s putative origin.

      (3) There is a long precedent for integrating molecular assessments and physiological recordings to identify specific ligands for the vomeronasal system:

      • nonvolatiles (e.g., Leinders-Zufall, et al., Nature, 2000)
      • peptides (e.g., Kimoto et al., Nature, 2005; Leinders-Zufall et al. Science, 2004; Riviere et al., Nature, 2009; Liberles, et al., PNAS, 2009)
      • proteins (e.g., Chamero et al., Nature, 2007; Roberts et al., BMC Biology, 2010)
      • excreted steroids and bile acids (Nodari et al., Journal of Neuroscience, 2008; Fu et al., Cell, 2015; Doyle, et al., Nature Communications, 2016)

      The Leinders-Zufall (2000), Roberts, and Nodari papers are referenced, but the broader efforts by the community to find specific drivers of vomeronasal activity are not fully represented in the manuscript. The focus of this paper is fully related to this broader effort, and it would be appropriate for this work to be placed in this context in the introduction and discussion.

      We now refer to all of the studies mentioned in the Introduction (except the article published by Liberles et al. in 2009, since the authors of that study do not identify vomeronasal ligands).

      (4) Throughout the manuscript (starting in Fig. 1h) the figure panels and captions use the term "response index" whereas the methods define a "preference index." It seems to be the case that these two terms are synonymous. If so, a single term should be consistently used. If not, this needs to be clarified.

      We now consistently use the term “response index” throughout the manuscript.

      (5) It would be useful to provide a table associated with Figure 2 - figure supplement 1 that lists the common names and/or chemical formulas for the volatiles that were found to be of high importance.

      We agree and, accordingly, we now provide a table (Table 2) that lists the VOCs that – according to Random Forest classification and resulting Gini importance scores – display the most discriminative power to classify samples by sex (related to Figure 2 - figure supplement 1a) or strain (related to Figure 2 - figure supplement 1b). Notably, it is generally reassuring that several VOCs are listed in both Table 1 and Table 2, emphasizing that two different supervised machine learning algorithms (i.e., sPLS-DA (Table 1) and Random Forest (Table 2)) yield largely congruent results.

      (6) The use of the term "comprehensive" for the molecular analysis is a little bit misleading, as volatiles and proteins are just two of the many categories of molecules present in mouse urine.

      We have now deleted most mentions of the term "comprehensive" when referring to the molecular analysis.

      (7) Page 11, lines 24-27: The sentences starting "We conclude..." and ending in "semiochemical concentrations." These two sentences do not make sense. It is not known how many of the identified proteins are actual VSN ligands. Moreover, there is abundant evidence from other studies that individual VSN activity provides information about distinct semiochemical concentrations.

      We have substantially edited and rephrased this paragraph to better reflect that different scenarios / interpretations are possible. The relevant text now reads: “We conclude that VSN population response strength might not be so strongly affected by strain-dependent concentration differences among common urinary proteins. In that case, it would appear somewhat unlikely that individual VSN activity provides fine-tuned information about distinct semiochemical concentrations. Alternatively, as some (or even many) of the identified proteins could not serve as vomeronasal ligands at all, generalist VSNs might sample information from only a subset of compounds which, in fact, are secreted at roughly similar concentrations.”

      (8) The explanation of stimulus timing is mentioned several times but not defined clearly in methods. Page 19, lines 14-19 have information about the stimulus delivery device, but it would be helpful to have stimulus timing explicitly stated.

      In addition to the relevant captions, we now explicitly state stimulus timing (i.e., 10 s stimulations at 180 s inter-stimulus intervals) in the Results.

      (9) Typos: Page 10, line 7: "male biased" → "male-biased" for clarity

      Wilcoxon "signed-rank" test is often misspelled "Wilcoxon singed ranked test" or "Wilcoxon signed ranked test"

      In the Fig. 3 legend, the asterisk meaning is unspecified.

      "(im)balances" → imbalances (page 27, line 24; page 37, line 16; page 38, line 16)

      In Figure 2 - figure supplement 1 and in Figure 2 - figure supplement 2, the units of the box-and-whisker plots are not specified in the graph or legend.

      We have made all required corrections.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #4

      Evidence, reproducibility and clarity

      Summary:

      In this work, Zemlianski and colleagues exploit S. pombe mutations responsible for catastrophic mitoses, in particular those leading to cut / cut-like phenotypes, whereby cytokinesis takes place without proper DNA segregation, trapping DNA molecules by septum formation in between the two separating cells. The work builds on the team's previous observation that these defects can be alleviated when cells are grown in a nitrogen-rich medium, which motivates their efforts to understand this better. The manuscript is written in a concise, neat and informative manner, and the results are presented clearly, with consistency in the format and the style all along. The analyses appear to have been, in general, conducted under the best standards. The findings are important and the data are of good quality. I have, however, important concerns that will be detailed below, and which, as I hope will be made clear, question the pertinence of including "TOR signaling" in the title, and making a distinction between "good" and "poor" nitrogen sources in the abstract.

      Major comments:

      Results

      The conclusion that the phenotype is suppressed by "good" but not "poor" nitrogen sources is not sufficiently supported. First, this interpretation is based on comparing only two or three sources of each type; second, the concentration of the "good" source glutamate needed to be raised for it to have a significant effect; third, there is a strange datum, as Glu 100 mM in Graph 1D looks exactly the same as Glu 50 mM in Graph 1E, so I guess there is a mistake in the plotting; fourth, and more important, the fact that the authors had the nice initiative of reproducing their YES medium experiments for every graph led to the inevitable fact that slightly different values were obtained every time, which is normal. While the values yield very similar data for panels 1B, 1C and even 1D, the frequency of catastrophic mitoses for the cbf11 mutant in YES in panel 1E is much lower than in panel 1B, for example. This has the consequence of making the suppression obtained when adding 'poor' sources, such as proline or uracil, non-significantly different. Thus, the authors conclude that 'poor' nitrogen sources are not good at suppressing the phenotype. I suggest that the authors pool all their YES data (they will have 12 repeats of their experiment) and plot, in a single graph, all the other treatments. By performing the analyses again, using the appropriate statistical test for that, perhaps they will have a surprise. After which, the question is, is it so important to put the emphasis on whether the source is good or poor? The incontestable observation is that, in general, there is a clear trend of suppression of the phenotype.

      In Figure 2, images should be shown as an example of what was seen, what was quantified, how the "decrease in nuclear cross-section area" looked like indeed.

      Also, important for Figure 2, the authors used the nuclear cross-section area as a readout for nuclear envelope expansion versus shrinkage. For that, they did not use a fluorescent marker for the nuclear envelope that is continuous, but a nucleoporin (Cut11-GFP). In my experience, nucleoporins being discontinuously distributed throughout the nuclear envelope, the area encompassed by the signal may be underestimated in the event of a strong nuclear envelope deformation, as I have tried to illustrate in the scheme below: I WILL SEND THE SCHEME BY MAIL TO THE EDITOR, AS I CANNOT COPY-PASTE IT IN THE SYSTEM BOX Given that the photos from which the data were retrieved have not been shown, I cannot at present judge whether the use of a nuclear envelope marker providing continuous signals is absolutely necessary or not, and whether this consideration will affect (or not at all) the conclusions.

      The authors do not seem to comment or pay any attention to a very crucial result they obtain: the addition of ammonium to the WT strain has the effect of also restricting the nuclear cross-section area. They indeed say in their text "we did not observe any differences between cultures grown with or without ammonium supplementation (Fig.2)". I guess they refer here to the cbf11 mutant, in which case the sentence is true (although unfair to the WT). But by neglecting that the supplementation with ammonium had the power of reducing the cross-section area of WT nuclei, they are misled (or misleading) in their interpretation. The same, although milder, is true for Figure 5C, where the addition of ammonium to the WT culture does not alter the median value of prophase + metaphase duration, however has the virtue of very much rendering sharp (less scattered) the population of values, suggesting that the accuracy / control of the process is enhanced. What does this mean? I think it should be carefully thought about and considered as a whole.

      In the same line as above, the authors omit the RNA-seq analysis concerning the treatment of the WT with ammonium (Figure 3). This is very important to understand the standpoint of what this treatment elicits. It would also help unravel the observations I mentioned above that the authors did not assess in their descriptions. Also regarding Figure 3, it is completely obscure why the authors decided to show the genes on the right axis, and not others. Knowing how vast the lipid pathways are, there are likely many other hits that could be relevant. A particular thought goes for the proteins in charge of filling lipid droplets, such as sterol- and fatty acid-esterifying enzymes. Unless a very justified reason is provided, the choice at present seems arbitrary and it would be better to show a more unbiased data representation.

      In the same vein, related to the effect of ammonium on the WT, in Figure S1 (I want to congratulate the authors for showing their 3 experimental replicates), the results very neatly show that ammonium supplementation to the WT leads to a neat and reproducible increase in TAG, a fact on which the authors do not comment. In the mutant, irrespective of ammonium presence or absence, a huge increase in squalene and steryl esters (SE) is seen. I think the work would benefit from actually quantifying the intensity of these bands and thus materializing this in the form of values. TAG, squalene and SE are all neutral lipids, and are all stored within LD to prevent lipotoxicity if accumulated in the endoplasmic reticulum. While ammonium elicits strong TAG accumulation in the WT, this is not the case in the mutant, likely because LD storage capacity is overwhelmed by squalene and SE. Could this have something to do with the suppression they are studying?

      In the section of results where the authors comment the TLC analysis, they write "suggesting failed coordination between sterol and TAG lipid metabolism pathway". As it stands, the sentence is rather devoid of real meaning and may be even misleading, when considering what I wrote before.

      My biggest concern has to do with the very last part, when they explore the implications of TOR:

      • First, all the data presented in the two concerned panels of Figure 7 (B and C) and of Figure S3 lack the values obtained for the single mutants with which cbf11 was combined. This is not acceptable from a genetic point of view, and may prevent us from having important information. For example: if the authors were right that Tor2/TORC1 is ensuring successful progression through closed mitosis (last sentence of results), then one would predict that the tor2-S allele leads to an increase, already per se, of the frequency of catastrophic mitoses. However, at present, I cannot check that.
      • the authors turn to a ∆ssp2 mutant to "increase Tor2 activity". However, this is a pleiotropic strategy, as AMP-kinase is the major sensor and responder to energy depletion, frequently triggered by glucose shortage, thus I am not sure the effects associated with its absence can be unequivocally ascribed to a rise in Tor2 activity.
      • there is a counterintuitive observation: rapamycin, which mimics nitrogen shortage, has the same effect as ammonium supplementation. This is strangely bypassed in the discussion, where the authors wrote "we showed increased mitotic fidelity in cbf11 cells when the stress-response branch of the TOR network was suppressed, either by ablation of Tor1/TORC2 or by boosting the activity of the pro-growth Tor2/TORC1 branch. These data are in agreement with previous findings that Tor2/TORC1 inhibition mimics nitrogen starvation".
      • last, and irrespective of what was said above, the authors conclude that the phenotype suppression is due to "a role for Tor2/TORC1 in ensuring successful progression through mitosis". If, as stated by the authors, Tor1/TORC2 absence not only abrogates Tor1/TORC2 activity, but it simultaneously raises Tor2/TORC1 activity, and if reciprocally Tor2/TORC1 increased activity concurs with Tor1/TORC2 attenuation, it cannot therefore be discerned if the suppression is due to Tor2/TORC1 raise or to Tor1/TORC2 dampening.

      Discussion

      The authors invoke that TOR controls lipin, yet they go on to dismiss the link between TOR and lipids by saying "we did not observe any major changes in phospholipid composition when cells were grown in ammonium-supplemented YES medium compared to plain YES (Figure S2)", with this reinforcing their conclusion that ammonium does not suppress lipid-related cut mutants through directly correcting lipid metabolism defects. While I agree with that reasoning, I invoke again that they nevertheless neglected the clear change observed in their three replicates (Figure S2) that ammonium addition to WT cells strongly increases the amount of TAG (esterified fatty acids). Since lipin activity promotes DAG formation, which then leads to TAG accumulation, this aspect should not be neglected.

      The emphasis on TOR, which expands several paragraphs of the Discussion, should be revisited if the evidence provided for this part of the data is not reinforced.

      To finish, if I may provide some personal thoughts that may be useful for the authors, I would first remind that TAG storage prevents the channeling of phosphatidic acid towards novel phospholipid synthesis thus antagonizes NE expansion, which agrees with their neglected observation for the WT in Figure 2A. The antagonization of NE expansion can be achieved through autophagy (DOI 10.1038/s41467-023-39172-3; DOI 10.1177/25152564231157706), and indeed rapamycin addition (a very potent inducer of autophagy) also suppressed the cut phenotype (Figure 7A). What is more, in S. cerevisiae, autophagy has been shown as important to transition through mitosis conveniently and to prevent mitotic aberrations (DOI 10.1371/journal.pgen.1003245), and to impose a "genome instability" intolerance threshold by restricting NE expansion (DOI 10.1177/25152564231157706). In the first mentioned work, the authors proposed that autophagy may help raising amino acid levels, which could assist cell cycle progression. This would have the virtue of reconciling the otherwise counterintuitive observation of the authors that rapamycin, which mimics nitrogen shortage, has the same effect as ammonium supplementation. It could be that ammonium supplementation mimics the downstream signal of a complex cascade initiated by actual amino acid shortage, known to elicit autophagy-like processes (thus explaining why TAG rises and why the NE does not expand), and may culminate with launching a program for more accurate mitosis and genome segregation. In further support, TORC1 inhibition (as elicited by +rapamycin) is a central node that integrates multiple cues, not only nitrogen availability, but also carbon shortage (DOI 10.1016/j.molcel.2017.05.027), and even genetic instability cues (DOI 10.1016/j.celrep.2014.08.053), perhaps helping unravel why ammonium (via TOR) suppresses very diverse cut mutants, irrespective of whether they stem from lipid or chromatid cohesion deficiencies. These previous works should be considered by the authors.

      Minor

      There was no speculation about why the suppressions are partial.

      Reference 15, cited in the text, is absent from the references list.

      An explanation of which statistical tests were chosen and why they were chosen would be necessary.

      In particular, for the analyses performed for Figure 5, one-way ANOVA should be applied instead of several t-tests.
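
      To make the suggestion concrete, a one-way ANOVA tests all groups in a single comparison instead of running several pairwise t-tests, each of which adds to the chance of a false positive. The sketch below computes only the F statistic and its degrees of freedom using the Python standard library; the function name and the example numbers are invented for illustration and are not data from the manuscript.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists of measurements, one inner list per condition.
    A p-value would additionally require the F distribution, which is
    outside the standard library and therefore omitted here.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up durations (arbitrary units) for three hypothetical conditions
    example = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    print(one_way_anova_f(example))
```

      If the overall test is significant, pairwise differences would then typically be examined with a post hoc procedure that corrects for multiple comparisons.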

      A small section in M&M about how data in general was acquired, quantified, plotted and analyzed would be appropriate.

      In the discussion, the sentence "this could mean that the signaling of availability of a good nitrogen source is by itself more important for mitotic fidelity than the actual physical presence of the nutrients" is a rather void sentence. Because, from the point of view of how a cell "works", the signal is important for the basic reason that it is supposed to represent the actual real cue eliciting it.

      In the second part of Results, when the phenotype of cbf11 mutants concerning LD is mentioned, the authors said "aberrant LD content". It would be good if they can mention already at this stage which type of aberration this was: more LD? less LD? bigger? smaller?

      What is the difference between the two SE bands in Figure S2? What exactly does SE-1 and SE-2 mean?

      In Figure 2, the two graphs, presented side by side, would be more easily comparable if they could be plotted with the same y-axis scale.

      In Figure 1A, it would be useful for non-specialists of this phenotype and non-pombe readers to show a control of how it looks to be "normal".

      Referees cross-commenting

      Overall, there is a striking consensus on the need to either address experimentally or remove the emphasis put on the TOR/mitotic fidelity connection, and of clarifying the counter-intuitive notions associated to the results obtained with rapamycin. Also, the need for revisiting / improving / justifying the means by which nuclear envelope deformation is assessed has been raised at least twice. I therefore guess that the common guidelines for improving this manuscript are clearly established.

      Significance

      In view of all of the above, my feeling is that the authors have put the accent on the TOR message, which is weak, while placing less emphasis on the very strong and elegant findings they make: The authors discover that the suppression of cut(-like) mutant phenotype by addition of NH4 is not due to a correction in lipid metabolism defects, suggesting that the effect is indirect. In support, cut-like mutants whose molecular defect stems from lipid-unrelated defects are also suppressed by ammonium addition. What is more, the authors refine the type of cut-like mutants susceptible of being "corrected" by ammonium addition, finding a "novel definition of cuts" that invokes a temporal rule. This important observation has relevant implications:

      • the long-standing interpretation (commented by the authors) that lipid-related cut mutants are defective because of insufficient synthesis of lipids to be able to grow their nuclear envelope membranes seems now inappropriate in light of their data;
      • this has the immediate implication that perhaps the importance of nitrogen supplementation for accurate mitosis is no longer a fact that may apply only to (yeast) organisms performing closed mitosis, which may broaden the implications of their finding substantially;
      • the nature of the temporal ruler they discover that makes defects appearing early susceptible of being suppressed by nitrogen supplementation deserves analysis in further works, thus opening an immediate perspective.
    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study utilizes a virus-mediated short hairpin RNA (shRNA) approach to investigate in a novel way the role of the wild-type PHOX2B transcription factor in critical chemosensory neurons in the brainstem retrotrapezoid nucleus (RTN) region for maintaining normal CO2 chemoreflex control of breathing in adult rats. The solid results presented show blunted ventilation during elevated inhaled CO2 (hypercapnia) with knockdown of PHOX2B, accompanied by a reduction in expression of Gpr4 and Task2 mRNA for the proposed RTN neuron proton sensor proteins GPR4 and TASK2. These results suggest that maintained expression of wild-type PHOX2B affects respiratory control in adult animals, which complements previous studies showing that PHOX2B-expressing RTN neurons may be critical for chemosensory control throughout the lifespan and with implications for neurological disorders involving the RTN. When some methodological, data interpretation, and prior literature reference issues further highlighting novelty are adequately addressed, this study will be of interest to neuroscientists studying respiratory neurobiology as well as the neurodevelopmental control of motor behavior.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This important study investigated the role of the PHOX2B transcription factor in neurons in the key brainstem chemosensory structure, the retrotrapezoid nucleus (RTN), for maintaining proper CO2 chemoreflex responses of breathing in the adult rat in vivo. PHOX2B has an important transcriptional role in neuronal survival and/or function, and mutations of PHOX2B severely impair the development and function of the autonomic nervous system and RTN, resulting in the developmental genetic disease congenital central hypoventilation syndrome (CCHS) in neonates, where the RTN may not form and is functionally impaired. The function of the wild-type PHOX2B protein in adult RTN neurons that continue to express PHOX2B is not fully understood. By utilizing a viral PHOX2B-shRNA approach for knockdown of PHOX2B specifically in RTN neurons, the authors' solid results show impaired ventilatory responses to elevated inspired CO2, measured by whole-body plethysmography in freely behaving adult rats, that develop progressively over a four-week period in vivo, indicating effects on RTN neuron transcriptional activity and associated blunting of the CO2 ventilatory response. The RTN neuronal mRNA expression data presented suggests the impaired hypercapnic ventilatory response is possibly due to the decreased expression of key proton sensors in the RTN. This study will be of interest to neuroscientists studying respiratory neurobiology as well as the neurodevelopmental control of motor behavior.

      Strengths:

      (1) The authors used a shRNA viral approach to progressively knock down the PHOX2B protein, specifically in RTN neurons to determine whether PHOX2B is necessary for the survival and/or chemosensory function of adult RTN neurons in vivo.

      (2) To determine the extent of PHOX2B knockdown in RTN neurons, the authors combined RNAScope® and immunohistochemistry assays to quantify the subpopulation of RTN neurons expressing PHOX2B and neuromedin B (Nmb), which has been proposed to be key chemosensory neurons in the RTN.

      (3) The authors demonstrate that knockdown efficiency is time-dependent, with a progressive decrease in the number of Nmb-expressing RTN neurons that co-express PHOX2B over a four-week period.

      (4) Their results convincingly show hypoventilation particularly in 7.2% CO2 only for PHOX2B-shRNA RTN-injected rats after four weeks as compared to naïve and non-PHOX2B-shRNA targeted (NT-shRNA) RTN injected rats, suggesting a specific impairment of chemosensitive properties in RTN neurons with PHOX2B knockdown.

      (5) Analysis of the association between PHOX2B knockdown in RTN neurons and the attenuation of the hypercapnic ventilatory response (HCVR), by evaluating the correlation between the number of Nmb+/PHOX2B+ or Nmb+/PHOX2B- cells in the RTN and the resulting HCVR, showed a significant correlation between HCVR and number of Nmb+/PHOX2B+ and Nmb+/PHOX2B- cells, suggesting that the number of PHOX2B-expressing cells in the RTN is a predictor of the chemoreflex response and the reduction of PHOX2B protein impairs the CO2-chemoreflex.

      (6) The data presented indicate that PHOX2B knockdown not only causes a reduction in the HCVR but also a reduction in the expression of Gpr4 and Task2 mRNAs, suggesting that PHOX2B knockdown affects RTN neurons transcriptional activity and decreases the CO2 response, possibly by reducing the expression of key proton sensors in the RTN.

      (7) Results of this study show that independent of the role of PHOX2B during development, PHOX2B is still required to maintain proper CO2 chemoreflex responses in the adult brain, and its reduction in CCHS may contribute to the respiratory impairment in this disorder.

      Weaknesses:

      (1) The authors found a significant decrease in the total number of Nmb+ RTN neurons (i.e., Nmb+/PHOX2B+ plus Nmb+/ PHOX2B-) in NT-shRNA rats at two weeks post viral injection, and also at the four-week period where the impairment of the chemosensory function of the RTN became significant, suggesting some inherent cell death possibly due to off-target toxic effects associated with shRNA procedures that may affect the experimental results.

      (2) The tissue sampling procedures for quantifying numbers of cells expressing proteins/mRNAs throughout the extended RTN region bilaterally have not been completely validated to accurately represent the full expression patterns in the RTN under experimental conditions.

      (3) The inferences about RTN neuronal expression of NMB, GPR4, or TASK2 are based on changes in mRNA levels, so it remains speculation that the observed reduction in Gpr4 and Task2 mRNA translates to a reduction in the protein levels and associated reduction of RTN neuronal chemosensitive properties.

      Thank you for sharing the excitement for our study showing novel findings on the contribution of PHOX2B to the chemoreflex response and activity of adult RTN neurons. We believe that the results on cell death following shRNA viral injections, potentially due to some off-target effects, are important to share with the scientific community to help plan experiments of a similar kind in various fields of neuroscience.

      Thanks for pointing out your concerns about cell quantification. We have edited the methods and results sections to add clarity about our analytical procedure.

      As we discussed in the manuscript, we were only able to assess mRNA levels of Nmb, Gpr4 and Task2, as currently available antibodies for the 3 targets are still unreliable. Future studies will benefit from the analysis of changes at the protein level and possibly electrophysiological recordings to verify that chemosensitive properties of RTN neurons are impaired due to reduction of PHOX2B expression. We discuss these limitations in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a short hairpin RNA technique strategy to elucidate the functional activity of neurons in the retrotrapezoid nucleus (RTN), a critical brainstem region for central chemoreception. Dysfunction in this area is associated with the neuropathology of congenital central hypoventilation syndrome (CCHS). The subsequent examination of these rats aimed to shed light on the intricate aspects of RTN and its implications for central chemoreception and disorders like CCHS in adults. They found that using the short hairpin RNA (shRNA) targeting Phox2b mRNA, a reduction of Phox2b expression was observed in Nmb neurons. In addition, Phox2b knockdown did not affect breathing in room air or under hypoxia, but the hypercapnia ventilatory response was significantly impaired. They concluded that Phox2b in the adult brain has an important role in CO2 chemoreception. They thought that their findings provided new evidence for mechanisms related to CCHS neuropathology. The conclusions of this paper are well supported by data, but careful discussion seems to be required for comparison with the results of various previous studies performed by different genetic strategies for the RTN neurons.

      Strengths:

      The most exciting aspect of this work is the modelling of the Phox2b knockdown in one element of the central neuronal circuit mediating respiratory reflexes, that is in the RTN. To date, mutations in the PHOX2B gene are commonly associated with most patients diagnosed with CCHS, a disease characterized by hypoventilation and absence of chemoreflexes, in the neonatal period, which in severe cases can lead to respiratory arrest during sleep. In the present study, the authors demonstrated that the role of Phox2b extends beyond the developmental period, and its reduction in CCHS may contribute to the respiratory impairment observed in this disorder.

      Weaknesses:

      Whereas the most exciting part of this work is the knockdown of the Phox2b in the RTN in adult rodents, the weakness of this study is the lack of a clear physiological, developmental, and anatomical distinction between this approach and similar studies already reported elsewhere (Ruffault et al., 2015, DOI: 10.7554/eLife.07051; Ramanantsoa et al., 2011, DOI: 10.1523/JNEUROSCI.1721-11.2011; Huang et al., 2017, DOI: 10.1016/j.neuron.2012.06.027; Hernandez-Miranda et al., 2018, DOI: 10.1073/pnas.1813520115; Ferreira et al., 2022 DOI: 10.7554/eLife.73130; Takakura et al., 2008 DOI: 10.1113/jphysiol.2008.153163; Basting et al., 2015 DOI: 10.1523/JNEUROSCI.2923-14.2015; Marina et al., 2010 DOI: 10.1523/JNEUROSCI.3141-10.2010). In addition, several conclusions presented in this work are not directly supported by the provided data.

      Thanks for the feedback on our manuscript. We have further highlighted in our discussion the previous developmental work aimed at determining the role of PHOX2B in embryonic development. Our study was triggered by the fascinating observation that, despite its important role in development of the central and peripheral nervous system, PHOX2B is still present in the adult brain and its function in adult neurons is unknown; thus we aimed to investigate its role in the adult RTN by knocking down its expression with a shRNA approach. Therefore, in our model knockdown of PHOX2B does not affect development of the RTN. Previous studies (mentioned by the reviewer, as well as cited in the manuscript) have focused on investigating 1) the role of PHOX2B in the developmental period, 2) the physiological changes associated with the transgenic expression of mutant forms of PHOX2B in relation to CCHS, 3) the killing or the acute silencing/excitation of neuronal activity of PHOX2B+ RTN neurons. Our study had a different aim: to test whether the transcription factor PHOX2B had a physiologically relevant role in adult RTN neurons. In this experimental approach PHOX2B is not altered throughout embryonic or postnatal development. By knocking down PHOX2B in the Nmb+ cells of the RTN, our results show a reduction in chemoreflex response and mRNA expression of proton sensors. Hence, we conclude that PHOX2B alters the function of Nmb+ RTN neurons, possibly through transcriptional changes including the reduction in Gpr4 and Task2 mRNA expression.

      Reviewer #3 (Public Review):

      A brain region called the retrotrapezoid nucleus (RTN) regulates breathing in response to changes in CO2/H+, a process termed central chemoreception. A transcription factor called PHOX2B is important for RTN development and mutations in the PHOX2B gene result in a severe type of sleep apnea called Congenital Central Hypoventilation Syndrome. PHOX2B is also expressed throughout life, but its postmitotic functions remain unknown. This study shows that knockdown of PHOX2B in the RTN region in adult rats decreased expression of Task2 and Gpr4 in Nmb-expressing RTN chemoreceptors and this corresponded with a diminished ventilatory response to CO2 but did not impact baseline breathing or the hypoxic ventilatory response. These results provide novel insight regarding the postmitotic functions of PHOX2B in RTN neurons.

      Main issues:

      (1) The experimental approach was not targeted to Nmb+ neurons and since other cells in the area also express Phox2b, conclusions should be tempered to focus on Phox2b expressing parafacial neurons NOT specifically RTN neurons.

      (2) It is not clear whether PHOX2B is important for the transcription of pH sensing machinery, cell health, or both. If PHOX2B knockdown results in loss of RTN neurons this is also expected to decrease Task2 and Gpr4 levels, albeit by a transcription-independent mechanism.

      Although we did not specifically target Nmb+ neurons, we performed viral injections within the area where neurons expressing PHOX2B and Nmb are localized (i.e., the RTN region). We carefully quantified the impact of PHOX2B knockdown on Nmb expressing neurons, as well as the effects on the adjacent TH expressing C1 population and FN neurons (figure 5). As reported in the results section, significant changes in the numbers of PHOX2B expressing neurons were only observed at the site of injection in PHOX2B+/Nmb+ neurons. We did not observe changes in the total number of C1 cells (TH+/PHOX2B+), in the number of TH cells coexpressing PHOX2B, or in the hypoxic ventilatory response (which is dependent on the health status of C1 neurons). We have updated figure 5 to show representative expression of PHOX2B in TH+ neurons in the ventral medulla to complement our cell count analysis. To address potential effects on other cell populations we have edited our discussion as follows:

      “PHOX2B knockdown was also restricted to RTN neurons, as adjacent C1 TH+ neurons did not show any change in number of TH+/PHOX2B+ expressing cells, although we cannot exclude that some C1 cells may have been infected and their relative PHOX2B expression levels were reduced. To support the lack of significant alterations associated with the possible loss of C1 function was the absence of significant changes in the hypoxic response that has been shown to be dependent on C1 neurons (Malheiros-Lima et al., 2017).”

      Where appropriate, we have substituted “RTN” with “Nmb expressing neurons of the RTN” throughout the manuscript.

      We have clarified in the methods and results section how we quantified Task2 and Gpr4 mRNA expression. The quantification was performed on a pool of single cells (200-250/rat) expressing Nmb. Hence, the overall reduction is not a result of general fluorescence loss in the RTN region, but specifically assessed in single cells expressing Nmb. This is therefore independent of the reduction of the total number of Nmb cells.

      We propose that cell death is not a direct effect of PHOX2B knockdown, but rather it is associated with the injection of the viral constructs that have been already reported to promote some off-target effects (as reported in the manuscript). While modest cell death is observed only in the first two weeks post-infection, it does not increase further between 2 and 4 weeks post infection, when the reduction in PHOX2B (not associated with a further reduction in Nmb+ cells, hence no further cell death in RTN) is evident and the respiratory chemoreflex is impaired. These results suggest that 1) reduction of PHOX2B is not responsible for cell death; 2) it is the reduction of PHOX2B levels that promotes chemoreflex impairment. Given the observation that Nmb cells with no detectable PHOX2B protein show reduced expression of Task2 and Gpr4 mRNA, we propose that one of the possible mechanisms of chemoreflex impairment in PHOX2B shRNA rats is the reduction of Task2 and Gpr4. In the discussion we also suggest possible additional mechanisms that can be investigated in further studies.

      Recommendations for the authors:

      In revising this manuscript, the authors should carefully address the issues raised by the reviewers to substantially improve the manuscript and solidify the reviewers' general assessment of the potential importance of this work.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The cell counts for Nmb+/PHOX2B+ and Nmb+/PHOX2B- RTN neurons are a critical component of the study, and it is unclear how the tissue sampling procedures (eight sections per animal) for quantifying numbers of cells expressing proteins/mRNAs throughout the extended RTN region bilaterally has been validated to accurately represent the full expression patterns in the RTN under the experimental conditions. It is possible that the sampling/quantification procedures used may be adequate, but validation is important. Also, quantification of the CTCF signal for Nmb, Gpr4, and Task2 mRNA is an important component of this study, but only four sections/rats were used.

      Thank you for pointing out your concern on our quantification method. We have clarified in the methods section the procedure for cell counting and quantification of the CTCF signal. We have sampled the area of the RTN in order to identify Nmb cells of RTN.

      We have edited the methods section as follows:

      “To quantify Nmb+/PHOX2B- and Nmb+/PHOX2B+ neurons within the RTN region, we analysed one in every seven sections (210 µm interval; 8 sections/rat in total) along the rostrocaudal distribution of the RTN on the ventral surface of the brainstem and compared total bilateral cell counts of PHOX2B-shRNA rats with non-target control (NT-shRNA) and naïve rats. Cells that expressed Nmb and Phox2b mRNAs but did not show co-localization with PHOX2B protein were considered Nmb+/PHOX2B-.

      The Corrected Total Cell Fluorescence (CTCF) signal for Nmb, Gpr4 and Task2 mRNAs was quantified as previously described (Cardani et al., 2022; McCloy et al., 2014). Briefly, a Leica TCS SP5 (B-120G) Laser Scanning Confocal microscope was used to acquire images of the tissue. Exposure time and acquisition parameters were set for the naïve group and kept unchanged for the entire dataset acquisition. The collected images were then analysed by selecting a single cell at a time and measuring the area, integrated density and mean grey value (McCloy et al., 2014). For each image, three background areas were used to normalize against autofluorescence. We used 4 sections/rat (210 µm interval) to count Nmb, Gpr4 and Task2 mRNA CTCF in the core of the RTN area where several Nmb cells could be identified. For each section two images were acquired with a 20× objective, so that at least fifty cells per tissue sample were obtained for the mRNA quantification analysis. To evaluate changes in Nmb mRNA expression levels following PHOX2B knockdown at the level of the RTN, we compared the fluorescence intensity of each RTN Nmb+ cell (223.2 ± 37.1 cells/animal) with the average fluorescent signal of Nmb+ cells located dorsally in the NTS (4.3 ± 1.2 cells/animal) (Nmb CTCF ratio RTN/NTS) as we reasoned that the latter would not be affected by the shRNA infection and knockdown.

      To quantify Gpr4 and Task2 mRNA expression in Nmb cells of the RTN, we first quantified single cell CTCF for either Gpr4 (200.7 ± 13.2 cells/animal) or Task2 (169.6 ± 10.3 cells/animal) mRNA in Nmb+ RTN neurons in the 3 experimental groups (naïve, NT shRNA and PHOX2B shRNA) independent of their PHOX2B expression. We then compared CTCF values of Gpr4 and Task2 mRNA between Nmb+/PHOX2B+ and Nmb+/PHOX2B- RTN neurons in PHOX2B-shRNA rats to address changes in their mRNA expression induced by PHOX2B knockdown.”
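
      The per-cell measurement described in the edited methods can be illustrated with a minimal sketch. It assumes the commonly used formulation CTCF = integrated density − (cell area × mean background fluorescence), with the background taken as the mean of the three background areas per image; the function names and all numeric values are hypothetical and are not taken from the study.

```python
from statistics import mean


def ctcf(integrated_density, cell_area, background_means):
    """Corrected Total Cell Fluorescence for a single cell.

    Assumed formulation:
        CTCF = integrated density - (cell area * mean background fluorescence)
    background_means holds the mean grey values of the background areas
    measured in the same image (three per image in the methods text).
    """
    return integrated_density - cell_area * mean(background_means)


def ctcf_ratio_rtn_nts(rtn_cell_ctcf, nts_cells_ctcf):
    """Ratio of one RTN Nmb+ cell's CTCF to the average CTCF of the
    Nmb+ reference cells in the NTS (the 'Nmb CTCF ratio RTN/NTS')."""
    return rtn_cell_ctcf / mean(nts_cells_ctcf)


# Hypothetical numbers, for illustration only.
rtn_cell = ctcf(integrated_density=5000.0, cell_area=100.0,
                background_means=[10.0, 12.0, 14.0])
ratio = ctcf_ratio_rtn_nts(rtn_cell, nts_cells_ctcf=[1900.0, 1900.0])
print(rtn_cell, ratio)
```

      Normalizing each RTN cell to reference cells outside the injection site, as the authors describe, is intended to make the ratio less sensitive to section-to-section differences in staining intensity.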

      (2) Furthermore, to evaluate changes in Nmb mRNA expression following PHOX2B knockdown at the level of the RTN, it is stated in Materials and Methods "we compared, on the same tissue section, the fluorescence intensity of RTN Nmb+ cells with the signal of Nmb+ cells in the NTS (Nmb CTCF ratio RTN/NTS)". How this was accomplished is unclear, considering the non-overlapping locations of the RTN and rostral NTS. Providing images would be helpful.

      The first sections containing Nmb cells in the ventral medulla also contain a few Nmb cells in the dorsal medulla. We used those cells as a reference for fluorescence levels since they would not be affected by the viral infection. Similar cells are also present in the brains of mice and reported in the Allen Brain atlas (https://mouse.brain-map.org/experiment/show/71836874). We have clarified our procedure in the methods section (see above) and included a sample image of Nmb in both ventral and dorsal regions in Figure 5.

      (3) The staining for tyrosine hydroxylase (TH) to identify and quantify C1 cells (TH+/PHOX2B+) following shRNA injection provides important information, and it would be useful to show images of histological examples to accompany Fig. 5A.

      We included in figure 5A a sample image of C1 neurons used for our TH quantification.

      Minor:

      (1) Provide animal ns in the text of the Results section for the four weeks of PHOX2B knockdown.

      They have been included.

      (2) Please state in the legends for Figures 2 & 3, which images are superimposition images.

      We have included information about the merged images in the figures.

      Reviewer #2 (Recommendations For The Authors):

      This manuscript by Cardani and colleagues attempts to address whether a reduction in Phox2b expression in chemosensitive neuromedin-B (NMB)-expressing neurons in the RTN alters respiratory function. The authors used a short hairpin RNA technique to silence RTN chemosensor neurons. The present study is very interesting, but there are several major concerns that need to be addressed, including the main hypothesis.

      Major

      (1) Page 6, lines 119-121: I did not grasp the mechanistic property described by the authors in this passage, nor did I understand the experiments they conducted to establish a mechanistic link between Phox2b and the chemosensitive property. Could the authors provide further clarification on these points?

      We believe the reviewer refers to this paragraph: “In order to have a better understanding of the role of PHOX2B in the CO2 homeostatic processes we used a non-replicating lentivirus vector of two short-hairpin RNA (shRNA) clones targeting selectively Phox2b mRNA to knockdown the expression of PHOX2B in the RTN of adult rats and tested ventilation and chemoreflex responses. In parallel, we also determined whether knockdown of PHOX2B in adult RTN neurons negatively affected cell survival. Finally, we sought to provide a mechanistic link between PHOX2B expression and the chemosensitive properties of RTN neurons, which have been attributed to two proton sensors, the proton-activated G protein-coupled receptor (GPR4) and the proton-modulated potassium channel (TASK-2).”

      The rationale for running these experiments is based on the fact that it is well known in the literature that PHOX2B is an important transcription factor for the development of several neuronal populations. PHOX2B knockout mice die before birth and heterozygous mice have some anatomical defects, but respiration is only impaired in the early post-natal period. While many developmental transcription factors are generally downregulated in the post-natal period, PHOX2B is still expressed in some neurons into adulthood. What is the function of PHOX2B in these fully developed neurons? We do not know, as we do not yet know the entire set of target genes that PHOX2B regulates in the adult brain. Hence, we decided to test what would happen if we knocked down the PHOX2B protein in the Nmb neurons of the RTN, an area that is critical for central chemoreception and involved in the presentation of CCHS. Our results show that reduction of PHOX2B blunts the CO2 chemoreflex response and reduces mRNA expression of Task2 and Gpr4, two pH sensors that have been shown to be key for RTN chemosensitive properties. We also show that Nmb mRNA and cell survival are not affected by PHOX2B knockdown, and we propose that the reduced CO2 chemoreflex may be attributed to a reduction of chemosensory function of Nmb neurons of the RTN due to partial loss of Gpr4 and Task2.

      (2) It is imperative for the authors to enhance the description of their hypothesis, as, from my perspective, the contribution of the data to the field is not clearly articulated. Numerous more selectively designed experiments were conducted to investigate the role of Phox2b-expressing neurons at the RTN level and their involvement in respiratory activity. In summary, the current study appears to lack novelty.

      We respectfully disagree with this statement. We believe we have adequately summarized previous work, although we realize we can’t reference every single publication on this topic. As described above, the developmental role of PHOX2B has been elegantly investigated in mouse embryonic studies (extensively cited in the manuscript). Furthermore, very interesting studies have shown that when the CCHS defining mutant PHOX2B protein (+7Ala PHOX2B) and other mutations linked to CCHS have been transgenically expressed in mice through development, severe anatomical defects are observed and respiratory function is impaired (extensively cited in the manuscript). We have also cited papers relevant to this study that describe the role of PHOX2B/Nmb RTN neurons and the pH protein sensors in the CO2 chemoreflex. If we missed some papers that the reviewer deems essential in the context of this study we will be happy to include them.

      We are not aware of other studies that have investigated the specific role of the PHOX2B protein in the adult RTN in the absence of confounding developmental pathogenesis (i.e. in an otherwise ‘healthy’ animal), and of no other studies that looked at the effects on the RTN proton sensors and Nmb expression following PHOX2B knockdown. Hence we believe that our results are novel and, in our opinion, very interesting.

      (3) On pages 13 and 14 (Results section), I am seeking clarity on the novelty of the findings. Doug Bayliss's prior work has already demonstrated the role of Gpr4 and Task2 on Phox2b neurons in regulating ventilation in conscious rodents.

      Bayliss’ group has elegantly demonstrated that Gpr4 and Task2 are the two proton sensors in the PHOX2B/Nmb neurons of the RTN that have a key role in chemoreception (cited in the manuscript). The novelty of our findings is that we show that a reduction in PHOX2B protein is associated with a reduction of mRNA levels of Gpr4 and Task2. This is a novel finding. Currently, we do not know what transcriptional activity PHOX2B has in adult RTN neurons (i.e., what gene targets PHOX2B has in this cell population and many others) and here we propose that Nmb is not a gene target of PHOX2B while Gpr4 and Task2 are.

      (4) The authors assert that the transcription factor Phox2b remains not fully understood. While I concur, the present study falls short of fully investigating the actual contribution of Phox2b to breathing regulation. In other words, the knockdown of Phox2b neurons did not add much to the knowledge of the field.

      We respectfully disagree with the reviewer. With the exception of very few target genes, the transcriptional role of PHOX2B beyond the embryonic development is poorly understood. No mechanistic connection has been made before between the transcriptional activity of PHOX2B with the expression of proton sensors in the RTN. Other groups have investigated the role of stimulating or depressing the neuronal activity of PHOX2B/NMB neurons in the RTN showing a key role of RTN on respiratory control, but these prior studies did not test whether changing the expression of the PHOX2B protein in these neurons had a role on respiratory control and the central chemoreflex. No other study has investigated the role of the PHOX2B protein within the RTN cells, with the exception of PHOX2B knockout mice or transgenic expression of the mutated PHOX2B that are relevant for CCHS. Again, these previous studies were done on a background of developmental impairment and to the best of our knowledge did not seek to show any association between PHOX2B expression and expression of Gpr4 or Task2.

      (5) I recommend removing the entire section entitled "The role of Phox2b in development and in the adult brain." The authors merely describe Phox2b expression without contextualizing it within the obtained data.

      Because reviewers raised the issue about not including important information about the role of PHOX2B in development and respiratory control we prefer to keep the section.

      (6) Are the authors aware of whether the shRNA in Phox2b/Nmb neurons truly induced cell death or solely depleted the expression of the transcription factor protein? Do the chemosensitive neurons persist?

      This is an excellent question that we tried to address with our study. As we report in figures 2 and 3, we propose that some cell death occurs within the first 2 weeks post-infection, likely due to off-target action of the shRNA approach and independent of the reduction of PHOX2B expression (discussed in the manuscript). This is further evidenced by our Fig. S1 data, in which higher concentrations of shRNA led to more cell death, indicative of off-target effects. We do not believe it is a consequence of our surgical procedure as we do not see similar cell loss when injecting vehicle or other control solutions (unpublished work; Janes et al., 2024).

      During the first 2 weeks post-surgery the proportion of Nmb+/PHOX2B- cells does not change compared to control rats or non-target shRNA (knockdown is not yet visible at protein level). Four weeks post-injection, there is no further cell death (assessed by the total number of NMB cells), whereas the fraction of NMB cells that express PHOX2B is reduced (and the fraction of NMB not expressing PHOX2B is increased), suggesting that the reduction of PHOX2B protein in Nmb cells is not correlated with cell loss/survival whereas the impairment that we observe in terms of central chemoreception is possibly due to the progressive decrease of PHOX2B expression in these neurons.

      (7) In Figures 2 and 3, it is noteworthy that the authors observe peak expression at a very caudal level. In rats, the RTN initiates at the caudal end of the facial, approximately 11.6 mm, and should exhibit a rostral direction of about 2 mm.

      In our experience the Nmb cells on the ventral surface of the medulla peak in number around the caudal tip of the facial nucleus in adult SD rats (Janes et al., 2024). To add clarity to the figure we reported cell count distribution data in relation to the distance from caudal tip of the facial.

      Minor

      (1) I would like to suggest that the authors correct the recurring statement throughout the manuscript that Phox2b is essential only for the development of the autonomic nervous system. In my view, it also plays a crucial role in certain sensory and respiratory systems.

      We have addressed this in the manuscript.

      (2) Page 4, lines 59-60: Out of curiosity, do the data include information from different countries?

      This data refers to information from France and Japan. Currently it is estimated that there are 1000-2000 CCHS patients worldwide.

      (3) Page 7, lines 129-131: In my understanding, the sentence is quite clear; if we knock down the PHOX2B gene, we are expected to reduce or even eliminate the expression of Gpr4 or Task2. Am I right?

      This is what we propose from the results of this study. We would like to point out that the transcriptional activity of PHOX2B (i.e., what genes PHOX2B regulates) in adult neurons has not yet been fully investigated. With the exception of a few target genes (e.g., TH, DBH) the transcriptional activity of PHOX2B in neurons is not yet known. Here we report novel findings that suggest that Gpr4 and Task2 are potential target genes of PHOX2B in RTN neurons.

      (4) The authors mentioned that NT-shRNA also impacts CO2 chemosensitivity. Could this effect be attributed to mechanical damage of the tissue resulting from the injection?

      Just to clarify, we observed some impairment in chemosensitivity when NT-shRNA was injected in a “larger” volume (2x 200ul/side). No impairment was observed with NT-shRNA when we injected smaller volumes (2x 100ul/side). Physical damage could be a possibility, although in our experience (unpublished work; Janes et al., 2024, Acta Physiologica) injections of a similar volume of solution performed by the same investigator in the same brain area and experimental settings did not produce a physical lesion associated with respiratory impairment. Hence, we attribute the unexpected results with larger volumes to toxic effects associated with the shRNA viral constructs.

      (5) In the reference section, the authors should review and correct some entries. For instance, Janes, T. A., Cardani, S., Saini, J. K., & Pagliardini, S. (2024). Title: "Etonogestrel Promotes Respiratory Recovery in an In Vivo Rat Model of Central Chemoreflex Impairment." Running title: "Chemoreflex Recovery by Etonogestrel." Some references contain the journal, pages, and volume, while others lack this information entirely.

      We have updated references. Janes et al., 2024 has now been published in Acta Physiologica.

      (6) Why does the baseline have distribution points, whereas the other boxplots do not?

      We have clarified in the figure legend that, to be fair to the presentation of our results, the data points shown in some of the boxplot graphs do not refer to entire baseline data but only the ones that are outliers.

      In our box-and-whisker plots, whiskers represent the 10th and 90th percentiles, showing the range of values for the middle 80% of the data. Individual data values that fall outside the 10th/90th percentile range are represented as single points (outliers).
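
      The plotting convention described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name and the sample values are hypothetical, and the percentile interpolation method is an assumption, since the plotting software used for the figures is not stated here.

```python
from statistics import quantiles


def whiskers_and_outliers(values):
    """Return the 10th and 90th percentiles (whisker ends) and the
    values that fall outside that range (drawn as individual points)."""
    # n=10 yields the nine deciles; the first is the 10th percentile
    # and the last is the 90th percentile.
    deciles = quantiles(values, n=10, method="inclusive")
    low, high = deciles[0], deciles[-1]
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers


# Hypothetical sample, for illustration only.
sample = list(range(1, 12))
low, high, outliers = whiskers_and_outliers(sample)
print(low, high, outliers)
```

      Under this convention, groups whose values all lie within the 10th to 90th percentile range show no individual points, which would explain why some boxes appear without dots. In matplotlib, the same whisker definition can be requested with `whis=(10, 90)` in `boxplot`.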

      Reviewer #3 (Recommendations For The Authors):

      • What is the rationale behind dedicating the first paragraph of results to discussing an artifact?

      We think that it is important to report off-target effects of shRNA viral constructs because the concentrations and volumes of virus injected vary considerably across studies. Other investigators may attempt to use larger volumes of virus to obtain greater or faster knockdown, but would reach erroneous conclusions if appropriate tests are not performed.

      Furthermore, because some readers could question whether we injected enough virus to knockdown the expression of PHOX2B, and may wonder if with a larger amount of virus we would increase knockdown efficiency, we wanted to show that, in our opinion, we used the maximum amount of virus to knockdown PHOX2B without causing toxic effects or physiological changes that are not dependent on PHOX2B knockdown.

      • All individual data points should be visible in floating bar graphs in Figures 1 and 4. For example, I don't see any dots for naïve animals in any of the panels in Figure 1.

      We have clarified in the figure legend that, to be fair to the presentation of our results, the data points shown in some of the boxplot graphs do not refer to entire baseline data but only the ones that are outliers.

      In our box-and-whisker plots, whiskers represent the 10th and 90th percentiles, showing the range of values for the middle 80% of the data. Individual data values that fall outside the 10th/90th percentile range are represented as single points (outliers).

      • Please include specific F and T values along with DF.

      We have included a table with all the specific values in the supplementary section as Table 1.

      • The C1 and facial partly overlap with the RTN at this level of the medulla and these cells should appear as Phox2b+/Nmb- cells so it is not clear to me why these cells are not evident in the control tissue in Figures 2B and 3B. Also, some of the bregma levels shown in Figure 5A overlap with Figures 2-3 so again it is not clear to me how this non-cell type specific viral approach was targeted to Nmb cells but not nearby TH+ cells. Please clarify.

      In our experience, C1 TH cells are located slightly medial to the Nmb cells and they spread much more caudally than Nmb cells of the RTN. We focused our small volume injection in the core of the RTN to target Nmb cells but we also assessed PHOX2B knockdown in TH C1 cells by counting the PHOX2B/TH cells across treatment groups. Although we can’t exclude subtle changes in the C1 population, we did not observe changes in the total number of C1 cells (TH+/PHOX2B+), in the number of TH cells expressing PHOX2B, or in the hypoxic ventilatory response (which is dependent on the health status of C1 neurons). We have updated Figure 5 to show representative expression of PHOX2B in TH+ neurons in the ventral medulla to complement our cell count analysis. To address potential effects on other cell populations we have edited our discussion as follows:

      “PHOX2B knockdown was also restricted to RTN neurons, as adjacent C1 TH+ neurons did not show any change in number of TH+/PHOX2B+ expressing cells, although we cannot exclude that some C1 cells may have been infected and their relative PHOX2B expression levels were reduced. To support the lack of significant alterations associated with the possible loss of C1 function was the absence of significant changes in the hypoxic response that has been shown to be dependent on C1 neurons (Malheiros-Lima et al., 2017).”

      • To confirm, Nmb is not expressed in the NTS, and this region was chosen as a background, right?

      In order to systematically analyze Nmb mRNA expression we decided to use measurement of fluorescence relative to Nmb neurons present in the dorsal brainstem. Here cells are sparse but we used them as reference fluorescence since they would not be affected by the ventral shRNA injection. Similar cells are also present in the brains of mice and reported by the Allen Brain atlas (https://mouse.brain-map.org/experiment/show/71836874). We have clarified our procedure in the methods section (see above) and included a sample image of Nmb in both ventral and dorsal in Figure 5.

      • How do you get a loss of Nmb+ neurons (Figs 2-3) with no change in Nmb fluorescence (Fig. 5B)? In the absence of representative images these results are not compelling and should be substantiated by more readily quantifiable approaches like qPCR.

      We have clarified in the methods and results section our analytical procedure to assess PHOX2B and Nmb expression. Figures 2 and 3 display the results of counting numbers of Nmb+ cells in the RTN. Figure 5B reports the average of total cell fluorescence measured inside Nmb+ cells, not an average fluorescence measurement of the area of the ventral medulla. Basically, our results show that we have fewer Nmb cells that express PHOX2B but the overall Nmb mRNA fluorescence (expression) in Nmb cells relative to Nmb fluorescence in cells of the dorsal brainstem is the same.

      We have edited the methods as follows:

      “The Corrected Total Cell Fluorescence (CTCF) signal for Nmb, Gpr4 and Task2 mRNAs was quantified as previously described (Cardani et al., 2022; McCloy et al., 2014). Briefly, a Leica TCS SP5 (B-120G) Laser Scanning Confocal microscope was used to acquire images of the tissue. Exposure time and acquisition parameters were set for the naïve group and kept unchanged for the entire dataset acquisition. The collected images were then analysed by selecting a single cell at a time and measuring the area, integrated density and mean grey value (McCloy et al., 2014). For each image, three background areas were used to normalize against autofluorescence. We used 4 sections/rat (210 µm interval) to count Nmb, Gpr4 and Task2 mRNA CTCF in the core of the RTN area where several Nmb cells could be identified. For each section two images were acquired with a 20× objective, so that at least fifty cells per tissue sample were obtained for the mRNA quantification analysis. To evaluate changes in Nmb mRNA expression levels following PHOX2B knockdown at the level of the RTN, we compared the fluorescence intensity of each RTN Nmb+ cell (223.2 ± 37.1 cells/animal) with the average fluorescent signal of Nmb+ cells located dorsally in the NTS (4.3 ± 1.2 cells/animal) (Nmb CTCF ratio RTN/NTS) as we reasoned that the latter would not be affected by the shRNA infection and knockdown.”
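
      The quantification described in the quoted methods can be sketched in Python as follows. This is a minimal sketch that assumes the commonly used form of the calculation, CTCF = integrated density − (cell area × mean background fluorescence), with the background taken as the mean of the three background areas, followed by division of each RTN cell value by the mean value of the NTS reference cells. The function names and all numbers are hypothetical and are not the authors' measurements or analysis code.

```python
from statistics import mean


def ctcf(integrated_density, area, background_means):
    """Corrected total cell fluorescence for one cell.

    Assumed form: integrated density - (cell area * mean background),
    where the background is averaged over the sampled background areas.
    """
    return integrated_density - area * mean(background_means)


def rtn_to_nts_ratios(rtn_ctcf_values, nts_ctcf_values):
    """Divide each RTN cell's CTCF by the mean CTCF of the NTS reference cells."""
    reference = mean(nts_ctcf_values)
    return [value / reference for value in rtn_ctcf_values]


# Hypothetical measurements, for illustration only.
cell = ctcf(integrated_density=5000.0, area=100.0,
            background_means=[9.0, 10.0, 11.0])
ratios = rtn_to_nts_ratios([cell, 3000.0], [1800.0, 2200.0])
print(cell, ratios)
```

      Because the ratio is computed per cell, it reports the signal within the Nmb+ cells that remain and is unaffected by how many such cells are counted.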

      A single-cell qPCR analysis would definitely be ideal, but a qPCR from dissected tissue would not help us determine whether within a cell there was a reduction in Nmb mRNA levels.

      • The boxed RTN region in these examples is all over the place. The RTN should be consistently placed along the ventral surface under the facial and at approx. equal distance from the trigeminal and pyramids.

      We have updated the figures to consistently present the areas of interest where Nmb cells are located and images are taken.

      • Fluorescent in situ typically appears as discrete puncta so it is not clear to me why that is not the case here.

      Our images are taken at low magnification (20X) where it is difficult to distinguish the single mRNA molecules. However, it is possible to appreciate the differences between the grainy fluorescent signal in the in situ hybridization assay (RNAScope) and the smoother signal of protein detection in the immunofluorescence assay.

      • Can TUNEL staining be done to confirm loss of Nmb neurons is due to death and not re-localization?

      Does the reviewer mean “cell migration” with relocalization? We do not expect that this would occur in our experiments. Although TUNEL in the first week post-infection could be useful to determine cell death in our tissue, we do not expect a cell migration of neurons within the brain as our viral shRNA injections are performed in adult rats when developmental processes are already concluded.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Comment: The fact that there are Arid1a transcripts that escape the Cre system in the Arid1a KO mouse model might complicate the interpretation of the data. The phenotype of the Arid1a knockout is probably masked by the fact that many of the sequencing techniques used here are done on a heterogeneous population of knockout and wild type spermatocytes. In relation to this, I think that the use of the term "pachytene arrest" might be overstated, since this is not the phenotype truly observed. Knockout mice produce sperm, and probably litters, although a full description of the subfertility phenotype is lacking, along with identification of the stage at which cell death is happening by detection of apoptosis.

      Response: As the reviewer indicates, we did not observe a complete arrest at Pachynema. In fact, the histology shows the presence of spermatids and sperm in seminiferous tubules and epididymides (Fig. Sup. 3). However, our data argue that the wild-type haploid gametes produced were derived from spermatocyte precursors that have likely escaped Cre mediated activity (Fig. Sup. 4). Furthermore, diplotene and metaphase-I spermatocytes lacking ARID1A protein by IF were undetectable in the Arid1acKO testes (Fig. S4B). Therefore, although we do not demonstrate a strict pachytene arrest, it is reasonable to conclude that ARID1A is necessary to progress beyond pachynema. We have revised the manuscript to reflect this point (Abstract lines 17,18; Results lines 153,154)

      Comment: It is clear from this work that ARID1a is part of the protein network that contributes to silencing of the sex chromosomes. However, it is challenging to understand the timing of the role of ARID1a in the context of the well-known DDR pathways that have been described for MSCI.

      Response: With respect to the comment on the lack of clarity as to which stage of meiosis we observe cell death, our data do suggest that it is reasonable to conclude that mutant spermatocytes (ARID1A-) undergo cell death at pachynema given their inability to execute MSCI, which is a well-established phenotype.

      Comment: Staining of chromosome spreads with Arid1a antibody showed localization at the sex chromosomes by diplonema; however, analysis of gene expression in Arid1a KO was performed on pachytene spermatocytes. Therefore, it is not very clear how the chromatin remodeling activity of Arid1a in diplonema is affecting gene expression of a previous stage. CUTnRUN showed that ARID1a is present at the sex chromatin in earlier stages, leading us to hypothesize that immunofluorescence with the ARID1a antibody might not reflect the real localization of ARID1a.

      Response: It is unclear what the reviewer means about not understanding how ARID1A activity at diplonema affects gene expression at earlier stages. Our interpretations were not based solely on the observation of ARID1A associations with the XY body at diplonema. In fact, mRNA expression and CUT&RUN analyses were performed on pachytene-enriched populations. ARID1A's association with the XY body is not exclusive to diplonema. Based on both CUT&RUN and IF data, ARID1A associates with XY chromatin as early as pachynema. Only at late diplonema did we observe ARID1A hyperaccumulation on the XY body by IF.

      Reviewer #2 (Public Review):

      Comment: The inefficient deletion of ARID1A in this mouse model does not allow any detailed analysis in a quantitative manner.

      Response: As explained in our response to these comments in the first revision, we respectfully disagree with this reviewer’s conclusions. We have been quantitative by co-staining for ARID1A, ensuring that we can score mutant pachytene spermatocytes from escapers. Additionally, we provide data to show the efficiency of ARID1A loss in the purified pachytene populations sampled in our genomic assays.

      Reviewer #3 (Public Review):

      Comment: The data demonstrate that the mutant cells fail to progress past pachytene, although it is unclear whether this specifically reflects pachytene arrest, as accumulation in other stages of Prophase also is suggested by the data in Table 1. The western blot showing ARID1A expression in WT vs. cKO spermatocytes (Fig. S2) is supportive of the cKO model but raises some questions. The blot shows many bands that are at lower intensity in the cKO, at MWs from 100-250kDa. The text and accompanying figure legend have limited information. Are the various bands with reduced expression different isoforms of ARID1A, or something else? What is the loading control 'NCL'? How was quantification done given the variation in signal across a large range of MWs?

      Response: The loading control is Nucleolin. With respect to the other bands in the range of 100-250 kDa, it is difficult to say whether they represent ARID1A isoforms. The Uniprot entry for Mouse ARID1A only indicates a large mol. wt sequence of ~242 kDa; therefore, the band corresponding to that size was quantified. There is no evidence to suggest that lower molecular weight isoforms may be translated. Although speculative, it is possible that the lower molecular weight bands represent proteolytic/proteasomal degradation products or products of antibody non-specificity. These points are addressed in the revised manuscript (Legend to Fig S2, lines 926-931). Blots were scanned on a LI-COR Odyssey CLx imager and viewed and quantified using Image Studio Version 5.2.5 (Methods, lines 640-642).

      Comment: An additional weakness relates to how the authors describe the relationship between ARID1A and DNA damage response (DDR) signaling. The authors don't see defects in a few DDR markers in ARID1A CKO cells (including a low-resolution assessment of ATR), suggesting that ARID1A may not be required for meiotic DDR signaling. However, as previously noted the data do not rule out the possibility that ARID1A is downstream of DDR signaling and the authors even indicate that "it is reasonable to hypothesize that DDR signaling might recruit BAF-A to the sex chromosomes (lines 509-510)." It therefore is difficult to understand why the authors continue to state that "...the mechanisms underlying ARID1A-mediated repression of the sex-linked transcription are mutually exclusive to DDR pathways regulating sex body formation" (p. 8) and that "BAF-A-mediated transcriptional repression of the sex chromosomes occurs independently of DDR signaling" (p. 16). The data provided do not justify these conclusions, as a role for DDR signaling upstream of ARID1A would mean that these mechanisms are not mutually exclusive or independent of one another.

      Response: The reviewer’s argument is reasonable, and we have made the recommended changes (Results, lines 212-215; Discussion, lines 499-500).

      Comment: A final comment relates to the impacts of ARID1A loss on DMC1 focus formation and the interesting observation of reduced sex chromosome association by DMC1. The authors additionally assess the related recombinase RAD51 and suggest that it is unaffected by ARID1A loss. However, only a single image of RAD51 staining in the cKO is provided (Fig. S11) and there are no associated quantitative data provided. The data are suggestive but it would be appropriate to add a qualifier to the conclusion regarding RAD51 in the discussion which states that "...loss of ARID1a decreases DMC1 foci on the XY chromosomes without affecting RAD51" given that the provided RAD51 data are not rigorous. In the long-term it also would be interesting to quantitatively examine DMC1 and RAD51 focus formation on autosomes as well.

      Response: We agree with the reviewer’s comment and have made the recommended changes (Discussion, lines 518-519).

      Response to non-public recommendations

      Reviewer 2:

      Comment: Meiotic arrest is usually judged based on testicular phenotypes. If mutant testes do not have any haploid spermatids, we can conclude that meiotic arrest is a phenotype. In this case, mutant testes have haploid spermatids and are fertile. The authors cannot conclude meiotic arrest. The mutant cells appear to undergo cell death in the pachytene stage, but the authors cannot say "meiotic arrest."

      Response: We disagree with this comment. By IF, we see that ~70% of the spermatocytes have deleted ARID1A. Furthermore, we never observed diplotene spermatocytes that lacked ARID1A. The conclusion that the absence of ARID1A results in a pachynema arrest and that the escapers produce the haploid spermatids is firm.

      Comment: Fig. S2 and S3 have wrong figure legends.

      Response: The figure legends for Fig. S2 and S3 are correct.

      Comment: The authors do not appear to evaluate independent mice for scoring (the result is about 74% deletion above, Table S1). Sup S2: how many independent mice did the authors examine?

      Response: These were Sta-Put purified fractions obtained from 14-15 WT and mutant mice. It is difficult to isolate pachytene spermatocytes by Sta-Put at the required purity in sufficient yields using one mouse at a time. We used three technical replicates to quantify the band intensity, and the error bars represent the standard error of the mean (S.E.M.) of the band intensity.
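
      For reference, the standard error of the mean for a small set of replicate measurements can be computed as in the Python sketch below. It assumes the usual definition (sample standard deviation divided by the square root of the number of replicates); the function name and the three intensity values are hypothetical and are not the authors' data.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("at least two replicates are required")
    return stdev(values) / sqrt(len(values))


# Hypothetical normalized band intensities from three technical replicates.
replicates = [1.0, 2.0, 3.0]
average = mean(replicates)
sem = standard_error(replicates)
print(average, round(sem, 4))
```

      With technical replicates drawn from a single pooled sample, an error bar computed this way describes measurement variability and does not capture variation between animals.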

      Comment: "Comparison of cKO and wild-type littermate yielded nearly identical results (Avg total conc WT = 32.65 M/ml; Avg total conc cKO = 32.06 M/ml)". This sounds like a negative result (i.e., no difference between WT and cKO).

      Response: This is correct. There is no difference between Arid1aWT and Arid1aCKO sperm production. This is because wild-type haploid gametes produced were derived from spermatocyte precursors that have escaped Cre-mediated activity (Fig. S4). These data merely serve to highlight an inherent caveat of our conditional knockout model and are not intended to support the main conclusion that ARID1A is necessary for pachytene progression.

      Comment: The authors now admit ~ 70 % efficiency in deletion, and the authors did not show the purity of these samples. If the purity of pachytene spermatocytes is ~ 80%, the real proportion of mutant cells can be ~ 56%. It is very difficult to interpret the data.

      Response: The original submission did refer to inefficient Cre-induced recombination. The reviewer asked for the % efficiency, which was provided in the revised version. Also, please refer to Fig. S2, where Western blot analysis demonstrates a significant loss of ARID1A protein levels in CKO relative to WT pachytene spermatocyte populations that were used for CUT&RUN data generation.
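
      The reviewer's estimate in the comment above is a simple product of two fractions. A minimal Python sketch of that arithmetic follows; it assumes that deletion efficiency and sample purity are independent, and the function name is hypothetical.

```python
def mutant_fraction(deletion_efficiency, pachytene_purity):
    """Fraction of cells in a sample that are both pachytene and mutant.

    Assumes the two fractions are independent of each other.
    """
    for name, value in (("deletion_efficiency", deletion_efficiency),
                        ("pachytene_purity", pachytene_purity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return deletion_efficiency * pachytene_purity


# Values quoted in the reviewer's comment: ~70% deletion, ~80% purity.
estimate = mutant_fraction(0.70, 0.80)
print(f"{estimate:.0%}")
```

      The purity figure in this calculation is the reviewer's hypothetical value; the product scales linearly with whatever purity is actually measured.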

      Comment: The authors should not use the other study to justify their own data. The H3.3 ChIP-seq data in the NAR paper detected clear peaks on autosomes. However, in this study, as shown in Fig. S7A, the authors detected only 4 peaks on autosomes based on MACS2 peak calling. This must be a failed experiment. Also, S7A appears to have labeling errors.

      Response: I believe the reviewer is referring to supplementary figure 8A. Here, it is not clear which labeling errors the reviewer is referring to. In the wild type, the identified peaks were overwhelmingly sex-linked intergenic sites. This is consistent with the fact that H3.3 is hyper-accumulated on the sex chromosomes at pachynema.

      The authors of the NAR paper did not perform a peak-calling analysis using MACS2 or any other peak-calling algorithm. They merely compared the coverage of H3.3 relative to input. Therefore, it is not clear on what basis the reviewer says that the NAR paper identified autosomal peaks. Their H3.3 signal appears widely distributed over a 6 kb window centered at the TSS of autosomal genes, which, compared to input, appears enriched. Our data clearly demonstrates a less noisy and narrower window of H3.3 enrichment at autosomal TSSs in WT pachytene spermatocytes, albeit at levels lower than that seen in CKO pachytene spermatocytes (Fig S8B and see data copied below for each individual replicate). Moreover, the lack of peaks does not mean that there was an absence of H3.3 at these autosomal TSSs (Supp. Fig. S8B). Therefore, we disagree with the reviewer’s comment that the H3.3 CUT&RUN was a failed experiment.

      Author response image 1.

      H3.3 Occupancy at genes mis-regulated in the absence of ARID1A

      Comment: If the author wishes to study the function of ARID2 in spermatogenesis, they may need to try other cre-lines to have more robust phenotypes, and all analyses must be redone using a mouse model with efficient deletion of ARID2.

      Response: As noted, we chose Stra8-Cre to conditionally knockout Arid1a because ARID1A is haploinsufficient during embryonic development. The lack of Cre expression in the maternal germline allows for transmission of the floxed allele, allowing for the experiments to progress.

      Comment: The inefficient deletion of ARID1A in this mouse model does not allow any detailed analysis in a quantitative manner.

      Response: In many experiments, we have been quantitative when possible by co-staining for ARID1A, ensuring that we can score mutant pachytene spermatocytes from escapers. Additionally, we provide data to show the efficiency of ARID1A loss in the purified pachytene populations sampled in our genomic assays.

      Reviewer 3:

      Comment: The Methods section refers to antibodies as being in Supplementary Table 3, but the table is labeled as Supplementary Table 2.

      Response: This has been corrected

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary

      In this manuscript the authors address the largely unexplored role of micro RNAs (miRNAs) in Drosophila melanogaster brain development, in particular in neural stem cell lineages. The authors for the first time adapt the Ago protein Affinity Purification by Peptides (AGO-APP) technology for Drosophila. They show that this technique works efficiently in neural stem cell lineages and identify several cell type specific active miRNAs. Through a series of bioinformatic analyses the authors identify candidate mRNA targets for these miRNAs. The authors then functionally analyse the role of some of the identified miRNAs, focusing on miRNAs significantly over-represented in neuroblasts.

      By overexpressing Mir-1, the authors demonstrate that this miRNA effectively targets the UTR of Prospero, resulting in the overproliferation of neuroblasts. In a parallel experiment, overexpression of Mir-9c causes neuroblast differentiation defects, similar to the phenotype caused by nerfin-1 mutants, a previously validated target. Loss of function analyses show that knock down of single miRNAs has little functional effects in neuroblast size, showing that the individual effect caused by miRNAs knock down is likely compensated. In contrast, a sponge against a selected group of miRNAs leads to a reduction in poxn positive neuroblasts. Overall these results validate the approach and support the theory that miRNAs cooperate in functional modules during stem cell differentiation.

      We thank Reviewer 1 for its overall positive review. We are grateful for the useful suggestions and we believe the additional experiments we have performed and added strongly improve the quality of the study and will hopefully satisfy the reviewer's concerns.

      Comments

      Title: As the authors do not really explore exit from neural stem cell state this should be altered. The authors do not assess for the levels of any temporal genes, nor other markers of neural stem cell state exit (e.g. nuclear Pros).

      We now have further evidence that the identified microRNA module preserves neuroblasts, in particular in the optic lobe. We have modified the title accordingly: "In vivo AGO-APP identifies a module of microRNAs cooperatively preserving neural progenitors"

      The observed effects, with the available experiments, rather say that neural stem cell state is not maintained in general, not being clear what mechanistically happens to these cells expressing Cluster 2 sponges. The described phenotype caused by the expression of sponges against individual miRNAs also rather shows a blockage in differentiation.

      -The miRNAs analysed were found in Ago-APP to be predominantly active in neuroblasts, but was there any phenotypes of OE or KD in neurons or glial cells?

      Since the analyzed miRNAs were either not or poorly expressed in neurons or glia overall, it seemed less essential to investigate potential phenotypes in these cells. However, we did mis-express miR-cluster1sponge and miR-cluster2sponge in neurons and in glial cells (using elav-GAL4 and Repo-GAL4, respectively) throughout development, and did not observe any major impact on viability. All pupae were able to hatch.

      In addition, we now show that mis-expression of the miR-cluster2sponge (that induces strong phenotypes in neuroblasts) specifically in the wing pouch throughout development did not lead to any phenotype in the adult (e.g. wing size (tissue growth), patterning defects (cell differentiation)) (Fig. 6K,L). Importantly, this experiment rules out unspecific effects of the sponge construct on cell fitness, and highlights the tissue-specificity of the phenotype.

      • The authors obtained a phenotype when using a sponge against Cluster 2 in poxn neuroblasts. Is this specific for these 6 neuroblasts? What happens if this sponge is expressed with a pan-neuroblast driver in central brain/VNC/optic lobe? These experiments should be included as they would show if these are conserved effects for all neuroblasts.

      We already showed in Fig.4B of the first version of the manuscript (using a flip-out approach in clones) that miR-cluster1sponge or miR-cluster2sponge expression leads to an overall reduction in the neuroblast size in the VNC and CB.

      We have now added four more experiments, all suggesting that these sponges specifically affect type I neuroblasts:

      • Using the pan-neuroblast driver nab-GAL4, we show that neuroblasts in the VNC and CB expressing these sponges are significantly smaller in late L3. Also, their number is reduced, indicating that some neuroblasts are eliminated (Fig. 4C-G).
      • Using pox-GAL4 (already in first version) and eagle-GAL4, we show that different subsets of type I neuroblasts in the VNC exhibit different sensitivities to the sponges (from light/medium - neuroblast shrinkage, to high - neuroblast elimination) (Fig. 4H-J, S6C-E).
      • Using the dpnOL-GAL4 driver, which is specific and strongly active in medulla neuroblasts in the optic lobe, we demonstrate that both miR-cluster1sponge and miR-cluster2sponge induce neuroblast shrinking. In addition, we find that the width of the medulla neuroblast stripe is strongly reduced when using the miR-cluster2sponge, providing further evidence for precocious neuroblast elimination (Fig. 6C,D). Importantly, this leads to a smaller medulla in late L3 (Fig. 6F), implying that in these conditions, medulla neuroblasts produce fewer neuronal progeny. Because medulla neuroblasts generate GMCs that undergo a single division, they are also considered as type I neuroblasts.
      • Using a worniu-GAL4, ase-GAL80 driver, which is specifically active in type II neuroblasts, we show that expression of miR-cluster1sponge and miR-cluster2sponge does not affect neuroblast size and the number of intermediate progenitors (Fig. 6H-J).

      Together, these additional experiments in different types of neuroblasts and in non-neural tissue (the wing pouch, see above) demonstrate a type I neuroblast-specific effect. Our new results also imply that the microRNA module is active in most, if not all, type I neuroblasts. In contrast, it is not present or not affecting differentiation genes in type II neuroblasts. Importantly, in type II lineages, intermediate progenitors produced by neuroblasts themselves undergo a few rounds of division before differentiating, unlike GMCs that give rise to two differentiated progeny after a single division. Therefore, the dynamics of differentiation are different in the two lineages, involving a distinct sequential expression of differentiation factors, and possibly different miRNAs.

      The authors do different analyses in different brain regions, which also makes it hard to conclude whether all brain regions behave the same way. As the authors show that some miRNAs are only expressed in subsets of cells, this becomes particularly relevant.

      The new set of experiments in different types of type I neuroblasts and in type II neuroblasts, presented above, addresses the points on the specificity of the microRNA module.

      Could sponge of cluster 1 cause a phenotype if it had been expressed in other neuroblast lineages?

      Yes, it can. See our new experiments discussed above.

      In addition, a discussion of the results obtained from sponge 1 should be included and put in context with miRNA function, technical limitations, levels/cell, targets, pitfalls of analyses, sponges, etc.

      We now acknowledge more carefully that sponge-mediated knock-down is not very efficient and is dose-dependent. We also clarified that other approaches will be required in the future to rigorously assess the specificity of each miRNA/mRNA interaction as well as their cooperativity.

      For example: "In contrast to genetic miR-1KO (Fig. 3O), we found that sponge mediated knock-down of this miRNA, or of other individual miRNAs in the module, had never a significant effect on neuroblast size (Fig. 4B), likely because the inhibition induced by sponges is incomplete. However, expression of either multi-sponge 1 or multi-sponge 2 significantly reduced neuroblast size in a dose dependent manner - two copies of the transgene exacerbate the phenotype (Fig. 4B)."

      We also state at the end of the discussion: "In the future, the combination of Ago-APP with complementary genetic strategies will be required to rigorously assess the specificity of each miRNA/mRNA interaction as well as their cooperativity."

      It would also be interesting to further explore the phenotypes caused by Mir-1 sp expression - are there any milder lineage defects?

      We observed an increase in Prospero expression and a decrease of the neuroblast size in miR-1null mutant neuroblast clones (Fig. 3L-O). These phenotypes are not observed when miR-1sponge is mis-expressed. This is probably due to the fact that miR-1sponge expression leads to only a partial knock-down of miR-1. Moreover, we have added data about the expression of miR-1sponge in medulla neuroblasts in the optic lobe, showing an absence of obvious phenotype when assessing neuroblast size and neuroblast maintenance. This contrasts with expression of miR-cluster1sponge and miR-cluster2sponge (Fig. 4F,G). These new data are in line with our hypothesis that the knockdown of miRNAs of a common module synergize/cooperate to produce the phenotype expected from the deregulation of their common target mRNAs.

      Any defects in other brain regions/lineages, like in type 2 neuroblasts that usually do not express Pros?

      As suggested by the reviewer, and discussed above, we tested expression of miR-cluster1sponge and miR-cluster2sponge in type II neuroblasts using the worniu-GAL4, asense-GAL80 driver (Neumüller et al., 2011). Interestingly, in contrast to type I neuroblasts in VNC, CB and OL regions, we did not observe neuroblast shrinking or changes in INP numbers. This suggests that either the self-renewing state is more robust in type II than in type I neuroblasts, or that the uncovered miRNA module is more specific to type I neuroblasts than to type II. We have added and discussed these important data in Fig. 6H-J in the revised version.

      Ago-APP identifies cell type specific miRNAs in larval neurogenesis section: - "...29°C... allows Gal4-dependent expression (Fig.1B,C)" - this description of how Gal80ts/Gal4 works is not correct, expression is not prevented.

      Gal80 directly binds to Gal4 carboxy terminus and prevents Gal4-mediated transcriptional activation.

      We have tried to clarify this point in the revised version.

      "Thus, when x-GAL4, tub-GAL80ts, UAS-T6B animals are maintained at 18°C (restrictive temperature), GAL80 binds to Gal4 and inhibits its activity. Switching to 29°C (permissive temperature) for 24 hours inactivates GAL80, allowing for GAL4-mediated transcriptional activation of UAS-T6B"

      • Fig S1 - nab-Gal4 also drives expression in GMCs and neurons, rephrase text. Is nab-Gal4 expressed in optic lobe, VNC and central brain neuroblasts?

      nab-GAL4 drives UAS-T6B expression in neuroblasts (in the VNC and in the CB), but also at lower levels in the medulla neuroblasts of the OL.

      We now describe this expression more precisely in the text and in Fig.S1C:

      "nab-GAL4 was used for T6B expression in all neuroblasts. However, because GAL4 is inherited by neuroblast progeny, T6B will also be present in GMCs and a few immature neurons (Fig.S1A,C)24. Of note, nab-GAL4 is highly expressed in the neuroblasts of the ventral nerve cord (VNC) and of the central brain (CB), and weaker in the neuroblasts of the optic lobe (OL) (Fig. S1C)".

      • "20 late larval CNS" - mention the exact stage

      We mention now the precise stage: the wandering stage.

      • Providing a more detailed and interpretive description of Figures 1D and 1E would greatly enhance their clarity. Currently, the descriptions of these panels resemble typical figure legends.

      We now provide a more detailed description of the data, emphasizing that they are consistent with previous studies on specific miRNAs.

      • Fig. 1F,G,H - It is not clear why the authors sometimes use the optic lobe and at other times the ventral nerve cord, as both regions contain neuroblasts, neurons and glia. Are the drivers used for Ago-APP not expressed in all brain regions?

      We now document the activity of the GAL4 drivers used for AGO-APP throughout the entire larval central nervous system in Fig.S1B-D. We also show images of the entire larval central nervous system for the different reporter lines (Fig S1E-K) and focus on regions of interest in the main Fig 1F-M with quantitative measurement of reporter gene expression.

      • Show "data not shown" for 1H.

      It is now shown in Fig. 1M'.

      • Fig. 1F, G, H - Please quantify intensity levels in the different cell types to facilitate comparison with Ago-APP graphs. Include in figure legend what is "cpm".

      Quantification of intensity levels is now represented in Fig. 1F,I and L. Cpm means "counts per million". We added this in the figure legend.

      A regulatory module controlling neuroblast-to-neuron transition section: - Fig. 2C - A more detailed explanation in text is required in addition to what is mentioned in the figure legend. Including a brief summary/conclusion of the results would be helpful. If possible, add in X-axis 1, 2, 3.

      We clarified this point in the text:

      "We used the Targetscan algorithm1 to determine the predicted target genes of each neuroblast-enriched miRNA. Next, we investigated the correlation between the identified miRNAs and the presence of their targets, based on independently generated mRNA expression data44.

      This analysis showed that neuroblast-enriched miRNAs predominantly target mRNAs that are normally highly expressed in neurons (Fig. 2C), consistent with a differentiation-inhibiting function."

      • Figure S2B - as mentioned in the text elav is expressed from the neuroblast, although this is not represented in the figure.

      In this scheme, we depict the expression of proteins, not the presence of mRNAs. elav mRNA is indeed present at low levels in neuroblasts but the protein is absent from both neuroblasts and GMCs (as shown by all our immunostainings against Elav). This strongly suggests post-transcriptional repression of elav mRNA in neuroblasts and GMCs, possibly by miRNAs. It likely also explains why elav-GAL4 is active in neuroblasts.

      It is hard to tell which are young vs maturing neurons in the cartoon, please add a label/legend.

      We added new labels in Fig S2B to uncouple neuronal maturation from temporal identity. We hope it is clearer now.

      • Fig.3I - please show a control brain. The merge images are not easy to see. I think it would be nicer to change the figures to be color-blind friendly.

      We added the control brain in Fig 3I for VNC clones, and Fig S3A for OL clones.

      We also changed all the figures to be color-blind friendly.

      • Fig. 3K,L - why is this now done in the VNC?

      We now focus on the VNC in the main Figure 3 (Fig.3I,J,K,L,N), and show similar phenotypes in the OL in the Supplemental Figure S3 (Fig.S3A-C).

      • Are there any lineage defects when Mir-1 sp is expressed?

      See previous comment on miR-1sponge.

      • Based on which parameters/variables of the predicted targets was the Hierarchical clustering done? A brief explanation would help the interpretation of the results and of the choice of the clusters that were further analysed.

      Hierarchical clustering is now explained in the "Bioinformatics analysis" section of the Material & Methods section with an additional matrix available in Table S1.

      • "revealed the presence of three main groups" - this should be rephrased as this "grouping" was done arbitrarily by the authors and not by hclust. Hclust is set to merge individual clusters/sub-trees up to 1. Furthermore, a more detailed explanation supporting the decision to choose these 3 large clusters should be included.

      See previous question.

      • Fig. 4B, S4B - please include in legend how were these clones generated. S4B - scale bars missing.

      We included the missing information and added the missing scale bars.

      • Fig. 4H - was the ratio of UAS/Gal4 kept in both experimental conditions? Increasing the number of UAS/Gal4 leads to weaker expression of UAS and thus could lead to a weaker phenotype. Including in legends genotype details would help.

      This is a very good point as the number of copies of the UAS and/or GAL4 can influence transgene expression and consequently the phenotype observed. We indeed kept the ratio of UAS/GAL4 in both experimental conditions. The exact genotypes for the experiments are:

      Hs-FLP/+; act>stop>Gal4, UAS-GFP/+; UAS-RFP/UAS-miR-1

      Hs-FLP/+; act>stop>Gal4, UAS-GFP/UAS-cluster2sp; UAS-miR-1/+.

      To address this important issue in the manuscript, we added a table (Table S3) listing the precise genotypes for each experiment.

      Minor - Abstract: "a defined group of miRNAs that are predicted to redundantly target all..." This is only predicted, not experimentally shown, this should be modified accordingly.

      Although the request here is not clear to us, we made a few minor changes to the abstract that we hope will satisfy the reviewer.

      • Intro: "Elav, an RNA binding protein, is expressed as soon as post-mitotic neurons..." - Elav is expressed already in neuroblasts, as also mentioned by the authors in the result section. Correct, add references.

      elav is indeed already transcribed in neuroblasts and GMCs. However, the protein is absent in the two cell types (as shown by all our immunostainings), and only present in neurons. Thus, there is a level of post-transcriptional regulation that prevents elav mRNA translation in neuroblasts and GMCs (likely at least partly mediated by miRNAs). This also explains why in elav-GAL4; UAS-T6B brains T6B is expressed in neuroblasts and GMCs, as the GAL4 mRNA transgene is not submitted to the same post-transcriptional regulation.

      • Last paragraph of Intro (Bioinformatic analyses...) - it is not easy to understand the content of this paragraph. Rewrite to improve clarity.

      The paragraph has been rewritten for more clarity with the addition of Table S1.

      • All legends: Please mention which developmental stage is being analysed in each panel (i.e. wandering 3IL, hours After Larval Hatching, hours After Puparium Formation, or other), in which brain region the analyses/images are being done.

      The CNS regions are now systematically annotated in the figures. All experiments have been done in wandering L3 (except for the new Fig.6 K,L, where the experiment is done in the adult wing). We now systematically mention in the text and legend the developmental stage at which the experiment is performed.

      Please include more detailed information about the genetics in figure legends.

      We added Table S3 that describes the exact genotype of all crosses done in this study.

      • Please include brief explanation of the genetics of miR-10KOGal4 line.

      This is now also explained in the new Table S3.

      • Why are miRNAs sometimes referred as (e.g.) "miR-1" and others "miR-1-3p"?

      The miRNA found enriched (and thus active) in the neuroblast is the miR-1-3p strand. The UAS-miR-1-sponge has been designed to be complementary to the miR-1-3p strand, and is therefore referred to as miR-1-3psp in the text and figure legend. The miR-1 null clones have been made using the miR-1KO allele, which inactivates the entire locus and therefore both the miR-1-3p and miR-1-5p strands. This is referred to as miR-1KO or miR-1 in the text. Finally, constructs used to mis-express miR-1 and other miRNAs are made with the pre-miRNA, meaning that both strands of the miRNA are mis-expressed. This is therefore referred to as miR-1 in the text.

      • Fig. 3I-M - stage of the animal? 3M - in which brain region is this?

      We have systematically mentioned the brain region on panels on all figures.

      • Fig. 3N - can actual sizes be additionally shown, or at least averages mentioned in text?

      Average sizes are indicated in the legend of new Fig. 4F.

      • If non differentially expressed miRNAs, or miRNA with other expression patterns, had been analysed to determine their targets in the sub-set of genes expressed in neuroblasts (from the transcriptome) would different targets been found? Meaning, how specific are these binding patterns for the selected miRNA?

      This is an interesting and important point. To answer it, we added a new analysis (Fig.S2C), where the total number of target sites in the 3'UTR of the pro-differentiation/temporal network genes is shown for different categories of miRNAs: neuroblast-enriched miRNAs (analysed in this study), neuron-enriched miRNAs, glia-enriched miRNAs, and random miRNAs not expressed in the brain. This analysis shows that neuroblast-enriched miRNAs exhibit a higher level of promiscuity with the iconic pro-differentiation/temporal genes than other identified or random miRNAs, arguing for functional relevance.

      **Referees cross-commenting**

      I think this study is very interesting as it optimizes a novel technique in Drosophila for the investigation of cell-specific active miRNAs, and it globally addresses the role of miRNAs in neural stem cell lineages. Although the authors do not explore deeply the biological effect of these miRNAs in neural lineages, I think that the technical contribution and the identification of some miRNA targets is relevant on its own. The authors use Prospero as an example, which is very interesting, as this gene is required to be lowly expressed in neuroblasts and then upregulated during differentiation. The authors propose that this can be regulated by miRNAs, identifying a novel player in this differentiation mechanism. I do not feel the authors need to perform additional experiments to corroborate their findings, as they are well supported by the experiments presented. I do agree that the authors did not explore deeply the biological effect in neural lineages, and the claims regarding premature terminal differentiation, nerfin, etc need to be toned down accordingly.

      Reviewer #1 (Significance (Required)):

      This study is both a technical and conceptual advance. It is very interesting as it optimizes a novel technique in Drosophila for the investigation of cell-specific active miRNAs, and it globally addresses the role of miRNAs in neural stem cell lineages. However, the text, especially in the results section, could benefit from increased detail to enhance the comprehension of the experiments, results, and conclusions. Given that the functional analyses were not conducted at a very detailed level, there exist certain instances of over-interpretation, which could be easily addressed either by revising the text or by incorporating additional experiments, as elaborated upon below. This manuscript will be interesting for research fields interested in stem cell differentiation, brain development, micro RNAs, both for Drosophilists and scientists working with other animal models. I am an expert in Drosophila brain development.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      MicroRNAs (miRNAs) have a well-established role in fine-tuning gene expression. Because the mechanisms by which miRNAs recognize specific target transcripts are poorly understood, their functionally relevant targets in the physiological context are mostly poorly defined. Studies in vertebrates have suggested that miRNAs play a prominent role in regulating cell type specification during brain development. Insight into miRNA regulation of target selection will improve our understanding of neural development. Cell type-specific gene expression patterns and functions in the neural stem cell (neuroblast) lineage in the fly larval brain are well characterized. The fly genome is compact, and gene redundancy including miRNAs is significantly less than vertebrates. For these reasons, the authors chose to investigate how miRNAs regulate cell-state transitions by first establishing a comprehensive miRNA expression profile for major cell types in the fly larval brain. They combined the AGO-APP strategy and the GAL4-UAS inducible expression system to pull-down cell type-specific miRNAs from fly larval brain. The authors focused on miRNAs that are enriched in neuroblasts and examine how multi-miRNA modules regulate the maintenance of an undifferentiated state in neuroblasts. The cell type-specific inducible AGO-APP system introduced in this study is innovative and allows for systematic identification of miRNAs that most standard RNA-sequencing techniques missed in previously published datasets. The technological note sets high promise for this study, but the findings appear tame. It is my opinion that there are a number of shortcomings that can improve the rigor of this study. For example, strategies used to determine spatial expression patterns of miRNAs as well as to validate miRNA target genes are indirect with high likelihood of caveats. The choices of candidate target genes to assess the function of miRNAs in the cell state transition appear counterintuitive.

      We thank the reviewer for qualifying our study as "technologically excellent" and for emphasizing the "innovative character of AGO-APP" and the potential of such studies to "be hugely significant to the general audience".

      We are aware that there could be ways to more rigorously and systematically investigate the interactions between miRNAs and their targets and assess their cooperativity. Beyond in vitro luciferase assays (an approach we have used in this study), this would ideally involve multiple new transgenic assays, with point mutations in various miRNA sites in the 3'UTR of predicted target genes as proposed by Reviewer 2. Also, measuring the direct effect of miRNA knockdown on its target is notoriously difficult as it can be modest (and only be revealed through the cooperative action with other miRNAs, as proposed in this study), and sometimes not detected by measuring mRNA levels (e.g. by transcriptomic approaches or FISH).

      One of our aims in the future is to develop such non-trivial approaches, which will take a considerable amount of time and work. At this stage we believe that this would go beyond the scope of the present study, which aims at illustrating how introducing a new technology for miRNA isolation (AGO-APP) can help to reassess important questions on miRNA biology and function (e.g. miRNA cooperation within the context of developmental transitions). We now discuss this point in the last paragraph of the discussion in the revised version.

      Our unbiased AGO-APP results reveal a group of neuroblast-enriched miRNAs that are predicted to target pro-differentiation genes (prospero, elav, nerfin-1, brat) multiple times while not targeting stemness genes such as miranda, worniu, inscuteable, deadpan, grainyhead. Mutations in pro-differentiation genes are known to either promote neuroblast tumors (prospero, nerfin-1, brat) (https://doi.org/10.1016/j.cell.2006.01.03; 10.1101/gad.250282.114) or perturb neuronal differentiation (elav) (https://doi.org/10.1002/neu.480240604). On the other hand, mis-expression of these genes in neuroblasts often promotes shrinkage, precocious differentiation and/or cell cycle exit (10.1016/j.cell.2008.03.034 ; 10.7554/eLife.03363 ; 10.1101/gad.250282.114). Therefore, bioinformatic prediction and previous studies made it likely that GOF of the neuroblast-enriched miRNAs would lead to neuroblast expansion or differentiation defects, and that LOF would lead to neuroblast shrinkage, cell cycle exit or differentiation. All these predictions are experimentally validated in our study. To reinforce our data, we have performed a number of additional experiments that are described below.

      Furthermore, the authors provided no rationale as to why they chose cell types that are not in the brain (such as wing cells and cells in the optic lobe) to assess the phenotypic effect of manipulating miRNAs.

      All our analyses were done in the different types of neuroblasts found in the central nervous system (CNS), which is composed of the ventral nerve cord (VNC, equivalent to the vertebrate spinal cord) and the brain, comprising the central brain (CB) and the optic lobes (OL) (10.1016/j.neuron.2013.12.017). The optic lobes are not to be confused with the eye imaginal discs, which produce the retina but do not contain neuroblasts. We tested the role of the neuroblast-enriched miRNAs in all neuroblasts of the CNS based on the pan-neuroblast activity of the nab-GAL4 driver used for the AGO-APP experiment. We then focused on different types of neuroblasts using lineage-specific GAL4 drivers (poxn-GAL4, eagle-GAL4, dpnOL-GAL4, type II-GAL4). This is shown in the entirely revised last paragraph of the results (Fig 4, 5, 6, S6 and S7). These experiments demonstrate that sponges simultaneously targeting several miRNAs of the module only affect type I neuroblasts but not type II neuroblasts.

      To investigate whether miR-1 directly regulates prospero mRNA in vivo, we used a tissue where prospero is not normally expressed (the wing pouch of the wing imaginal disc in late L3 larvae), allowing us to test how over-expressing miR-1 post-transcriptionally affects versions of prospero mRNA that either possess or lack the endogenous 3'UTR. The obtained results are consistent with in vitro luciferase assays, and with miR-1 gain-of-function in neuroblasts and GMCs, supporting the hypothesis that prospero mRNA is a direct target of miR-1 via its 3'UTR. We have clarified these points in the revised version of the manuscript.

      Using solely a reduced cell size as the functional readout for "precocious differentiation" is not rigorous and should be complemented with additional measures.

      Reduced neuroblast size always precedes neuroblast differentiation and has been widely used as a functional readout of precocious differentiation (this is more clearly emphasized and referenced in the revised version). We have now also observed this phenotype in the neuroblasts of the optic lobe (Fig 6), together with precocious "plunging" of old neuroblasts in the deep layer of the medulla (Fig S7G), another sign of differentiation. These experiments show that the shrinkage phenotype is robust across all type I neuroblasts (medulla neuroblasts of the optic lobe can also be considered as type I neuroblasts because they generate GMCs that undergo a single division).

      Moreover, opposite to precocious differentiation induced by the simultaneous knockdown of multiple miRNAs of the neuroblast module, we now show that mis-expression of many of the miRNAs of the module prevents proper neuronal differentiation (miR-1, miR-9, miR-92a, miR-8) (Fig S5). Taken together, these experiments strongly suggest that the miRNAs of the module have the ability to block neuronal differentiation and that they represent a functional module in type I neuroblasts.

      Major concern: 1. The authors should use a direct method to confirm the expression pattern of identified miRNAs such as miRNA scope (ACD) in the whole mount brain instead of indirect methods such as reporters.

      Such techniques are not trivial and do not represent a standard in Drosophila. Instead, the reporter genes we have used in our study have already been validated in other studies to reflect the expression of particular miRNAs in different tissues. We have thus taken advantage of these available lines to correlate expression patterns as reflected by transgenics with our AGO-APP experiment. All reporter lines tested quantitatively support the AGO-APP data as now shown in the revised Fig 1F,I,L.

      The entire figure 3 aims to provide evidence to support that prospero mRNA is a direct target of miR-1-3p. These convoluted experiments with significant caveats should be replaced with mutating the endogenous miR-1-3p binding sites in the 3'UTR of the prospero reading frame, and demonstrate that the endogenous prospero transcript level is increased by sm-FISH. The authors could also use this novel allele to assess the phenotypic effect of "unregulated prospero" in the larval brain.

      It would indeed be an interesting experiment to perform to show that miR-1 directly regulates pros RNA in vivo. However, our miR-1 mutant clones suggest that miR-1 on its own has only a small contribution to prospero mRNA regulation during the neuroblast-to-neuron transition. This could be due to the low physiological levels of miR-1-3p in neuroblasts and to the fact that several miRNAs of the module may act partly redundantly and collaboratively to maintain the correct level of prospero mRNA. Thus, in this case, it is quite possible that changes in the endogenous prospero mRNA transcript would not be significant or detectable by smFISH, unless more miRNA sites are mutated. Such an experiment would involve the generation of several new transgenic lines using the CRISPR technology, which represents a long-term project.

      Again, these approaches are powerful and we agree that they would represent a more rigorous assessment of miRNA cooperation. But we feel that they go beyond the scope of this article, as mentioned above.

      The effect of overexpressing mir-1 on the prospero transgene with its 3'UTR vs without 3'UTR cannot easily be compared since the UTR might be regulated by other regulatory mechanisms in addition to mir-1.

      To minimize the potential effect of other regulators, we only compare conditions where the only difference is the presence or absence of miR-1. We do not directly compare levels of Prospero with its 3'UTR vs without 3'UTR. However, there is indeed still the possibility that miR-1 overexpression would change the expression of a protein that regulates prospero mRNA via its 3'UTR.

      Considering this, we have toned down our conclusion concerning this part in the revised version of the manuscript and now use the sentence:

      "These experiments performed in two different cellular contexts strongly suggest that prospero mRNA is a direct target of miR-1-3p."

      How could the author use evidence-based strategy to demonstrate that massive amplification of Mira-expressing cells induced by overexpressing mir-1 in the optic lobe is indeed due to mis-regulation of prospero instead of mimicking the prospero-mutant phenotype?

      First, we noted that miR-1 overexpression in neuroblast clones causes neuroblast amplification in all regions of the CNS (not only in the optic lobe) at the expense of neuronal differentiation. This is now shown in Fig 3 and S3.

      Second, multiple chemical or genome-wide RNAi screens have been performed (Gould lab, Chia lab, Knoblich lab, etc) to identify genes whose downregulation causes efficient neuroblast amplification (10.1186/1471-2156-7-33 ; 10.1016/j.stem.2011.02.022). In VNC type I neuroblasts, only inactivation of prospero or miranda can lead to efficient neuroblast amplification in late larvae, generating tumour-like structures devoid of neurons. We find that while Miranda is highly expressed in neuroblast clones overexpressing miR-1 (Fig 3J), Prospero is completely absent, suggesting that it is efficiently silenced by miR-1 overexpression, and therefore responsible for the observed phenotype. This new result is now added in Fig.S3D. It is very unlikely that the down-regulation of another gene is responsible for this phenotype. However, we cannot exclude that other genes are deregulated that contribute to this phenotype in addition to prospero knockdown.


      Similarly, what is the evidence that the phenotype associated with mir-9a knockout is due to mis-regulation of nerfin-1?

      In contrast to prosperoKD clones that are devoid of neurons, nerfin-1 mutant clones are known to be composed of a mix of neuroblasts and neurons (Fig S4E,G) (10.1101/gad.250282.114 ). When over-expressing miR-9 in neuroblast clones in the VNC, we observed a strong downregulation of nerfin-1 (Fig S4A, C) showing that nerfin-1 is a likely target of miR-9. However, downregulation is not complete which could explain why we do not see neuroblast amplification in the VNC (Fig 4F). Together with the significant up-regulation of nerfin-1 upon miR-9sponge expression, and the results of our luciferase assays, these data are consistent with nerfin-1 being a direct target of miR-9. Finally, the fact that overexpression of miR-9 in the optic lobes triggers phenotypes very similar to loss of function of nerfin-1 (but different from loss of function of prospero which is upstream of Nerfin-1 in epistatic tests) suggests that down-regulation of nerfin-1 is at least partially responsible for the phenotype (Fig S4D,E).

      Again, we cannot exclude that other deregulated targets contribute to the phenotype.

      Most of the look-alike mutant phenotypes presented by the authors appear to occur in the OL. Is there any reason why cells in the visual center, which is not a part of the brain, appear to be more susceptible to loss of function of miRNAs? This is particularly important when manipulating the same miRNAs appears to have very subtle effects on VNC neuroblasts.

      Optic lobes (OL) are a part of the brain (10.1016/j.neuron.2013.12.017). Indeed, each OL constitutes a large region located on both sides of the central brain that integrates signals from retinal photoreceptors coming from the retina in the eyes. Moreover, medulla neuroblasts in the OL can be considered as type I neuroblasts because they generate GMCs that undergo a single division, in contrast to intermediate progenitors (INPs) produced by type II neuroblasts.

      In the original version of our manuscript, we mainly showed gain-of-function in the OL, as for some of the miRNAs the phenotypes were more striking than elsewhere. We have now more systematically tested our gain-of-function and loss-of-function conditions in both the VNC (type I neuroblasts) (Fig 3, 4, 5, S3, S4, S6) and in the OL (medulla neuroblasts) (Fig 6, S4, S5, S7).

      Results in the VNC are presented generally in the main figures, while results in the OL are presented mainly in supplemental figures; but phenotypes obtained in both parts are now clearly described in the text of the revised version.

      How do the authors know that multi-sponge 2 expression leads to loss of stemness potential in neuroblasts? Any additional evidence that supports precocious differentiation but not death or cell cycle exit?

      This is indeed an important point which we have investigated further in the new version. We now show that inhibiting apoptosis partially rescues neuroblast elimination but not shrinkage when miR-cluster2sponge is expressed in the poxn lineage in the VNC (Fig.4L,M). This shows that VNC neuroblasts can disappear by apoptosis upon miR-cluster2sponge expression, but that shrinkage precedes apoptosis. We also show that optic lobe neuroblasts shrink upon miR-cluster2sponge expression and are precociously eliminated, as indicated by the thinner neuroblast stripe, by a mechanism independent of apoptosis (Fig 6C,D, S7F). Indeed, the neuroblast stripe in the optic lobe remains free of anti-activated caspase 1 (Dcp1), a widely used label of apoptotic cells, upon miR-cluster2sponge expression (Fig S7F). Finally, we also show precocious "plunging" of the old OL neuroblasts deep in the medulla, another sign of precocious differentiation (Fig S7G).

      Therefore, these experiments reinforce the conclusion that the neuroblast-enriched miRNA module is involved in neuroblast maintenance and that down-regulation of this module leads to the progressive loss of the neuroblast state.

      Lastly, we show that miR-cluster2sponge has no effect on type II neuroblasts or wing imaginal discs, arguing for an effect specific to type I neuroblasts (including VNC, CB and medulla neuroblasts).

      Again, how do the authors know that mir-1 overexpression efficiently silenced prospero mRNA in neuroblasts and GMCs in Fig. 4F?

      This relevant question is addressed in our response to questions 2 and 3.

      Have the authors considered other targets to better assess the function of these miRNAs enriched in neuroblasts. For example, could these miRNAs function to dampen the expression of genes that are required for maintaining these cells in an undifferentiated state? Several studies using the neuroblast model suggest that the expression of these genes needs to be downregulated at the transcriptional and post-transcriptional levels. Perhaps, these miRNAs might target these "stemness" transcripts instead of "differentiation" transcripts. Is there evidence for or against this possibility?

      This is definitely a good point that we have now discussed in the revised version. We found that neuroblast identity genes (e.g. Mira, Dpn, Insc, etc) are not targeted by the miRNA module. However, the module of miRNAs in late L3 neuroblasts also appears to target the early temporal genes (Chinmo, Imp), which are strongly oncogenic and stemness promoting. These need to be silenced in late L3 to ensure that neuroblasts stop dividing during metamorphosis (10.7554/eLife.13463). Therefore, there is indeed a strong possibility that the miRNA module we have identified in late L3 both maintains stemness by inhibiting differentiation genes and dampens stemness by silencing early temporal genes, ensuring timely elimination in the pupal stage. We are actively working on the regulation of temporal genes by microRNAs during development and will describe this in detail in another study.

      This point was clarified in the discussion as follows:

      "In this context it is interesting to note that, in addition to differentiation factors, the early temporal factors Chinmo and Imp are predicted to be highly targeted by the neuroblast-enriched miRNA module. Given the strong oncogenic potential of these genes30, it is possible that the microRNA module not only protects neuroblasts against precocious differentiation but also protects against uncontrolled self-renewal. Therefore, in principle the same miRNA module could control neuroblast activity through the control of both self-renewal and differentiation, two seemingly opposing biological activities."

      Minor point 1. There are a number of misleading statements throughout the manuscript. -In the abstract, the authors indicated "isolate actively inhibiting miRNAs from different neural cell populations in the larval Drosophila central nervous system". For example, the expression patterns of Nub-Gal4 and Elav-Gal4 drivers appear to be partially overlapping in multiple cell types and might be active in the visual center (optic lobe). If true, it was unclear to me what neural cell types were actually used in their analyses and how they could confidently indicate that cell types in the central nervous system were used in their study. Aren't there more specific Gal4 drivers or more sophisticated genetic tools available to increase the purity of cell types? If not, the alternative could be a much more precise secondary screening step to directly determine where these miRNAs are actually detected instead of relying on indirect readouts of where they might be expressed.

      The expression patterns with additional figures are now more clearly described in the main text and in Fig.S1C,D.

      We are in the process of using other GAL4 drivers that target more specific populations of neurons. But this is beyond the scope of this first study and will be published later.

      -The statement "GMCs lacking Prospero, Nerfin-1 or Brat fail to differentiate and reacquire a neuroblast identity" is very problematic. Nerfin-1 does not appear to be expressed in GMCs according to Fig. S2B. Furthermore, Froldi et al., 2015 suggested that Nerfin-1 appears to prevent activated Notch from reverting neurons to ectopic neuroblasts.

      Indeed, Nerfin-1 is not expressed in GMCs but in immature neurons to stabilize neuronal identity and prevent reversion as shown by Froldi et al. and other studies (DOI: 10.1101/gad.250282.114 ; https://doi.org/10.1242/dev.141341). We have now clarified this point in the introduction: "This process involves the sequential activity of key cell fate determinants such as the transcription factor Prospero and the RNA-binding protein Brat in the GMCs followed by the transcription factor Nerfin-1 and the RNA-binding protein Elav in the maturing neurons20-23. GMCs lacking Prospero, or immature post-mitotic progeny lacking Nerfin-1, fail to initiate or maintain differentiation respectively, and progressively reacquire a neuroblast identity, leading to neuroblast amplification 21,23-25."

      -The statement on page 6 "Strikingly, the group of genes ... contained all iconic genes known to induce neuron differentiation after neuroblast asymmetric division, including nerfin-1, prospero, elav and brat" is problematic. Again, Nerfin-1 probably functions to maintain a neuronal state rather than to induce differentiation. Is there evidence that Elav induces neuron differentiation after neuroblast asymmetric division? Brat seems to downregulate Notch signaling in neuroblast progeny rather than instructing neuron differentiation. Furthermore, previous studies suggested that loss of brat function does not affect identity of GMCs and their symmetric division to generate neurons. A similar statement is used at the end of this same paragraph to reiterate misleading messages.

      Prospero and Nerfin-1 are sequentially expressed in maturing neurons. Nerfin-1 shares many targets with Prospero. It has been proposed that Nerfin-1 prolongs the action of Prospero, allowing stabilisation/maintenance of the differentiated neuronal state (10.1101/gad.250282.114; 10.1016/j.celrep.2018.10.038).

      Brat is also involved in the sequence of events needed to produce neurons upon neuroblast asymmetric division. However, the mode of action of Brat in GMCs from type-I neuroblasts and in INPs from type-II neuroblasts is unclear. It was shown that Brat is an RNA-binding protein that has multiple targets. For example, it can bind and silence Myc, Zelda and Deadpan, and promote neuroblast-to-INP differentiation. It may also inhibit Notch signaling which is required for neuroblast-to-INP differentiation (https://doi.org/10.1016/j.devcel.2006.01.017; 10.1016/j.devcel.2008.03.004 ; https://doi.org/10.15252/embr.201744188; https://doi.org/10.1158/0008-5472.CAN-15-2299)

      We have clarified the difference between Type I and Type II neuroblasts in the introduction: "A sparse subset of neuroblasts (Type II) generate intermediate progenitors (INPs) that can undergo a few more asymmetric divisions, allowing for larger lineages to be produced. The neuroblast-to-neuron process in Type II lineages involves a slightly different sequential expression of differentiation factors21,24."

      We have also added a new reference describing that neuronal differentiation and maintenance are severely affected upon elav loss of function:

      Yao, K.-M., Samson, M.-L., Reeves, R. & White, K. Gene elav of Drosophila melanogaster: A prototype for neuronal-specific RNA binding protein gene family that is conserved in flies and humans. J. Neurobiol. 24, 723-739 (1993).

      **Referees cross-commenting**

      My main concern about data in this study remains direct vs. indirect effects of manipulating miRNA functions and the corresponding phenotype in various cell types in flies. The authors focused most of their effort on using genes that promote GMC differentiation in order to establish the role of neuroblast-specific miRNAs. Most of the experiments were not rigorously performed to the level that eliminates obvious caveats and suggests their interpretation is the most likely possibility. It is a technologically excellent study but lacks in-depth analyses in biological effects.

      Reviewer #2 (Significance (Required)):

      I believe there is a strong general interest in better appreciating how miRNAs regulate precise gene expression. Deriving some sort of rules such as the specificity of target selection or the efficiency of downregulating gene expression will be hugely significant to the general audience

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study advances our understanding of the brain nuclei involved in rapid-eye movement (REM) sleep regulation. Using a combination of imaging, electrophysiology, and optogenetic tools, the study provides convincing evidence that inhibitory neurons in the preoptic area of the hypothalamus influence REM sleep. This work will be of interest to neurobiologists working on sleep and/or brain circuitry.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper identifies GABA cells in the preoptic hypothalamus which are involved in REM sleep rebound (the increase in REM sleep) after selective REM sleep deprivation. By calcium photometry, these cells are most active during REM, and show more calcium signals during REM deprivation, suggesting they respond to "REM pressure". Inhibiting these cells optogenetically diminishes REM sleep. The optogenetic and photometry work is carried out to a high standard, the paper is well-written, and the findings are interesting.

      We thank the reviewer for the detailed feedback and thoughtful comments on how to improve our manuscript. To address the reviewer’s concerns, we revised our discussion and added new data. Below, we address the concerns point by point.

      Points that could be addressed or discussed:

      (1) The circuit mechanism for REM rebound is not defined. How do the authors see REM rebound as working from the POAGAD2 cells? Although the POAGAD2 does project to the TMN, the actual REM rebound could be mediated by a projection of these cells elsewhere. This could be discussed.

      We demonstrate that POA GAD2→TMN cells become more frequently activated as the pressure for REMs builds up, whereas inhibiting these neurons during high REMs pressure leads to a suppression of the REMs rebound. It is not known how POA GAD2→TMN cells encode increased REMs pressure and subsequently influence the REMs rebound. REMs deprivation was shown to change the intrinsic excitability of hippocampal neurons and impact synaptic plasticity (McDermott et al., 2003; Mallick and Singh, 2011; Zhou et al., 2020). We speculate that increased REMs pressure leads to an increase in the excitability of POA→TMN neurons, reflected in the increased number of calcium peaks. The increased excitability of POA GAD2→TMN neurons in turn likely leads to stronger inhibition of downstream REM-off neurons. Consequently, as soon as REMs deprivation stops, there is an increased chance for entering REMs. The time course for how long it takes till the POA excitability resettles to its baseline consequently sets a permissive time window for increased amounts of REMs to recover its lost amount. For future studies, it would be interesting to map how quickly the excitability of POA neurons increases or decays as a function of the lost or recovered amount of REMs and unravel the cellular mechanisms underlying the elevated activity of POA GAD2→TMN neurons during high REMs pressure, e.g., whether changes in the expression of ion channels contribute to increased excitability of these neurons (Donlea et al., 2014). As we mentioned in the Discussion, the POA also projects to other REMs regulatory brain regions such as the vlPAG and LH. Therefore, it remains to be tested whether POA GAD2→TMN neurons also innervate these brain regions to potentially regulate REMs homeostasis. We explicitly state this now in the revised Discussion.

      (2) The "POAGAD2 to TMN" name for these cells is somewhat confusing. The authors chose this name because they approach the POAGAD2 cells via retrograde AAV labelling (rAAV injected into the TMN). However, the name also seems to imply that neurons (perhaps histamine neurons) in the TMN are involved in the REM rebound, but there is no evidence in the paper that this is the case. Although it is nice to see from the photometry studies that the histamine cells are selectively more active (as expected) in NREM sleep (Fig. S2), I could not logically see how this was a relevant finding to REM rebound or the subject of the paper. There are many other types of cells in the TMN area, not just histamine cells, so are the authors suggesting that these non-histamine cells in the TMN could be involved?

      We acknowledge that other types of neurons in the TMN may also be involved in the REMs rebound, and therefore inhibition of histamine neurons by POA GAD2 →TMN neurons may not be the sole source of the observed effect. To stress that other neurons within the TMN and/or brain regions may also contribute to the REMs rebound, we have revised the Results section.

      We performed complementary optogenetic inhibition experiments of TMN HIS neurons to investigate if suppression of these neurons is sufficient to promote REMs. We found that SwiChR++ mediated inhibition of TMN HIS neurons increased the amount of REMs compared with recordings without laser stimulation in the same mice and eYFP mice with laser stimulation. Thus, while TMN HIS neurons may not be the only downstream target of GABAergic POA neurons, these data suggest that they contribute to REMs regulation. We have incorporated these results in Fig. S4.

      We further investigated whether the activity of TMN HIS neurons changes between two REMs episodes. Assuming that REMs pressure inhibits the activity of REM-off histamine neurons, their firing rates should be highest right after REMs ends when REMs pressure is lowest, and progressively decay throughout the inter-REM interval, and reach their lowest activity right before the onset of REMs (Park et al., 2021), similar to the activity profile observed for vlPAG REM-off neurons (Weber et al., 2018). We indeed found that TMN HIS neurons display a gradual decrease in their activity throughout the inter-REM interval and thus potentially reflect the build up of REM pressure (Fig. S2F).

      (3) It is a puzzle why most of the neurons in the POA seem to have their highest activity in REM, as also found by Miracca et al 2022, yet presumably some of these cells are going to be involved in NREM sleep as well. Could the same POAGAD2-TMN cells identified by the authors also be involved in inducing NREM sleep by inhibiting histamine neurons (Chung et al)? And will some of these POA cells also be involved in NREM sleep homeostasis (e.g. Ma et al Curr Biol)? Is NREM sleep rebound necessary before getting REM sleep rebound? Indeed, can these two things (NREM and REM sleep rebound) be separated?

      Previous studies have demonstrated that POA GABAergic neurons, including those projecting to the TMN, are involved in NREMs homeostasis (Sherin et al., 1998; Gong et al., 2004; Ma et al., 2019) . Therefore, we predict that POA neurons that are involved in NREMs homeostasis are a subset of POA GAD2 → TMN neurons in our manuscript.

      Using optrode recordings in the POA, we recently reported that 12.4% of neurons sampled have higher activity during NREMs compared with REMs; in contrast, 43.8% of neurons sampled have the highest activity during REMs compared with NREMs (Antila et al., 2022) indicating that the proportion of NREM max neurons is smaller compared with REM max neurons. These proportions of neurons are in agreement with previous results (Takahashi et al., 2009). Considering fiber photometry monitors the average activity of a population of neurons as opposed to individual neurons, it is possible that we recorded neural activity across heterogeneous populations and therefore our findings may disguise the neural activity of the low proportion of NREMs neurons. We previously reported the spiking activity of POA GAD2→TMN neurons at the single cell level (Chung et al., 2017). We have noted in the manuscript that while the activity of POA GAD2→TMN neurons is highest during REMs, the neural activity increases at NREMs → REMs transitions indicating these neurons also are active during NREMs.

      Using our REMs restriction protocol, we selectively restricted REMs leading to the subsequent rebound of REMs without affecting NREMs and consequently we did not find an increase in the amount of NREMs during the rebound or an increase in slow-wave activity, a key characteristic of sleep rebound that gradually dissipates during recovery sleep (Blake and Gerard, 1937; Williams et al., 1964; Rosa and Bonnet, 1985; Dijk et al., 1990; Neckelmann and Ursin, 1993; Ferrara et al., 1999) . However, during total sleep deprivation when subjects are deprived of both NREMs and REMs, isolating NREMs and REMs rebound may not be attainable.

      (4) Is it possible to narrow down the POA area where the GAD2 cells are located more precisely?

      POA can be subdivided into anatomically distinct regions such as medial preoptic area, median preoptic area, ventrolateral preoptic area, and lateral preoptic area (MPO, MPN, VLPO, and LPO respectively). To quantify where the virus expressing GAD2 cells and optic fibers are located within the POA, we overlaid the POA coronal reference images (with red boundaries denoting these anatomically distinct regions) over the virus heat maps and optic fiber tracts from datasets used in Figure 1A. We found that virus expression and optic fiber tracts were located in the ventrolateral POA, lateral POA, and the lateral part of medial POA, and included this description in the text.

      Author response image 1.

      Location of virus expression (A) and optic fiber placement (B) within subregions of POA.

      (5) It would be ideal to further characterize these particular GAD2 cells by RT-PCR or RNA seq. Which other markers do they express?

      Single-cell RNA-sequencing of POA neurons has revealed an enormous level of molecular diversity, consisting of nearly 70 subpopulations based on gene expression of which 43 can be clustered into inhibitory neurons (Moffitt et al., 2018). One of the most studied subpopulations of POA sleep-active neurons contains the inhibitory neuropeptide galanin (Sherin et al., 1998; Gaus et al., 2002; Chung et al., 2017; Kroeger et al., 2018; Ma et al., 2019; Miracca et al., 2022). Galanin neurons have been demonstrated to innervate the TMN (Sherin et al., 1998), yet within the galanin neurons 7 distinct clusters exist based on unique gene expression (Moffitt et al., 2018). In addition to galanin, we have previously performed single-cell RNA-seq on POA GAD2 → TMN neurons and identified additional neuropeptides such as cholecystokinin (CCK), corticotropin-releasing hormone (CRH), prodynorphin (PDYN), and tachykinin 1 (TAC1) as markers of subpopulations of GABAergic POA sleep-active neurons (Chung et al., 2017; Smith et al., 2023). Like galanin neurons, the neurons expressing these neuropeptides can also be divided into multiple subtypes (Chen et al., 2017; Moffitt et al., 2018). Thus, while these molecular markers for POA neurons are immensely diverse, we agree that characterizing the molecular identity of POA GAD2 → TMN neurons and investigating the functional relevance of these neuropeptides in the context of REMs homeostasis would enrich our understanding of a neural circuit involved in REMs homeostasis and can stand as a separate extension of this manuscript.

      Reviewer #2 (Public Review):

      Maurer et al investigated the contribution of GAD2+ neurons in the preoptic area (POA), projecting to the tuberomammillary nucleus (TMN), to REM sleep regulation. They applied an elegant design to monitor and manipulate the activity of this specific group of neurons: a GAD2-Cre mouse, injected with retrograde AAV constructs in the TMN, thereby presumably only targeting GAD2+ cells projecting to the TMN. Using this set-up in combination with technically challenging techniques including EEG with photometry and REM sleep deprivation, the authors found that this cell-type studied becomes active shortly (≈40sec) prior to entering REM sleep and remains active during REM sleep. Moreover, optogenetic inhibition of GAD2+ cells inhibits REM sleep by a third and also impairs the rebound in REM sleep in the following hour. Despite a few reservations or details that would benefit from further clarification (outlined below), the data makes a convincing case for the role of GAD2+ neurons in the POA projecting to the TMN in REM sleep regulation.

      We thank the reviewer for the thorough assessment of our study and supportive comments. We have addressed your concerns in the revised manuscript, and our point by point response is provided below.

      The authors found that optogenetic inhibition of GAD2+ cells suppressed REM sleep in the hour following the inhibition (e.g. Fig2 and Fig4). If the authors have the data available, it would be important to include the subsequent hours in the rebound time (e.g. from ZT8.5 to ZT24) to test whether REM sleep rebound remains impaired, or recovers, albeit with a delay.

      We thank the reviewer for this comment and agree that it would be interesting to know how REMs changes for a longer period of time throughout the rebound phase. For Fig. 2, we did not record the subsequent hours. For Fig 4, we recorded the subsequent rebound between ZT7.5 and 10.5. When we compare the REMs amount during this 4 hr interval, the SwiChR mice have less REMs compared with eYFP mice with marginal significance (unpaired t-test, p=0.0641). We also plotted the cumulative REMs amount during restriction and rebound phases, and found that the cumulative amount of REMs was still lower in SwiChR mice than eYFP mice at ZT 10.5 (Author response image 2). Therefore, it will be interesting to record for a longer period of time to test when the SwiChR mice compensate for all the REMs that was lost during the restriction period.

      Author response image 2.

      Cumulative amount of REMs during REMs deprivation and rebound combined with optogenetic stimulation in eYFP and SwiChR groups. This data is shown as bar graphs in Figure 4.
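The cumulative comparison described in the response above can be sketched in a few lines. This is only an illustration of the bookkeeping, not the authors' analysis code: the function name, the bin width, and all numbers are made up for the example. A running deficit that shrinks toward zero over the rebound would indicate that the lost REMs is being compensated; a deficit that persists or grows would indicate that it is not.

```python
from itertools import accumulate


def cumulative_amount(per_bin_minutes):
    """Running total of REMs minutes across consecutive time bins."""
    return list(accumulate(per_bin_minutes))


# Made-up REMs minutes per consecutive time bin (restriction, then rebound).
# These values are illustrative only and are not taken from the study.
control = [1.0, 2.0, 6.0, 5.0]
inhibited = [0.5, 1.0, 4.0, 4.5]

cum_control = cumulative_amount(control)
cum_inhibited = cumulative_amount(inhibited)

# Running deficit of the inhibited group relative to the control group.
deficit = [c - i for c, i in zip(cum_control, cum_inhibited)]
print(deficit)
```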

      REM sleep is under tight circadian control (e.g. Wurts et al., 2000 in rats; Dijk, Czeisler 1995 in humans). To contextualize the results, it would be important to mention that it is not clear if the role of the manipulated neurons in REM sleep regulation hold at other circadian times of the day.

      Author response image 3.

      Inhibiting POA GAD2→TMN neurons at ZT5-8 reduces REMs. (A) Schematic of optogenetic inhibition experiments. (B) Percentage of time spent in REMs, NREMs and wakefulness with laser in SwiChR++ and eYFP mice. Unpaired t-tests, p = 0.0013, 0.0469 for REMs and wake amount. (C) Duration of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0113 for NREMs duration. (D) Frequency of REMs, NREMs, and wake episodes. Unpaired t-tests, p = 0.0063, 0.0382 for REMs and NREMs frequency.

      REMs propensity is largest towards the end of the light phase (Czeisler et al., 1980; Dijk and Czeisler, 1995; Wurts and Edgar, 2000). As a control, we therefore performed the optogenetic inhibition experiments of POA GAD2→TMN neurons during ZT5-8 (Author response image 3). Similar to our results in Figure 2, we found that SwiChR-mediated inhibition of POA GAD2→TMN neurons attenuated REMs compared with eYFP laser sessions. These findings suggest our results are consistent at other circadian times of the day.

      The effect size of the REM sleep deprivation using the vibrating motor method is unclear. In FigS4-D, the experimental mice reduce their REM sleep to 3% whereas the control mice spend 6% in REM sleep. In Fig4, mice are either subjected to REM sleep deprivation with the vibrating motor (controls), or REM sleep deprivations + optogenetics (experimental mice).

      The control mice (vibrating motor) in Fig4 spend 6% of their time in REM sleep, which is double the amount of REM sleep compared to the mice receiving the same treatment in FigS4-D. Can the authors clarify the origin of this difference in the text?

      The effect size for REM sleep deprivation is now added in the text.

      It is important to note that these figures are analyzing two different intervals of the REMs restriction. In Fig. S4D, we analyzed the total amount of REMs over the entire 6 hr restriction interval (ZT1.5-7.5). In Fig. 4, we analyzed the amount of REMs only during the last 3 hr of restriction (ZT4.5-7.5) as optogenetic inhibition was performed only during the last 3 hrs when the REMs pressure is high. In Fig. S4D, we looked at the amount of REMs during ZT1.5-4.5 and 4.5-7.5 and found that the amount of REMs during ZT4.5-7.5 (4.46 ± 0.25 %; mean ± s.e.m.) is indeed higher than ZT 1.5-4.5 (1.66 ± 0.62 %), and is comparable to the amount of REMs during ZT4.5-7.5 in eYFP mice (5.95 ± 0.52 %) in Fig. 4. We now clearly state in the manuscript at which time points we analyzed the amount, duration and frequency of REMs.
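The interval argument above can be checked with a short calculation. The sketch below takes the two half-interval means quoted in the response for Fig. S4D and computes a time-weighted average over the full restriction period. The helper name is hypothetical, and combining group means this way assumes that both values come from the same animals and that the two intervals are of equal length (3 hr each).

```python
def pooled_percentage(percentages, durations_hr):
    """Time-weighted mean of per-interval percentages."""
    if len(percentages) != len(durations_hr):
        raise ValueError("one duration is required per percentage")
    total = sum(durations_hr)
    return sum(p * d for p, d in zip(percentages, durations_hr)) / total


# Mean REMs percentages quoted in the response for Fig. S4D.
early = 1.66  # % REMs during ZT1.5-4.5
late = 4.46   # % REMs during ZT4.5-7.5

whole = pooled_percentage([early, late], [3.0, 3.0])
print(f"{whole:.2f}")
```

Under these assumptions the pooled value is about 3%, which agrees with the roughly 3% that the reviewer read from Fig. S4D for the whole 6 hr restriction.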

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) A few further citations suggested: Discussion "The TMN contains histamine producing neurons and antagonizing histamine neurons causes sleepiness..." It would be appropriate to cite Uygun DS et al 2016 J Neurosci (PMID: 27807161) here. Using the same HDC-Cre mice as used by Maurer et al., Uygun et al found that selectively increasing GABAergic inhibition onto histamine neurons produced NREM sleep.

      We apologize for omitting this important paper. In the revised manuscript, we added this citation.

      (2) Materials and Methods.

      Although the JAX numbers are given for the mouse lines based on researchers generously donating to JAX for others to use, please cite the papers corresponding to the GAD2-ires-Cre and HDC-ires-Cre mouse lines deposited at JAX.

      GAD2-ires-Cre was described in Taniguchi H et al., 2011, Neuron (PMID: 21943598).

      The construction of the HDC-ires-CRE line is described in Zecharia AY et al J Neurosci et al 2012 (PMID: 22993424).

      We have now added these important citations in the revised manuscript.

      (3) Similarly, for the viruses, please provide the citations for the AAV constructs that were donated to Addgene.

      We have now added these citations in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The authors base their conclusions heavily on an optogenetic tool that inhibits the activity of GAD2+ neurons; however, it is not shown that these neurons are indeed inhibited as expected. An alternative approach to tackle this could be the application of a different technique to achieve the same output (e.g. chemogenetics). However, both experiments (confirmation of inhibition, or using a different technique) would require a significant amount of work, and given the numerous studies out there showing that these optogenetic tools tend to work, may not be necessary. Hence the authors could also cite a similar study that used a similar construct and where it was indeed shown that this technique works (i.e. similar retrograde optogenetic construct with Cre-dependent expression combined with electrophysiological recordings).

      This laser stimulation protocol was designed based on previous reports of sustained inhibition using the same inhibitory opsin and our prior results that recapitulate similar findings as inhibitory chemogenetic techniques (Iyer et al., 2016; Kim et al., 2016; Wiegert et al., 2017; Stucynski et al., 2022). We have now added this description in the Results section.

      Fig1A - Right: the virus expression graphs are great and give a helpful insight into the variability. The image on the left (GCAMP+ cells) is less clear, the GCAMP+ cells don't differentiate well from the background. Perhaps the whole brain image with inset in POA can show the GCAMP expression more convincingly.

      We have added a histology picture showing the whole brain image with inset in the POA in the updated Fig. 1A .

      Statistics: The table is very helpful. Based on the degrees of freedom, it seems that in some instances the stats are run on the recordings rather than on the individual mice (e.g. Fig1). It could be considered to use a mixed model where subjects are taken into account as a factor.

      Author response image 4.

      ΔF/F activity of POA GAD2→TMN neurons during NREMs. The duration of NREMs episodes was normalized in time, ranging from 0 to 100%. Shading, ± s.e.m. Pairwise t-tests with Holm-Bonferroni correction, p = 5.34e-4 between 80 and 100. Gray bar, intervals where ΔF/F activity was significantly different from baseline (0 to 20%, the first time bin). n = 10 mice. In Fig. 1E, we ran stats based on the recordings. In this data set, we ran stats based on the individual mice, and found that the activity also gradually increased throughout NREMs episodes.
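The two statistical steps mentioned in this caption, collapsing recordings to one value per mouse and applying a Holm-Bonferroni correction to several pairwise comparisons, can be sketched as follows. This is a minimal standard-library illustration and not the authors' pipeline: the function names, the mouse identifiers, and every number are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_mouse_means(recordings):
    """Average repeated recordings so that each mouse contributes one value.

    `recordings` is an iterable of (mouse_id, value) pairs.
    """
    grouped = defaultdict(list)
    for mouse_id, value in recordings:
        grouped[mouse_id].append(value)
    return {mouse_id: mean(values) for mouse_id, values in grouped.items()}


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction.

    Returns (adjusted_p, reject), both in the original order of `p_values`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Hypothetical values from three recordings in two mice.
recordings = [("m1", 1.0), ("m1", 3.0), ("m2", 4.0)]
mouse_means = per_mouse_means(recordings)

# Hypothetical raw p-values for four later time bins, each tested against the first bin.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p, significant = holm_bonferroni(raw_p)
print(mouse_means, adjusted_p, significant)
```

Averaging within each mouse before testing keeps the degrees of freedom tied to the number of animals, which is the concern the reviewer raised. A mixed model with mouse as a random factor would be an alternative that retains the individual recordings.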

      There is an effect of laser in Fig2 on REM sleep amount, as well as an interaction effect with virus injection (from the table). Therefore, it would be helpful for the reader to also show REM sleep data from the control group (laser stimulation but no active optogenetics construct) in Fig 2.

      To properly control laser and virus effect, we performed the same laser stimulation experiments in eYFP control mice (expressing only eYFP without optogenetic construct, SwiChR++) and the data is provided in Fig 2C .

      Fig3B: At the start of the rebound of REM sleep, there is a massive amount of wakefulness, also reflected in the change of spectral composition. Could you comment in the text on what is happening here?

      We quantified the amount of wakefulness during the first hour of REMs rebound and found that indeed there is no significant difference in wakefulness between REM restriction and baseline control conditions ( Fig. S4H ). Therefore, while the representative image in Fig 3B shows increased wakefulness at the beginning of REMs rebound, we do not think the overall amount of wakefulness is increased.

      Fig 4, supplementary data: it would be helpful for the reader to have mentioned in the text the effect size of the REM sleep restriction protocol (e.g. mean and standard deviation).

      Thank you for this suggestion. We have now added the effect size for the REM sleep restriction experiments in the main text.

      REM sleep restriction and photometry experiment: could be improved by adding within the main body of text that, in order to conduct the photometry experiment in the last hours of REM sleep deprivation, the manual REM sleep deprivation had to be applied, because the vibrating motor technique disturbed the photometry recordings.

      Thank you for this suggestion. We have added the description in the main text.

      Suggestion to build further on the already existing data (not for this paper): you have a powerful dataset to test whether REM sleep pressure builds up during wakefulness or NREM sleep, by correlating when your optogenetic treatment occurs (NREM or wakefulness), with the subsequent rebound in REM sleep (see also Endo et al., 1998; Benington and Heller, 1994; Franken 2001).

      We thank the reviewer for this excellent suggestion. We plan to carry out this experiment in the future.

      References

      Antila, H., Kwak, I., Choi, A., Pisciotti, A., Covarrubias, I., Baik, J., et al. (2022). A noradrenergic-hypothalamic neural substrate for stress-induced sleep disturbances. Proc. Natl. Acad. Sci. 119, e2123528119. doi: 10.1073/pnas.2123528119.

      Blake, H., and Gerard, R. W. (1937). Brain potentials during sleep. Am. J. Physiol.-Leg. Content 119, 692–703. doi: 10.1152/ajplegacy.1937.119.4.692.

      Chen, R., Wu, X., Jiang, L., and Zhang, Y. (2017). Single-Cell RNA-Seq Reveals Hypothalamic Cell Diversity. Cell Rep. 18, 3227–3241. doi: 10.1016/j.celrep.2017.03.004.

      Chung, S., Weber, F., Zhong, P., Tan, C. L., Nguyen, T., Beier, K. T., et al. (2017). Identification of Preoptic Sleep Neurons Using Retrograde Labeling and Gene Profiling. Nature 545, 477–481. doi: 10.1038/nature22350.

      Czeisler, C. A., Zimmerman, J. C., Ronda, J. M., Moore-Ede, M. C., and Weitzman, E. D. (1980). Timing of REM sleep is coupled to the circadian rhythm of body temperature in man. Sleep 2, 329–346.

      Dijk, D. J., Brunner, D. P., Beersma, D. G., and Borbély, A. A. (1990). Electroencephalogram power density and slow wave sleep as a function of prior waking and circadian phase. Sleep 13, 430–440. doi: 10.1093/sleep/13.5.430.

      Dijk, D. J., and Czeisler, C. A. (1995). Contribution of the circadian pacemaker and the sleep homeostat to sleep propensity, sleep structure, electroencephalographic slow waves, and sleep spindle activity in humans. J. Neurosci. Off. J. Soc. Neurosci. 15, 3526–3538. doi: 10.1523/JNEUROSCI.15-05-03526.1995.

      Donlea, J. M., Pimentel, D., and Miesenböck, G. (2014). Neuronal machinery of sleep homeostasis in Drosophila. Neuron 81, 860–872. doi: 10.1016/j.neuron.2013.12.013.

      Ferrara, M., De Gennaro, L., Casagrande, M., and Bertini, M. (1999). Auditory arousal thresholds after selective slow-wave sleep deprivation. Clin. Neurophysiol. Off. J. Int. Fed. Clin. Neurophysiol. 110, 2148–2152. doi: 10.1016/s1388-2457(99)00171-6.

      Gaus, S. E., Strecker, R. E., Tate, B. A., Parker, R. A., and Saper, C. B. (2002). Ventrolateral preoptic nucleus contains sleep-active, galaninergic neurons in multiple mammalian species. Neuroscience 115, 285–294. doi: 10.1016/S0306-4522(02)00308-1.

      Gong, H., McGinty, D., Guzman-Marin, R., Chew, K.-T., Stewart, D., and Szymusiak, R. (2004). Activation of c-fos in GABAergic neurones in the preoptic area during sleep and in response to sleep deprivation. J. Physiol. 556, 935–946. doi: 10.1113/jphysiol.2003.056622.

      Iyer, S. M., Vesuna, S., Ramakrishnan, C., Huynh, K., Young, S., Berndt, A., et al. (2016). Optogenetic and chemogenetic strategies for sustained inhibition of pain. Sci. Rep. 6, 30570. doi: 10.1038/srep30570.

      Kim, H., Ährlund-Richter, S., Wang, X., Deisseroth, K., and Carlén, M. (2016). Prefrontal Parvalbumin Neurons in Control of Attention. Cell 164, 208–218. doi: 10.1016/j.cell.2015.11.038.

      Kroeger, D., Absi, G., Gagliardi, C., Bandaru, S. S., Madara, J. C., Ferrari, L. L., et al. (2018). Galanin neurons in the ventrolateral preoptic area promote sleep and heat loss in mice. Nat. Commun. 9, 4129. doi: 10.1038/s41467-018-06590-7.

      Ma, Y., Miracca, G., Yu, X., Harding, E. C., Miao, A., Yustos, R., et al. (2019). Galanin Neurons Unite Sleep Homeostasis and α2-Adrenergic Sedation. Curr. Biol. CB 29, 3315-3322.e3. doi: 10.1016/j.cub.2019.07.087.

      Mallick, B. N., and Singh, A. (2011). REM sleep loss increases brain excitability: role of noradrenaline and its mechanism of action. Sleep Med. Rev. 15, 165–178. doi: 10.1016/j.smrv.2010.11.001.

      McDermott, C. M., LaHoste, G. J., Chen, C., Musto, A., Bazan, N. G., and Magee, J. C. (2003). Sleep deprivation causes behavioral, synaptic, and membrane excitability alterations in hippocampal neurons. J. Neurosci. Off. J. Soc. Neurosci. 23, 9687–9695. doi: 10.1523/JNEUROSCI.23-29-09687.2003.

      Miracca, G., Anuncibay-Soto, B., Tossell, K., Yustos, R., Vyssotski, A. L., Franks, N. P., et al. (2022). NMDA Receptors in the Lateral Preoptic Hypothalamus Are Essential for Sustaining NREM and REM Sleep. J. Neurosci. 42, 5389–5409. doi: 10.1523/JNEUROSCI.0350-21.2022.

      Moffitt, J. R., Bambah-Mukku, D., Eichhorn, S. W., Vaughn, E., Shekhar, K., Perez, J. D., et al. (2018). Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362. doi: 10.1126/science.aau5324.

      Neckelmann, D., and Ursin, R. (1993). Sleep stages and EEG power spectrum in relation to acoustical stimulus arousal threshold in the rat. Sleep 16, 467–477.

      Park, S.-H., Baik, J., Hong, J., Antila, H., Kurland, B., Chung, S., et al. (2021). A probabilistic model for the ultradian timing of REM sleep in mice. PLOS Comput. Biol. 17, e1009316. doi: 10.1371/journal.pcbi.1009316.

      Rosa, R. R., and Bonnet, M. H. (1985). Sleep stages, auditory arousal threshold, and body temperature as predictors of behavior upon awakening. Int. J. Neurosci. 27, 73–83. doi: 10.3109/00207458509149136.

      Sherin, J. E., Elmquist, J. K., Torrealba, F., and Saper, C. B. (1998). Innervation of histaminergic tuberomammillary neurons by GABAergic and galaninergic neurons in the ventrolateral preoptic nucleus of the rat. J. Neurosci. Off. J. Soc. Neurosci. 18, 4705–4721.

      Smith, J., Honig-Frand, A., Antila, H., Choi, A., Kim, H., Beier, K. T., et al. (2023). Regulation of stress-induced sleep fragmentation by preoptic glutamatergic neurons. Curr. Biol. CB, S0960-9822(23)01585-3. doi: 10.1016/j.cub.2023.11.035.

      Stucynski, J. A., Schott, A. L., Baik, J., Chung, S., and Weber, F. (2022). Regulation of REM sleep by inhibitory neurons in the dorsomedial medulla. Curr. Biol. CB 32, 37-50.e6. doi: 10.1016/j.cub.2021.10.030.

      Takahashi, K., Lin, J.-S., and Sakai, K. (2009). Characterization and mapping of sleep-waking specific neurons in the basal forebrain and preoptic hypothalamus in mice. Neuroscience 161, 269–292. doi: 10.1016/j.neuroscience.2009.02.075.

      Weber, F., Hoang Do, J. P., Chung, S., Beier, K. T., Bikov, M., Saffari Doost, M., et al. (2018). Regulation of REM and Non-REM sleep by periaqueductal GABAergic neurons. Nat. Commun. 9, 1–13. doi: 10.1038/s41467-017-02765-w.

      Wiegert, J. S., Mahn, M., Prigge, M., Printz, Y., and Yizhar, O. (2017). Silencing Neurons: Tools, Applications, and Experimental Constraints. Neuron 95, 504–529. doi: 10.1016/j.neuron.2017.06.050.

      Williams, H. L., Hammack, J. T., Daly, R. L., Dement, W. C., and Lubin, A. (1964). RESPONSES TO AUDITORY STIMULATION, SLEEP LOSS AND THE EEG STAGES OF SLEEP. Electroencephalogr. Clin. Neurophysiol. 16, 269–279. doi: 10.1016/0013-4694(64)90109-9.

      Wurts, S. W., and Edgar, D. M. (2000). Circadian and homeostatic control of rapid eye movement (REM) sleep: promotion of REM tendency by the suprachiasmatic nucleus. J. Neurosci. Off. J. Soc. Neurosci. 20, 4300–4310. doi: 10.1523/JNEUROSCI.20-11-04300.2000.

      Zhou, Y., Lai, C. S. W., Bai, Y., Li, W., Zhao, R., Yang, G., et al. (2020). REM sleep promotes experience-dependent dendritic spine elimination in the mouse cortex. Nat. Commun. 11, 4819. doi: 10.1038/s41467-020-18592-5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Dormancy/diapause/hibernation (depending on how the terms are defined) is a key life history strategy that allows the temporal escape from unfavorable conditions. Although environmental conditions do play a major role in inducing and terminating dormancy (authors call this energy limitation hypothesis), the authors test a mutually non-exclusive hypothesis (life-history hypothesis) that sex-specific selection pressures, at least to some extent, would further shape the timing of these life-history events. Authors use a meta-analytic approach to collect data (mainly on rodents) on various life-history traits to test trade-offs among these traits between sexes and how they affect entry and termination of dormancy.

      Strengths:

      I found the theoretical background in the Introduction quite interesting, to the point and the arguments were well-placed. How sex-specific selection pressures would drive entry and termination of diapause in insects (e.g. protandry), especially in temperate butterflies, is very well investigated. Authors attempt to extend these ideas to endotherms and trying to find general patterns across ectotherms and endotherms is particularly exciting. This work and similar evidence could make a great contribution to the life-history theory, specifically understanding factors that drive the regulation of life cycle timing.

      Weaknesses:

      (1) I felt that including 'ectotherms' in the title is a bit misleading as there is hardly any (in fact, any?) data presented on ectotherms. Also, most of the discussion is heavily mammal (rodent) focussed. I believe saying endotherms in the title as well is a bit misleading as the data is mammal-focused.

      We changed the title to: "Evolutionary trade-offs in dormancy phenology". This is a hybrid article comprising both a meta-analysis and a literature review. Each of these parts brings new elements to the hypotheses presented. The statistical analyses only concern mammals and especially rodent species. But the literature review highlighted links between the evolution of dormancy in ectotherms and endotherms that have not been linked in previous studies. We feel it is important for readers to know that much of the discussion will focus on the comparison of these two groups. But we understand that placing the term ectotherms in the title might suggest a meta-analysis including these two groups.

      In addition, we indicated more specifically in the abstract and at the end of the introduction that the article includes two approaches associated with different groups of animals.

      We also specified in the "review criteria" section that:

      Only one bird species is considered to be a hibernator, and no information is available on sex differences in hibernation phenology (Woods and Brigham 2004, Woods et al. 2019).

      We have also added a "study limitations" section, which explains that although the meta-analysis is limited by the data available in the literature, the information available for the species groups not studied seems to support our results.

      (2) I think more information needs to be provided early on to make readers aware of the diversity of animals included in the study and their geographic distribution. Are they mostly temperate or tropical? What is the span of the latitude as day length can have a major influence on dormancy timings? I think it is important to point out that data is more rodent-centric. Along the line of this point, is there a reason why the extensively studied species like the Red Deer or Soay Sheep and other well-studied temperate mammals did not make it into the list?

      We specified in the abstract and at the end of the introduction that the species studied in the meta-analysis are mainly Holarctic species. We have also added a map showing all the study sites used in the meta-analysis. Finally, we noted in the methods, and explained in a "study limitations" section added at the end of the discussion, why some species were not included in the meta-analysis and what this implies for the interpretation of the results.

      The hypotheses developed in this article are based on the survival benefits of seasonal dormancy thanks to a period of complete inactivity lasting several months. The Red Deer or Soay Sheep remain active above ground throughout the year.

      The effect of photoperiod on phenology is one of the mechanisms that has evolved to match an activity with the favorable condition. In this study, we are not interested in the mechanisms but in the evolutionary pressures that explain the observed phenology. Interspecific variation in the effect of photoperiod results from different evolutionary pressures, which we are trying to highlight. It is therefore not necessary to review mechanisms and effects of photoperiod, themselves requiring a lengthy review.

      We also tested the “physiological constraint hypothesis” on several variables. Temperature and precipitation are factors correlated with sex differences in phenology of hibernation. These factors allow consideration of the geographical differences that influence hibernation phenology.

      (3) Isn't the term 'energy limitation hypothesis' which is used throughout the manuscript a bit endotherm-centric? Especially if the goal is to draw generalities across ectotherms and endotherms. Moreover, climate (e.g. interaction of photoperiod and temperature in temperatures) most often induces or terminates diapause/dormancy in ectotherms so I am not sure if saying 'energy limitation hypothesis' is general enough.

      We renamed this hypothesis the "physiological constraint hypothesis" and we have made appropriate changes in the text so as not to focus physiological constraints solely on energy aspects.

      (4) Since for some species, the data is averaged across studies to get species-level trait estimates, is there a scope to examine within population differences (e.g. across latitudes)? This may further strengthen the evidence and rule out the possibility of the environment, especially the length of the breeding season, affecting the timing of emergence and immergence.

      For a given species, data on hibernation phenology are averaged for different populations, but also for the same population when measurements are taken over several years. To test these hypotheses on a population scale, precise data on reproductive effort would be needed for each population tested, but this concerns very few species (less than 5).

      Testing the effects of temperature and precipitation allows us to take into account the effects of climate on phenology.

      (5) Although the authors are looking at the broader patterns, I felt like the overall ecology of the species (habitat, tropical or temperate, number of broods, etc.) is overlooked and could act as confounding factors.

      Yes, that's why we also tested the physiological constraints hypothesis, including the effect of temperature and precipitation. For the life-history hypothesis, we also tested reproductive effort, which takes into account the number of offspring per year.

      (6) I strongly think the data analysis part needs more clarity. As of now, it is difficult for me to visualize all the fitted models (despite Table 1), and the large number of life-history traits adds to this complexity. I would recommend explicitly writing down all the models in the text. Also, the Table doesn't make it clear whether interaction was allowed between the predictors or not. More information on how PGLS were fitted needs to be provided in the main text which is in the supplementary right now. I kept wondering if the authors have fit multiple models, for example, with different correlation structures or by choosing different values of lambda parameter. And, in addition to PGLS, authors are also fitting linear regressions. Can you explain clearly in the text why was this done?

      To simplify the results, we reduced the number of models to just three: one for emergence and two for immergence. In place of Table 1, we have written the structure of the models used. We have added a sentence to the statistics section: “each PGLS model produces a λ parameter representing the effect of phylogeny ranging between 0 (no phylogeny effect) and 1 (covariance entirely explained by co-ancestry)”. We have tested only three PGLS models and the estimated lambda value for these models is 0.
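      The role of the λ parameter mentioned in this response can be sketched as follows. This is a minimal illustration and not the authors' analysis: the covariance matrix C, the predictor and response values, and all function names are hypothetical, and actual PGLS software estimates λ by maximum likelihood instead of taking it as an input. The sketch assumes the usual Pagel's λ convention, in which λ multiplies the off-diagonal (shared-ancestry) entries of the phylogenetic covariance matrix before a generalized least squares fit.

```python
def pagel_lambda_transform(C, lam):
    """Scale off-diagonal covariances by lam; keep each species' own variance."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)]
            for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gls_fit(X, y, V):
    """GLS estimate beta = (X' V^-1 X)^-1 X' V^-1 y."""
    n, p = len(X), len(X[0])
    # V^-1 applied to each column of X and to y, via linear solves.
    vinv_x = [solve(V, [X[i][k] for i in range(n)]) for k in range(p)]
    vinv_y = solve(V, y)
    xtvx = [[sum(X[i][a] * vinv_x[b][i] for i in range(n)) for b in range(p)]
            for a in range(p)]
    xtvy = [sum(X[i][a] * vinv_y[i] for i in range(n)) for a in range(p)]
    return solve(xtvx, xtvy)


# Hypothetical covariance matrix for four species (illustrative values only).
C = [
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
]
x_vals = [0.1, 0.4, 0.5, 0.9]          # hypothetical predictor
y_vals = [2.0, 3.1, 3.0, 5.2]          # hypothetical response
X = [[1.0, x] for x in x_vals]         # intercept column plus predictor

beta_lambda0 = gls_fit(X, y_vals, pagel_lambda_transform(C, 0.0))
beta_lambda1 = gls_fit(X, y_vals, pagel_lambda_transform(C, 1.0))
```

      With λ = 0 and equal variances on the diagonal, the covariance matrix is the identity and the fit coincides with ordinary least squares, which is the sense in which an estimated λ of 0 indicates no detectable effect of phylogeny; with λ = 1 the full covariance matrix is used.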

      (7) Figure 2 is unclear, and I do not understand how these three regression lines were computed. Please provide more details.

      We tested new models and modified existing figures.

      Reviewer #2 (Public Review):

      Summary:

      An article with lots of interesting ideas and questions regarding the evolution of timing of dormancy, emphasizing mammalian hibernation but also including ectotherms. The authors compare selective forces of constraints due to energy availability versus predator avoidance and requirements and consequences of reproduction in a review of between and within species (sex) differences in the seasonal timing of entry and exit from dormancy.

      Strengths:

      The multispecies approach including endotherms and ectotherms is ambitious. This review is rich with ideas if not in convincing conclusions.

      Weaknesses:

      The differences in physiological requirements for gametogenesis between sexes, which affect the timing of heterothermy and the need for euthermy during mammalian hibernation, are significant issues that underlie, but are under-discussed in, this contrast of selective pressures that determine seasonal timing of dormancy. Some additional discussion of the effects of rapid climate change on between and within species phenologies of dormancy would have been interesting.

      Reviewer #2 (Recommendations For The Authors):

      This review provides a very interesting and ambitious among and within-species comparison of the seasonal timing of entry and exit from dormancy, emphasizing literature from hibernating mammals (sans bats and bears) and with attention to ectotherms. The authors test hypotheses related to the timing of food availability (energy) versus life history considerations (requirements for reproduction, avoiding predation) while acknowledging that these are not mutually exclusive. I offer advice for clarifications and description of the limitations of the data (accuracy of emergence and immergence times), but mainly seek more emphasis for small mammalian hibernators on the contrast for requirements for significant periods of euthermy prior to the emergence in males versus females, a contrast that has energetic and timing consequences in both the active and hibernation seasons.

      A consideration alluded to but not fully explained or discussed is the differences in mammals between species and sexes in the timing of what can be called ecological hibernation, which is the seasonal duration that an animal remains sequestered in its burrow or den, and heterothermic hibernation, between the beginning and end of the use of torpor. The two are not synonymous. When "emergence" is the first appearance above ground, there is a significant missing observation key to the energetic contrasts discussed in this review, that of this costly pre-emergence behavior.

      To explain the difference between heterothermic hibernation and ecological hibernation, we've added a passage to the Review Criteria section of the Materials and Methods:

      “In this study, we addressed what can be called ecological hibernation, i.e. the seasonal duration that an animal remains sequestered in its burrow or den, which is assumed to be directly linked to the reduced risk of predation. In contrast, we did not consider heterothermic hibernation, which corresponds to the time between the beginning and end of the use of torpor. So when we mention hibernation, emergence or immergence, the specific reference is to ecological hibernation.”

      In arctic and other ground squirrel species, males remain at high body temperatures after immerging and remaining in their burrows in the fall for several days to a week, and more consistently and importantly, males that will attempt to breed in the spring end torpor but remain constantly in their burrows for as much as one month at great expense whilst undergoing testicular growth, spermatogenesis, spermiation, and sperm capacitation, processes that require continuous euthermy. Female arctic ground squirrels and non-breeding males do not and typically enter their first torpor bout 1-2 days after immergence and first appear above ground 1-3 days after their last arousal in spring.

      The weeks spent euthermic in a cold burrow in spring by males while undergoing reproductive maturation require a significant energetic investment (can equate to the cost of the previous heterothermic period) that contrasts profoundly with the pre-mating energetic investment by females.

      Males cache food in their hibernacula and extend their active season in late summer/fall in order to do so and feed from these caches in spring after resuming euthermy, often emerging at body weights similar to that at immergence. Similar between-sex differences in the timing of hibernation and heterothermy occur in golden-mantled and Columbian ground squirrels and likely most other Urocitellus spp., though less well described in other species. These differences are related to life histories and requirements for male vs. female gametogenesis and, at the same time, energetic considerations in the costs to males for remaining euthermic while undergoing spermatogenesis and the cost related to whether males undergo gonadal development being dependent on individual body mass and cache size. These issues should be better discussed in this review.

      It is the time required to complete spermatogenesis, spermiation, and maturation of sperm not the time for growth of different sizes of testes that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We took this comment into account. As we found little evidence of an increase in testicular maturation time with relative testicular size (apart from table 4 in Kenagy and Trombulak, 1986), we no longer tested the effect of relative testicular size on protandry.

      We examined whether the ability to store food before hibernation might reduce protandry. Although food storage in the burrow may be favored for overcoming harsh environments or predation, model selection did not retain the food-storing factor. Thus, the ability to accumulate food in the burrow was not by itself likely to keep males of some species from emerging earlier (e.g. Cricetus cricetus, protandry: 20 days, Siutz et al., 2016). Early emerging males may benefit from consuming higher quality food or in competition with other males (e.g., dominance assertion or territory establishment, Manno and Dobson 2008).

      We developed these aspects in the discussion.

      While it is admirable to include ectotherms in such a broad review and modelling, I can't tell what data from how many ectothermic species contributed to the models and summary data included in the figures.

      Too few data on ectotherms were available to include them in the meta-analysis.

      Some consideration should be made to the limitations of the data extracted from the literature of the accuracy of emergence and immergence dates when derived from only observations or trapping data. The most accurate results come from the use of telemetry for location and data logging reporting below vs. above ground positioning and body temperature.

      We added a "study limits" section to the discussion to address all the limits in this commentary.

      L64 "favor reproduction", better to say "allow reproduction", since there is strong evolutionary pressure to initiate reproduction early, often anticipating favorable conditions for reproduction, to maximize the time available for young to grow and prepare for overwintering themselves.

      Also, generally, it is not how "harsh" an environment is but rather how short the growing season is.

      We took this comment into account.

      L80 More simply, individuals that have amassed sufficient energy reserves as fat and caches to survive through winter may opt to initiate dormancy. This may decrease but not obviate predation, since hibernating animals are dug from their burrows and eaten by predators such as bears and ermine.

      In this sentence, we indicated a gap between dormancy phenology and the growing season, which suggests survival benefits of dormancy other than from a physiological point of view. We've changed the sentence to make it clearer: “However, some animals immerge in dormancy while environmental conditions would allow them (from a physiological point of view) to continue their activity, suggesting other survival benefits than coping with a short growing season”

      L88 other physiological or ecological factors.... (gametogenesis).

      In this study, we examine possible evolutionary pressures and therefore the environmental factors that may influence hibernation phenology. We focus on reproductive effort because, assuming predation pressure, we would expect a trade-off between survival and reproduction.

      L113 beginning early to afford long active seasons to offspring while not compromising the survival of parents.

      We added to the sentence:

      “For females, emergence phenology may promote breeding and/or care of offspring during the most favorable annual period (e.g., a match of the peak in lactational energy demand and maximum food availability, Fig. 1) or beginning early to afford long active seasons to offspring while not compromising the survival of parents.”

      L117 based on adequate preparation for overwintering and enter dormancy....

      We modified the sentence as follows:

      “recovering from reproduction, and after acquiring adequate energy stores for overwintering”

      L123 given that males outwardly invest the least time in reproduction yet generally have shorter hibernation seasons would seem to reject this hypothesis. This changes if you overtly include the time and energy that males expend while remaining euthermic preparing for hibernation, a cost that can be similar to energy expended during heterothermy.

      Males invest a lot of time in reproduction before females emerge (whether for competition or physiological maturation) and some males seem to be subject to long-term negative effects linked to reproductive stress (see Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313). Both processes may contribute to reducing the duration of male hibernation.

      L125 again, costs to support euthermy in males undergoing reproductive development is an investment in reproduction.

      You're right, but it's difficult to quantify. We tested a model that takes into account the reproductive effort during reproduction and prior to reproduction. We also considered the hypothesis that species living in a cold climate might have a low protandry while having a high reproductive effort due to their ability to feed in the burrow (interaction effect between reproductive effort and temperature). We think these changes answer your comment.

      L134 It isn't growing large testes that takes time, but instead completing spermatogenesis and maturation of sperm in the epididymides.

      We removed this part.

      L140 Later immergence in male ground squirrels is related to accumulation and defense of cached food, activities that are related to reproduction the next spring. An experimental analysis that would be revealing is to compare immergence times in females that completed lactation to the independence of their litters vs. females that did not breed or lost their litters. Who immerges first?

      Body mass variation from emergence to the end of mating in males seems to explain the delayed immergence of males in species that don't hide food in their burrows for hibernation. For example, in Spermophilus citellus, males immerge on average more than 3 weeks after females, yet they do not hide food in their burrows for the winter.

      Such a study already exists and shows that non-breeding females immerge earlier than breeding females. We refer to it:

      L386: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      L164 So you examined literature from 152 species but included data from only 29 species? Did you include data from social hibernators (marmots) that mate before emergence?

      With the current models, we have 28 different species. We have few species because very few have both data on sex differences and information on reproductive effort (especially for males).

      Data on sex differences in hibernation were not available for social hibernating species.

      L169 Were these data from trapping or observation results? How reliable are these versus the use of information from implanted data loggers or collars that definitively document when euthermy is resumed and/or when immergence and first emergence occurs (through light loggers)?

      We did not focus on heterothermic hibernation, but on ecological hibernation. We have no idea of the margin of error for these types of data, but we have discussed these limitations in the "Study limitations" section.

      L180, again, it is the time required to complete spermatogenesis and spermiation not the time for the growth of different sizes of testes that drives the preparation time for males. This is relatively constant among rodents. I challenge the assumption that larger testes take longer to grow than smaller ones.

      We removed this part.

      L200 Males that accumulate caches in fall and then feed from those during the spring pre-emergence euthermic interval and after will often be at their seasonal maximum in body mass. Declining from that peak may not be stressful.

      It has been suggested that reproductive effort in Spermophilus citellus might induce long-term negative effects that delay male immergence.

      Millesi, E., Huber, S., Dittami, J., Hoffmann, I., & Daan, S. (1998). Parameters of mating effort and success in male European ground squirrels, Spermophilus citellus. Ethology, 104(4), 298-313.

      L210 How about altitude, which affects the length of the growing season at similar latitudes?

      We extracted the location of each study site to determine the temperature and precipitation at that precise location (based on interpolated climate surface). We therefore take into account differences in growing season (based on temperature) in altitude between sites.

      L267 How did whether males cache food or not figure into these comparisons? Refeeding before mating occurs during the pre-emergence euthermic interval.

      We removed this part.

      L332, 344 not a "proxy" but functionally related to advantages in mating systems with multiple mating males.

      We removed this part.

      L353 The need for a pre-emergence euthermic interval in male ground squirrels requires costs in the previous active season in accumulating and defending a cache and the proximal costs in spring while remaining at high body temperatures prior to emergence with resulting loss in body mass or devouring of the cache.

      You're right, but in this section, we briefly explain the benefits of food caching compared with other species that don't do so.

      L385 This review should discuss why females are not known to cache and contrast as "income breeders" from "capital breeder" males. What advantages of caches are females indifferent to (no need for a prolonged pre-emergence period) and what costs of accumulating caches do they avoid (prolonged activity period and defense of caches).

      We clarified the case of female emergence.

      L321 : “Thus, an early emergence of males may have evolved in response to sexual selection to accumulate energy reserve in anticipation of reproductive effort. Females, on the contrary, are not subject to intraspecific competition for reproduction and may have sufficient time before (generally one week after emergence) and during the breeding period to improve their body condition.”

      L388 I don't understand the logic of the conclusion that "did not ...adequately explain the late male immergence" in this section. The greater mass loss in males over the mating period is afforded by the presence of a cache that requires later immergence.

      We removed this part.

      L412 Not just congeners that invest less in reproduction, but within species individuals that do not attempt to breed in one or more years and thus have no reproductive costs should be an interesting comparison for differences in phenology from individuals that do breed. Non-breeders are often yearlings but can be a significant overall proportion of males that fail to fatten or cache enough to afford a pre-emergence euthermic period.

      L385: “In mammals, males and females that invest little or not at all in reproduction exhibit advances in energy reserve accumulation and earlier immergence for up to several weeks, while reproductive congeners continue activity (Neuhaus 2000, Millesi et al. 2008a).”

      The sentence refers to individuals who reproduce little or not at all.

      L445 Males that gain weight between emergence and mating may do so by feeding from a cache regardless of how "harsh" an environment is.

      We observe this phenomenon even in species that are not known to hoard food:

      “Gains in body mass observed for some individuals, even in species not known to hoard food, may indicate that the environment allows a positive energy balance for other individuals with comparable energy demands.”

      L492 Some insects retreat to refugia in mid-summer to avoid parasitism (Gynaephora).

      Escape from parasites is also a benefit of dormancy.

      Fig 1 - It is difficult to see the differences in black and green colors, esp if color blind. Maternal effort is front-loaded within the active season (line for "optimal period" shown in midseason).

      Add "energy" underneath c) Prediction (H1) and "reproduction" underneath d) "Prediction (H2). Explain the orange vs black, green colors of triangles.

      We made the necessary changes.

      Fig 2 - I don't buy the regression lines as significant in this figure. The red line, cannot have a regression with two sample points and without the left-hand most dot, nothing is significant.

      We deleted this graph.

      Fig 3 - females only?

      We deleted this graph.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Below I summarize points that should be addressed in a revised version of the manuscript.

      • Page 6, first paragraph: I don't understand why the signals average out to a single state. If the distribution is indeed randomly distributed, a broad signal with low intensity should be present.

      We agree that this statement may cause confusion. We changed the text (marked in bold) to clarify the statement: The mobility of the undocked SBDs will be higher than the diffusion of the whole complex, allowing the sampling of varying interdomain distances within a single burst. However, these dynamic variations are subsequently averaged to a singular FRET value during FRET calculations for each burst, and may appear as a single low FRET state in the histograms.
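      The averaging described in this response can be illustrated with a small numerical sketch. It is a simplified illustration under stated assumptions, not the analysis pipeline used in the manuscript: the function name, the photon counts, and the use of an uncorrected proximity ratio (acceptor photons divided by all photons after donor excitation) are hypothetical choices made only to show why sub-burst dynamics collapse to one value per burst.

```python
def burst_fret(segments):
    """Apparent FRET of one burst from (donor, acceptor) photon counts.

    Each segment holds the photons collected, after donor excitation, while
    the molecule occupied one interdomain distance within the burst.
    """
    donor = sum(d for d, _ in segments)
    acceptor = sum(a for _, a in segments)
    return acceptor / (donor + acceptor)


# Hypothetical burst: half of the photons come from a low-FRET conformation
# (apparent E = 0.2) and half from a high-FRET conformation (apparent E = 0.8).
low_fret_segment = (80, 20)
high_fret_segment = (20, 80)

e_burst = burst_fret([low_fret_segment, high_fret_segment])
```

      Here `e_burst` is 0.5, a value that neither conformation actually has. A burst-wise histogram therefore shows a single averaged population when the conformations interconvert faster than the burst duration.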

      • Page 6, third paragraph: how can the donor only be detected in the acceptor channel? Is this tailing out?

      Donor-only signal is not detected in the acceptor channel. As described on page 5 and in the Materials & Methods section, the dye stoichiometry value is defined for each burst/dwell using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      When no acceptor fluorophore is present FAA=0 and S=1.

      Some donor photons bleed through into the acceptor channel, but we correct for this by calculating the leakage and crosstalk factors as described in the Materials and Methods (page 20).
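
      The stoichiometry argument above can be made concrete with a minimal sketch. It assumes the conventional alternating-excitation definition S = (FDD + FDA) / (FDD + FDA + FAA), which is consistent with the statement that FAA = 0 gives S = 1; the function name is illustrative, and the leakage and crosstalk corrections mentioned above are deliberately left out.

```python
def stoichiometry(f_dd, f_da, f_aa):
    """Dye stoichiometry S for one burst or dwell.

    Assumed conventional definition (not quoted from the manuscript):
        S = (FDD + FDA) / (FDD + FDA + FAA)
    f_dd: donor emission after donor excitation
    f_da: acceptor emission after donor excitation
    f_aa: acceptor emission after acceptor excitation
    Counts are taken as already background-corrected; leakage and
    crosstalk corrections are not applied in this sketch.
    """
    total = f_dd + f_da + f_aa
    if total <= 0:
        raise ValueError("burst contains no photons")
    return (f_dd + f_da) / total


# Donor-only molecule: no acceptor, so FAA = 0 and S = 1.
donor_only = stoichiometry(f_dd=80, f_da=20, f_aa=0)

# Acceptor-only molecule: nothing after donor excitation, so S = 0.
acceptor_only = stoichiometry(f_dd=0, f_da=0, f_aa=50)

# Molecule carrying one donor and one acceptor: S near 0.5.
double_labeled = stoichiometry(f_dd=30, f_da=20, f_aa=50)
```

      Under this definition, filtering on S separates donor-only (S≈1) and acceptor-only (S≈0) dwells from doubly labeled molecules, independent of the FRET efficiency.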

      We changed the text (marked in bold) in the manuscript to address the question: The FRET data of both OpuA variants are best explained by a four-state model (Figure 2A,B; fourth and fifth panel) (Supplementary File 3). Two of the four states represent donor-only (S≈1) or acceptor-only (S≈0) dwells. The full bursts belonging to donor-only and acceptor-only molecules were excluded prior to mpH2MM. This means that some molecules transit to a donor-only or acceptor-only state within the burst period, which most likely reflects blinking or bleaching of one of the fluorophores. These donor-only and acceptor-only states were also excluded during further analysis. The other two states reflect genuine FRET dwells that were analyzed by mpH2MM. They represent different conformations of the SBDs.

      • Page 7, "SBD dynamics ..": why was the V149Q mutant only analyzed in the K521C background and not also in the N414C background?

      The two FRET states were best distinguished in OpuA-K521C. Therefore, we decided to focus on OpuA-K521C and not OpuA-N414C. OpuA-V149Q was used to show that reduced docking efficiency does not affect the transition rate constants and relative abundances of the two FRET states, and we regarded it sufficient to test the SBD dynamics in OpuA-K521C only.

      • Page 8, second paragraph: why was the N414C mutant analyzed only from 0 - 600 mM and not also up to 1000 mM?

      In line with the previous answer, our main focus was on OpuA-K521C, since the two FRET states were best distinguished in OpuA-K521C. OpuA-N414C was used to show that similar states are observed when measuring with fluorophores on the opposite side of the SBD. We studied how the FRET states change in response to different conditions that correspond to different stages of the transport cycle, and how they change in response to different ionic strengths. Initially, 600 mM KCl was used to study the dynamics of the SBD at high ionic strength. Later in this study, we tested a very wide range of salt concentrations for OpuA-K521C to get detailed insights into the dynamics of the SBDs over a wide ionic strength range. Note that 1 M KCl is a very high, non-physiological ionic strength for the typical habitat of L. lactis and was only used to show that the high FRET state occurs even under very extreme conditions.

      • Page 8, third paragraph: why was the dimer (if it is the source of the FRET signal) only partially disrupted?

      We acknowledge that this is a very good point. However, we purposely did not speculate on this point in the manuscript, because we have limited information on the molecular details of the interaction. As we highlight on page 8, the SBDs experience each other in a very high apparent concentration (millimolar range). This means that the interactions are most likely very weak (low affinity) and not very specific. Such interactions are in the literature referred to as the quinary structure of proteins and they occur at the high macromolecular crowding in the cell and in proteins with tethered domains, and thus at high local concentrations. Such interactions can be screened by high ionic strength. In the revised manuscript, we now present the partially disrupted dimer structure in the context of the quinary structure of a protein (page 11):

      In other words, the high FRET state may comprise an ensemble of weakly interacting states rather than a singular stable conformation, resembling the quinary structure of proteins. The quinary structure of proteins is typically revealed in highly crowded cellular environments and describes the weak interactions between protein surfaces that contribute to their stability, function, and spatial organization (Guin & Gruebele, 2019). Despite the current study being conducted under dilute conditions, the local concentration of SBDs (~4 mM) mimics a densely populated environment and reveals quinary structure.

      • Page 9, second paragraph: according to the EM data processing, only 20% of the particles were used for 3D reconstruction. Why? Does it mean that the remaining 80% were physiologically not relevant? If so, why were the 20% used relevant?

      We note that it is a fundamental part of image processing of single-particle cryo-EM data to remove false positives or low-resolution particles throughout the processing workflow. In particular, when using a very low and therefore generous threshold during automated particle picking, as we did (t=0.01 and t=0.05 for the 50 mM KCl and 100 mM KCl datasets, respectively), the initial set of particles includes a significant number of false positives, a tradeoff to avoid excluding particles belonging to sparsely populated classes/orientations. It is thus common that more than 50% of ‘particles’ are excluded in the first rounds of 2D classification. In our case, only 30% and 52% of particles were retained after these first clean-up steps. Subsequently, the particle set is further refined, and additional false positives and low-resolution particles are excluded during extensive rounds of 3D classification. We also note that during the final steps, most of the excluded data represent particles of lower quality that do not contribute to a high-resolution reconstruction, or that belong to sparsely populated protein conformations. This does not mean that such a population is not physiologically relevant. In conclusion, having only 5-20% of the initial automatically picked particles contribute to the reconstruction of the final cryo-EM map is common, with the vast majority of excluded particles being false positives.

      • Page 11, third paragraph: the way the proposed model is selected is also my main criticism. All alternative models do not fit the data. Therefore, the proposed model is suggested. However, I do not grasp any direct support for this model. Either I missed it or it is not presented.

      Concerning the specific model in Figure 5, the reviewer is correct. We do not provide direct evidence for a side-ways interaction. However, we have evidence of transient interactions and our data rule out several scenarios of interaction, leaving 5C as the most likely model. This is also the main conclusion of this paper: In conclusion, the SBDs of OpuA transiently interact in a docking competent conformation, explaining the cooperativity between the SBDs during transport. The conformation of this interaction is not fixed but differs substantially between different conditions.

      Because the interaction is very short-lived, it was not possible to visualize molecular details of this interaction. We present Figure 5 to hypothesize the most likely type of interaction, since many possibilities can be excluded with the vast amount of presented data. To make it clearer that we discuss models and rule out several possibilities but do not demonstrate a specific interaction between the SBDs, we now write on page 10 (changes marked in bold): We have shown that the SBDs of OpuA come close together in a short-lived state, which is responsive to the addition of glycine betaine (Figure 4A). Although the occurrence of the state varies between different conditions, it was not possible to negate the high-FRET state completely, not even under very high or low KCl concentrations, or in the presence of 50 mM arginine plus 50 mM glutamate (Figure 4A,B). To evaluate possible interdomain interaction scenarios we consider the following: (1) The SBDs of OpuA are connected to the TMDs with very short linkers of approximately 4 nm, which limit their movement and allow the receptor to sample a relatively small volume near its docking site. (2) In low ionic strength conditions, OpuA-K521C displays a high-FRET state with mean FRET values of 0.7-0.8, which correspond to inter-dye distances of approximately 4 nm. (3) The high-FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (4) The distance between the density centers of the SBDs in the cryo-EM reconstructions (based on particles with a low and high FRET state) is 6 nm, which aligns with the dimensions of an SBD (length: ~6 nm, maximal width: ~4 nm). These findings collectively indicate that the two SBDs interact, not necessarily in a singular conformation but possibly as an ensemble of weakly interacting states. Hence, we discuss three possible SBD-SBD interaction models to explain the high-FRET state:

      Reviewer #2 (Recommendations For The Authors):

      In the abstract and elsewhere the authors suggest that the SBDs physically interact with one another, and that this interaction is important for the transport mechanism, specifically for its cooperativity.

      I feel that this main claim is not well established. The authors convincingly demonstrate that the SBDs largely occupy two states relative to one another and that in one of these states, they are closer than in the other. Unless I have missed (or failed to understand) some major details of the results, I did not find any evidence of a physical interaction. Have the authors established that the high FRET state indeed corresponds to the physical engagement of the SBDs? I feel that a direct demonstration of an interaction is much missing.

      Along the same lines, in the low-salt cryo-EM structure, where the SBDs are relatively closer together, the SBDs are still separated and do not interact.

      See also our response to the final comment of reviewer 1. Furthermore, please carefully consider the following: (1) FRET values of 0.7-0.8 correspond to inter-dye distances of approximately 4 nm. (2) The high FRET state is responsive to glycine betaine, which points toward direct communication between the two SBDs. (3) The cryo-EM reconstruction is the average of all the particles in the final dataset, including both the particles with a low and high FRET state. Further, the local resolution of the SBDs in the cryo-EM map is low, indicative of a high degree of flexibility. Thus, a potential interaction is possible within the observed range of flexibility. (4) The distance between the density centers is 6 nm, aligning with the dimensions of an SBD (length: 6 nm, maximal width: 4 nm). These factors collectively indicate SBD interactions, and we now present these points more explicitly in Figure 4 and the last part of the results section (page 9).
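
      The link between FRET efficiency and inter-dye distance in point (1) follows the standard Förster relation E = 1 / (1 + (r/R0)^6). The sketch below inverts that relation. The response does not state the dye pair or its Förster radius, so the value of 5.1 nm used here is a hypothetical placeholder chosen only for illustration; with it, efficiencies of 0.7-0.8 map to roughly 4 to 4.5 nm.

```python
def fret_efficiency(distance_nm, r0_nm):
    """Förster relation: E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency, r0_nm):
    """Invert the Förster relation: r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# ASSUMPTION: hypothetical Förster radius, not taken from the source.
ASSUMED_R0_NM = 5.1

for e in (0.7, 0.8):
    r = distance_from_efficiency(e, ASSUMED_R0_NM)
    print(f"E = {e:.2f} -> r = {r:.2f} nm (assuming R0 = {ASSUMED_R0_NM} nm)")
```

      Because the efficiency depends on the sixth power of r/R0, the inferred distance is only weakly sensitive to E in this range, but it scales linearly with the assumed R0.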

      Once the authors successfully demonstrate that direct physical interaction indeed occurs, they will need to provide data that places it in the context of the transport cycle. Do the SBDs swap ligand molecules between them? Do they bind the ligand and/or the transporter cooperatively? What is the role of this interaction?

      We acknowledge the intriguing nature of the posed questions, but they extend beyond the scope of this study. It is extremely challenging to obtain high-resolution structures of highly dynamic multidomain proteins, like OpuA, and to probe transient interactions as we do here for the SBDs of OpuA. We therefore combined cryo-TEM with smFRET studies and applied the most advanced, state-of-the-art analysis tools, as acknowledged by reviewer 1. We link our observations on the structural dynamics and interactions of the SBDs to a previous study, where we showed that the two SBDs of OpuA interact cooperatively. We do not have further evidence that connects the physical interactions to the transport cycle. In our view, the collective datasets indicate that the physical interactions between the SBDs reported here increase the transport efficiency.

      As far as I understand, the smFRET data have been interpreted on the basis of a negative observation, i.e., that it is "likely" that none of the FRET states corresponds to a docked SBD. To convincingly show this, a positive observation is required, i.e., observation of a docked state.

      The aim of this study was to investigate interdomain dynamics and not specifically docking. We have previously shown that docking can be visualized via cryo-EM (Sikkema et al., 2020); however, the SBDs of OpuA appear to dock only in specific turnover conditions. We now show that the high FRET state of OpuA cannot represent a docked state, but that the SBDs transiently interact (see our response to the first comment). Importantly, a docked state was also not found in the cryo-EM reconstructions at low ionic strength, representing the smFRET conditions where we observe the interactions between the SBDs. The high FRET state occupies 30% of the dwells in this condition, and such a high percentage of molecules would have become apparent during cryo-EM 3D classification if they formed a docked state. Therefore, we conclude that docking does not occur in the low ionic strength apo condition. We discuss this point and our reasoning on page 11 of the revised manuscript.

      In this respect, I find it troubling that in none of the tested conditions, the authors observed a FRET state which corresponds to the docked state. Such a state, which must exist for transport to occur (as mentioned in the authors' previous publications), needs to be demonstrated. This brings me to my next question: why have the authors not measured FRET between the SBDs and the transporter? Isn't this a very important piece that is missing from their puzzle?

      We agree that investigating docking behavior under varied turnover conditions requires focused experiments on FRET dynamics between the SBDs and the transporter. As noted on page 5, OpuA exists as a homodimer, implying that a single cysteine mutation introduces two cysteines in a single functional transporter. To specifically implement a cysteine mutation in only one SBD and one transmembrane domain, it is necessary to artificially construct a heterodimer. We recently published initial attempts in this direction, and this will be a subject for future research but still requires years of work.

      Additionally, I feel that important controls are missing. For example, how will the data presented in Fig1 look if the transporter is labeled with acceptor or donor only? How do soluble SBDs behave?

      In the employed labeling method, donor and acceptor dyes are mixed in a 1:1 ratio and randomly attached to the two cysteines in the transporter. This automatically yields significant fractions of donor-only and acceptor-only transporters, which are always present during the smFRET recordings. We can visualize those molecules on the basis of the dye stoichiometry, which we calculate by using three types of photon counts: donor-based donor emission (FDD), donor-based acceptor emission (FDA) and acceptor-based acceptor emission (FAA).

      Unfiltered plots look as follows (a dataset of OpuA-K521C at 600 mM KCl):

      Author response image 1.

      Donor only and acceptor only molecules have a very well discernible stoichiometry of 1 and 0, respectively. The filtering procedure is described in the materials and methods section, and these plots can be found in the supplementary database. We did not add them to the main text or supplementary materials of the original manuscript, as this is a very common procedure in the field of smFRET. We now include such a dataset in the revised manuscript.

      Soluble SBDs of OpuA have been studied previously (e.g. Wolters et al., 2010 & De Boer et al. 2019). For example, we have shown by SEC-MALLS that soluble SBDs do not form dimers, which is consistent with our notion that the SBDs interact with low affinity. It is not possible to study interdomain dynamics between soluble SBDs by smFRET, because the measurements are carried out at picomolar concentrations (monomeric conditions). We emphasize that smFRET measurements with native complexes, with SBDs near each other at apparent millimolar concentrations, are physiologically more relevant.

      Additional comments:

      (1) "It could well be that cooperativity and transient interactions between SBDs is more common than previously anticipated" and a similar statement in the abstract. What evidence is there to suggest that the transient interactions between SBDs are a common phenomenon?

      On page 11, we write: Dimer formation of SBPs has been described for a variety of proteins from different structural clusters of substrate-binding proteins [33–38,51–53]. We cite 9 papers that report SBD/SBP dimers. This suggests to us that the phenomenon of interacting substrate-binding proteins could be more common. Moreover, the concentration of maltose-binding protein and other SBPs in the periplasm of Gram-negative bacteria can reach (sub)millimolar concentrations, and low-affinity interactions may play a role not only in membrane protein-tethered SBDs (like in OpuA) but may also be important in soluble substrate receptors. Such low-affinity interactions are rarely studied in biochemical experiments.

      (2) I think that the data presented in 1B-C better suits the supplementary information.

      Figure 1B-D is already a summary of the supplementary information that describes the optimization of OpuA purification. We think it is valuable to show this part of the figure in the main text. A very clean and highly pure OpuA sample is essential for smFRET experiments. Quality of protein preparations and data analysis are key for the type of measurements we report in this paper.

      (3) "the first peak in the SEC profile corresponds...." The peaks should be numbered in the figure to facilitate their identification.

      We have changed the figure as suggested.

      (4) "smFRET is a powerful tool for studying protein dynamics, but it has only been used for a handful of membrane proteins". With the growing list of membrane proteins studied by smFRET I find this an overstatement.

      We removed this sentence in the new version of the manuscript.

      (5) "We rationalized that docking of one SBD could induce a distance shift between the two SBDs in the FRET range of 3-10 nm (Figure 1E)" How and why was this assumed?

      We realize that this is one of the sentences that caused confusion about the aim of this study. In this part of the manuscript, we should not have used docking as an example and we apologize for that. We replaced the sentence by: These variants are used to study inter-SBD dynamics in the FRET range of 3-10 nm (Figure 1E).

      Also Figure 1E was adjusted to prevent confusion:

      Author response image 2.

      In addition, to avoid any confusion we changed the following sentence on page 4 (changes marked in bold): We designed cysteine mutations in the SBD of OpuA to study interdomain dynamics in the full length transporter.

      (6) "However, the FRET distributions are broader than would be expected from a single FRET state, especially for OpuA-K521C" Have the authors established how a single state FRET of OpuA looks? Is there a control that supports this claim?

      Below we compare two datasets from OpuA-K521C in 600 mM KCl with a typical smFRET dataset from the well-studied substrate-binding protein MBP from E. coli, which resides in a single state. Left: OpuA-K521C; Right: MBP

      Author response image 3.

      We agree that this cannot be assumed from the presented data. Therefore we rewrote this sentence: However, the FRET distributions tail towards higher FRET values, especially for OpuA-K521C.

      (7) "V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the intrinsic transport and ATP hydrolysis efficiency intact." I find this statement confusing: How can a mutation reduce docking efficiency yet leave the transport activity unchanged?

      We rewrote the sentences (changes marked in bold): V149Q was designed as a mild mutation that would reduce docking efficiency and thereby substrate loading, but leave the ionic strength sensing in the NBD and the binding of glycine betaine and ATP intact. Accordingly, a reduced docking efficiency should result in a lower absolute glycine betaine-dependent ATPase activity. At the same time the responsiveness of the system to varying KCl, glycine betaine, or Mg-ATP concentrations should not change.

      (8) Along the same lines: "whereas the glycine betaine-, Mg-ATP-, or KCl-dependent activity profiles remain unchanged" vs. "OpuA-V149Q-K521C exhibited a 2- to 3-fold reduction in glycine betaine-dependent ATPase activity".

      See comment at point 7.

      (9) In general, I find the writing wanting at places, not on par with the high standards set by previous publications of this group.

      We recognize the potential ambiguity in our phrasing. We hope that after incorporating the feedback provided by the reviewers our manuscript will convey our findings in a clearer manner.

      Extra changes to the text:

      (1) Title changed: The substrate-binding domains of the osmoregulatory ABC importer OpuA physically transiently interact

      (2) Second part of the abstract changed: We now show, by means of solution-based single-molecule FRET and analysis with multi-parameter photon-by-photon hidden Markov modeling, that the SBDs transiently interact in an ionic strength-dependent manner. The smFRET data are in accordance with the apparent cooperativity in transport and supported by new cryo-EM data of OpuA. We propose that the physical interactions between SBDs and cooperativity in substrate delivery are part of the transport mechanism.

      (3) Page 6, third paragraph and Figure 2B: the wrong rate number was extracted from Table 1. Changed this in the text and figure: 112 s⁻¹ → 173 s⁻¹. It did not affect any of the interpretations or conclusions.

      (4) Page 8, last paragraph, changed: smFRET was also performed in the absence of KCl and with a saturating concentration of glycine betaine (100 µM). The mean FRET efficiency of the high-FRET state of OpuA-K521C increased to 0.78, which corresponds to an inter-dye distance of about 4 nm. This indicates that the dyes at the two SBDs move very close towards each other (Figure 4A) (Table 1) (Supplementary File 34).

      (5) Page 9, second paragraph changed: Due to the inherent flexibility of the SBDs, with respect to both the MSP protein of the nanodisc and the TMDs of OpuA, their resolution is limited. Furthermore, the cryo-EM reconstructions average all the particles in the final dataset, including those with a low and high FRET state. Nevertheless, in both conditions, the densities that correspond to the SBDs can be observed in close proximity (Figure 4D). The distance between the density centers is 6 nm and aligns with the dimensions of an SBD, providing further evidence for physical interactions between the SBDs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful to the Editors for overseeing the review of our manuscript, and to the two reviewers for their thoughtful comments and suggestions for how it can be improved.

      I submit at this time a revision, as well as a detailed response (below) to each of the points raised in the first round of review.

      We feel the manuscript has been significantly improved by taking the reviewers' comments to heart. In a nutshell, we added new key pieces of data (impact of WIN site inhibition on global translation, rRNA production, as well as the requested cell biology analyses showing nucleolar stress), new analyses of the proteomics to counter potential concerns with normalization, and expanded/revised verbiage in key areas to clarify parts of the text that were confusing or problematic. The main figures have not changed; all new material is included in supplements to figures 2 and 3.

      Public Reviews

      Reviewer #1 (Public Review):

      Building on previous work from the Tansey lab, here Howard et al. characterize transcriptional and translational changes upon WIN site inhibition of WDR5 in MLL-rearranged cancer cells. They first analyze whether C16, a newer generation compound, has the same cellular effects as C6, an early generation compound. Both compounds reduce the expression of WDR5-bound RPGs in addition to the unbound RPG RPL22L1. They then investigate differential translation by ribo-seq and observe that WIN site inhibition reduces the translation of RPGs and other proteins related to biomass accumulation (spliceosome, proteasome, mitochondrial ribosome). Interestingly, this reduction adds to the transcriptional changes and is not limited to RPGs whose promoters are bound by WDR5. Quantitative proteomics at two time points confirmed the downregulation of RPGs. Interestingly, the overall effects are modest, but RPL22L1 is strongly affected. Unexpectedly, most differentially abundant proteins seem to be upregulated 24 h after C6 (see below). A genetic screen showed that loss of p53 rescues the effect of C6 and C16 and helped the authors to identify pathways that can be targeted by compounds together with WIN site inhibitors in a synergistic way. Finally, the authors elucidated the underlying mechanisms and analyzed the functional relevance of the RPL22, RPL22L1, p53, and MDM4 axis.

      While this work is not conceptually new, it is an important extension of the observations of Aho et al. The results are clearly described and, in my view, very meaningful overall.

      Major points:

      (1) The authors make statements about the globality/selectivity of the responses in RNA-seq, ribo-seq, and quantitative proteomics. However, as far as I can see, none of these analyses have spike-in controls. I recommend either repeating the experiments with a spike-in control or carefully measuring transcription and translation rates upon WIN site inhibition and normalizing the omics experiments with this factor.

      The reviewer is correct that we did not include spike-in controls in our omics experiments. We would like to emphasize that none of the omics data in this manuscript have been processed in unorthodox ways, and that the major conclusions each have independent corroborating data.

      The selectivity in RPG suppression observed in RNA-Seq, for example, is supported by results from our target engagement (QuantiGene) assays; suppression of RPL22L1 mRNA levels is supported by quantitative and semi-quantitative RT-PCR, by western blotting, and by the results of our proteomic profiling; alternative splicing (and expression) of MDM4—and its dependency on RPL22—is also backed up by similar RT-PCR and western blotting data. The same applies for alternative splicing of RPL22L1.

      That said, we do appreciate the point the reviewer is making here, and have done our best to respond. We do not think it is a prudent investment in resources to repeat the numerous omics assays in the manuscript. We also considered normalizing for bulk transcription and translation rates as suggested, but it is not clear in practice how this would be done, and it could introduce additional variables and uncertainties that may skew the interpretation of results. Instead, to respond to this comment, we made the following changes to the manuscript:

      (1) We now explicitly state, for all omics assays, that spike-in controls were not included. These statements will prompt the reader to make their own assessment of the robustness of each of our findings and interpretations.

      (2) We have added new data to the manuscript (Figure 2—figure supplement 1A–B) measuring the impact of C6 and C16 on bulk translation using the OPP labeling method. These new data demonstrate that WIN site inhibitors induce a progressive yet modest decline in protein synthesis capacity. At 24 hours, there is no significant effect of either agent on protein synthesis levels. By 48 hours, a small but significant effect is observed, and by 96 hours translation levels are ~60% of what they are in vehicle-treated control cells. These new data are important because they support the idea that normalization has not blunted the responses we observe: the magnitudes of the effects are consistent between the different assays and tend to cap out at two-fold in terms of RPG suppression, translation efficiency, ribosomal protein levels, and protein synthesis capacity.

      (3) We have included additional analysis regarding the LFQMS, as described below, that specifically addresses the issue of normalization in our proteomics experiments.

      (2) Why are the majority of proteins upregulated in the proteomics experiment after 24 h in C6 (if really true after normalization with general protein amount per cell)? This is surprising and needs further explanation.

      The reviewer is correct in noting that (by LFQMS) ~700 proteins are induced after 24 hours of treatment of MV4:11 cells with C16 (not C6, as stated). The reviewer would like us to examine whether this apparent increase in proteins is a normalization artifact. In response to this comment, we have made the following changes to the manuscript:

      (1) Our new OPP labeling experiments (Figure 2—figure supplement 1A–B) show that there is no significant reduction in overall protein synthesis following 24 hours of C16 treatment. In light of this finding, it is unlikely that normalization artifacts, resulting from diminution of the pool of highly abundant proteins, create the appearance of these 700 proteins being induced. We now explicitly make this point in the text.

      (2) We now clarify in the methods how we seeded identical numbers of cells for DMSO and C16-treated cultures in these experiments, and—consistent with our finding that WIN site inhibitors have little if any effect on protein synthesis or proliferation at the 24 hour timepoint— extracted comparable amounts of proteins from these two treatment conditions (DMSO: 344.75 ± 21.7 µg; C16: 366.50 ± 15.8 µg; [Mean ± SEM]).

      (3) We now include in Figure 3—figure supplement 1A a plot showing the distribution of peptide intensities for each protein detected in each run of LFQMS before and after equal median normalization. This new analysis reveals that the distribution of intensities is not appreciably changed via normalization. Specifically, there is not a reduction in peptide intensities in the unnormalized data from 24 hours of C16 treatment that is reversed or tempered by normalization. This analysis provides further support for the notion that the increase we observe is not a normalization artifact.
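
      As a rough illustration of what equal median normalization does, the sketch below shifts the log2 intensities of each run so that all runs share one median. This is a generic version written for this note, not the authors' LFQMS pipeline; the function name, the log2 scale, and the toy intensities are assumptions.

```python
import math
from statistics import median


def equal_median_normalize(runs):
    """Shift each run's log2 intensities so that all runs share one median.

    runs: dict mapping run name -> list of positive raw intensities.
    Returns a dict with the same keys holding normalized log2 intensities.
    The common target is the median of the per-run medians.
    """
    log_runs = {
        name: [math.log2(value) for value in values]
        for name, values in runs.items()
    }
    run_medians = {name: median(values) for name, values in log_runs.items()}
    target = median(run_medians.values())
    return {
        name: [value - run_medians[name] + target for value in values]
        for name, values in log_runs.items()
    }


# Toy data (hypothetical): the second run is uniformly twice as intense.
toy_runs = {"dmso_rep1": [1.0, 2.0, 4.0], "c16_rep1": [2.0, 4.0, 8.0]}
normalized = equal_median_normalize(toy_runs)
```

      If the per-run distributions already overlap before this step, as the response reports for the 24-hour data, the applied shifts are small and normalization cannot be what creates an apparent induction.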

      (4) We now include in Figure 3—figure supplement 1B–D a set of new analyses examining the relationship between the initial intensity of proteins in DMSO control samples (a crude proxy for abundance) versus the fold change in response to WIN site inhibitor. This analysis shows that we have as many "highly abundant" (10th decile) proteins increasing as we do decreasing in response to WINi. Thus, it appears as though the wholesale clearance of highly abundant proteins from the cell is not occurring at this early treatment timepoint. In addition, this analysis also shows that ribosomal proteins (RP) are generally the most abundant, most suppressed, proteins and that their fold-change at the protein level at 24 hours is less than two-fold, consistent again with the magnitude of transcriptional effects of C16, as measured by RNA-Seq and QuantiGene. The fact that the drop in RP levels is consistent with expectations based on other analyses provides further empirical support for the notion that protein levels inferred from LFQMS are authentic and not skewed by global changes in the proteome.
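
      The abundance-decile comparison described above can be sketched as follows. This is a simplified stand-in for the analysis in the figure supplement: proteins are ranked by their intensity in control samples, the top tenth is taken as the "10th decile", and increases and decreases are counted. The function name and input layout are hypothetical.

```python
def top_decile_direction_counts(proteins):
    """Count increasing and decreasing proteins in the top abundance decile.

    proteins: list of (control_intensity, log2_fold_change) tuples.
    Returns (n_up, n_down) for the tenth of proteins with the highest
    control intensity (at least one protein is always included).
    """
    if not proteins:
        raise ValueError("no proteins supplied")
    ranked = sorted(proteins, key=lambda protein: protein[0])
    decile_size = max(1, len(ranked) // 10)
    top = ranked[-decile_size:]
    n_up = sum(1 for _, lfc in top if lfc > 0)
    n_down = sum(1 for _, lfc in top if lfc < 0)
    return n_up, n_down


# Toy example: 20 proteins, so the top decile holds the 2 most abundant.
toy = [(float(i), 0.1) for i in range(1, 19)] + [(19.0, 1.0), (20.0, -1.0)]
counts = top_decile_direction_counts(toy)
```

      Comparable up and down counts in the top decile are what one would expect if highly abundant proteins were not being cleared wholesale, which is the argument made in the response.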

      The increase in proteins at this time point, we argue, is thus most likely genuine. It is not surprising that—at a timepoint at which protein synthesis is unaffected—several hundred proteins are induced by a factor of two. How this occurs, we do not know. It may be a transient compensatory mechanism, or it may be an early part of the active response to WIN site inhibitors. Lest the reader be confused by this finding, we have now added text to this section of the manuscript discussing and explaining the phenomenon in more detail.

      (3) The description of the two CRISPR screens (GECKO and targeted) is a bit confusing. Do I understand correctly that in the GECKO screen, the treated cells are not compared with nontreated cells of the same time point, but with a time point 0? If so, this screen is not very meaningful and perhaps should be omitted. Also, it is unclear to me what the advantages of the targeted screen are since the targets were not covered with more sgRNAs (data contradictory: 4 or 10 sgRNAs per target?) than in Gecko. Also, genome-wide screens are feasible in culture for multiple conditions. Overall, I find the presentation of the screening results not favorable.

      In essence, this is a single screen performed in two tiers. In Tier 1, we screened a complete GECKO library (six sgRNA/gene) with the earliest generation (less potent) inhibitor C6, and compared sgRNA representation against the time zero population. This screen would reveal sgRNAs that are specifically associated with response to C6, as well as those that are associated with general cell fitness and viability. We then identified genes connected to these sgRNAs, removed those that are pan essential, and built a custom library for the second tier using sgRNAs from the Brunello library (four sgRNA/gene). We then screened this custom library with both C6 and the more potent inhibitor C16, this time against DMSO-treated cells from the same timepoint.
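      The comparison of sgRNA representation against a reference population can be sketched as follows. This is an illustrative calculation only: the scoring method actually used for the screen is not described here, and the guide names, read counts, and pseudocount are hypothetical. The reference would be the time zero population in Tier 1 and DMSO-treated cells from the same timepoint in Tier 2.

```python
import math

def log2_fold_changes(treated, reference, pseudocount=1.0):
    """Per-sgRNA log2 fold change after scaling each sample to reads per million."""
    def per_million(counts):
        total = sum(counts.values())
        return {guide: 1e6 * n / total for guide, n in counts.items()}

    t, r = per_million(treated), per_million(reference)
    return {
        guide: math.log2((t[guide] + pseudocount) / (r[guide] + pseudocount))
        for guide in reference
    }

# Hypothetical read counts for three guides; names are placeholders.
reference = {"sgA": 500, "sgB": 500, "sgC": 1000}
treated = {"sgA": 1000, "sgB": 250, "sgC": 750}
lfc = log2_fold_changes(treated, reference)
# Positive values indicate enrichment in the treated sample, negative values depletion.
```

      With a time zero reference, depletion can reflect either drug response or general fitness effects, which is why the second tier uses a same-timepoint DMSO control.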

      We acknowledge that this is not the most streamlined setup for a screen. But our intention was to compare two inhibitors (C6 and C16) and identify high confidence 'hits' that are disconnected from general cell viability, rather than generate an exhaustive list of all genes that, when disrupted, skew the response to WIN site inhibitor. The final result of this screen (Figure 4E) is a gene list that has been validated with two chemically distinct WIN site inhibitors and up to 10 unique sgRNAs per gene. We may not have captured every gene that can modulate response to WIN site inhibitor, but those appearing in Figure 4E are highly validated.

      To answer the reviewer's specific questions: (i) we cannot omit the Tier 1 screen because then there would be no rationale for what was screened in the second Tier; and (ii) the advantage of the custom Tier 2 library is that it allowed us to screen hits from the Tier 1 screen with four completely independent sgRNAs. Although there are not more sgRNAs for each gene in the Tier 2 versus the Tier 1 library, these sgRNAs are different and thus, for C6 at least, hits surviving both screens were validated with up to 10 unique sgRNAs.

      We apologize that the description of the CRISPR screens was not clearer, and have reworked this section of the manuscript to make our intent and our actions clearer.

      (4) Can re-expression of RPL22 rescue the growth arrest caused by C6?

      We have not attempted to complement the RPL22 knock out. But we do note that evidence supporting the idea that loss of RPL22 confers resistance to WIN site inhibitor is strong—six (out of six) sgRNAs against RPL22 were significantly enriched in the Tier 1 screen, and independent knock out of RPL22 with the Synthego multi-guide system in MV4;11 and MOLM13 cells increases the GI50 for C16.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Howard et al. reports the development of high-affinity WDR5-interaction site inhibitors (WINi) that engage the protein to block the arginine-dependent engagement with its partners. Treatment of MLL-rearranged leukemia cells with high-affinity WINi (C16) decreases the expression of genes encoding most ribosomal proteins and other proteins required for translation. Notably, although these targets are enriched for WDR5-ChIP-seq peaks, such peaks are not universally present in the target genes. High concordance was found between the alterations in gene expression due to C16 treatment and the changes resulting from treatment with an earlier, lower affinity WINi (C6). Besides protein synthesis, genes involved in DNA replication or MYC responses are downregulated, while p53 targets and apoptosis genes are upregulated. Ribosome profiling reveals a global decrease in translational efficiency due to WINi with overall ribosome occupancies of mRNAs ~50% of control samples. The magnitude of the decrements of translation for most individual mRNAs exceeds the respective changes in mRNA levels genome-wide. From these results and other considerations, the authors hypothesize that WINi results in ribosome depletion. Quantitative mass spec documents the decrement in ribosomal proteins following WINi treatment along with increases in p53 targets and proteins involved in apoptosis occurring over 3 days. Notably, RPL22L1 is essentially completely lost upon WINi treatment. The investigators next conduct a CRISPR screen to find moderators and cooperators with WINi. They identify components of p53 and DNA repair pathways as mediators of WINi-inflicted cell death (so gRNAs against these genes permit cell survival). Next, WINi are tested in combination with a variety of other agents to explore synergistic killing to improve their expected therapeutic efficacy. The authors document the loss of the p53 antagonist MDM4 (in combination with splicing alterations of RPL22L1), an observation that supports the notion that WINi killing is p53-mediated.

      Strengths:

      This is a scientifically very strong and well-written manuscript that applies a variety of state-of-the-art molecular approaches to interrogate the role of the WDR5 interaction site and WINi. They reveal that the effects of WINi seem to be focused on the overall synthesis of protein components of the translation apparatus, especially ribosomal proteins, even those that do not bind WDR5 by ChIP (a question left unanswered is how much the WDR5-less genes are nevertheless WINi targeted). They convincingly show that disruption of the synthesis of these proteins is accompanied by DNA damage inferred by H2AX-activation, activation of the p53 pathway, and apoptosis. Pathways of possible WINi resistance and synergies with other antineoplastic approaches are explored. These experiments are all well-executed and strongly invite more extensive pre-clinical and translational studies of WINi in animal studies. The studies also may anticipate the use of WINi as probes of nucleolar function and ribosome synthesis though this was not really explored in the current manuscript.

      Weaknesses:

      A mild deficiency in the current manuscript is the absence of cell biological methods to complement the molecular biological and biochemical approaches so ably employed. Some microscopic observations and confirmation of nucleolar dysfunction and DNA damage would be reassuring.

      We thank the reviewer for their comments. We agree that an absence of cell biological methods was a deficiency in the original manuscript. In response to this comment, we have now added immunofluorescence (IF) analyses, examining the impact of C16 on nucleolar integrity and nucleophosmin (NPM1) distribution (Figure 3—figure supplement 4). These new data clearly show that C16 induces nucleolar stress at 72 hours—as measured by the redistribution of NPM1 from the nucleolus to the nucleoplasm. These new data fill an important gap in the story, and we are grateful to the reviewer for prompting us to perform these experiments.

      As part of the above study, we also probed for gamma-H2AX, expecting that we may see some signs of accumulation in the nucleoli (see comment #4 from Reviewer #2, below). We did not observe this response. Importantly, however, we did see that gamma-H2AX staining occurs only in what are overtly apoptotic cells. This is an important finding, because we had previously speculated that the induction of gamma-H2AX observed by Western blotting reflected part of a bona fide response to DNA damage elicited by WIN site inhibitors. Instead, the IF data now lead us to conclude that this signal simply reflects the established fact that WIN site inhibitors induce apoptosis in this cell line (Aho et al., 2019). In response to this new finding, we have added additional discussion to the text and have removed or de-emphasized the potential contribution of DNA damage to the mechanism of action of WDR5 WIN site inhibitors. Again, we are grateful for this comment as it has prevented us from continuing to report or pursue erroneous observations.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      There is a typo in "but are are linked to mRNA instability when translation is inhibited".

      Thank you for catching this typo. It has now been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors report that WINi initially (at 24 hrs) increases the expression of most proteins while decreasing ribosomal proteins, but at 72 hours all proteins are depressed. The transient bump-up of non-translation-related proteins seems odd. A simple resolution to this somewhat strange observation is that there is no real increase in the other proteins, but because of the loss of a large fraction of the most abundant cellular proteins (the ribosomal proteins), the relative fraction of all other proteins is increased; that is, the increase of non-ribosomal proteins may be an artifact of normalization to a lower total protein content. Can this be explored?

      We are grateful to the reviewer for this comment. We have tried our best to respond, as detailed above in response to Reviewer #1 Public Comment #2.

      (2) It would be really nice to assess nucleolar status microscopically. Do nucleoli get bigger? Smaller? Do they have abnormal morphology? Is there nucleolar stress? What happens to rRNA synthesis and processing?

      We agree and thank the reviewer for raising this point. As noted in our response to Reviewer #2, above, we have included new IF that shows: (i) no obvious effect on nucleolar integrity, (ii) redistribution of NPM1 to the nucleoplasm (indicative of nucleolar stress), and (iii) induction of gamma-H2AX staining in apoptotic cells (indicative of apoptosis).

      Additionally, in response to this comment, we also looked at the impact of WIN site inhibitors on rRNA synthesis, using AzCyd labeling. These new data appear in Figure 3—figure supplement 3. Interestingly, these new data show that there is a progressive decline in rRNA synthesis, and that by 96 hours of treatment levels of both 18S and 28S rRNAs are reduced— again by about a factor of two. Our interpretation of this finding is that in response to the progressive decline in RPG transcription there is a secondary decrease in rRNA synthesis. This result is perhaps not surprising, but it does again add an important missing piece to our characterization of WIN site inhibitors and is further support for the concept that inhibition of ribosome production is a dominant part of the response to these agents.

      (3) The WINi-elicited DNA damage is incompletely characterized; rather, it is inferred from H2AX activation. Comet assays would help to confirm such damage.

      As noted in our response to Reviewer #2, our original inference of DNA damage, prompted by gamma-H2AX activation, is erroneous, and due instead to the ability of WIN site inhibitors to induce apoptosis. We thus did not pursue comet assays, etc., and removed discussion of potential DNA damage from the manuscript.

      (4) Staining and microscopic observation of H2AX would be very useful. Is the WINi-provoked DNA damage nucleolar-localized? Does the deficiency of ribosomal proteins lead to localized genotoxic nucleolar stress, or alternatively does the paucity of ribosomes and decreased translation lead to imbalances in other cellular pathways, perhaps including some involved in overall genome maintenance, which would provoke more global DNA damage and H2AX staining, not limited to the nucleolus?

      Again, please see our response to the Public Comment from Reviewer #2.

      (5) It would be important to assess the influence and effects of WINi on some p53 mutant, p53-/- and p53 wild-type cell lines. Given their prevalence, p53 status may be expected to alter WINi efficacy.

      The issue of how p53 status impacts the response to WINi is interesting and important, but we feel this is beyond the scope of the current manuscript. It is likely that many factors contribute to the response of cancer cells to these agents, and thus simply surveying some cancer lines for their response and linking this to their p53 status is unlikely to be very informative. Making definitive statements about the contribution of p53, and the differences between wild-type, loss-of-function mutants, gain-of-function mutants, and null mutants will require more extensive analyses and is fertile territory for future studies, in our opinion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a useful study examining the determinants and mechanisms of LRMP inhibition of cAMP regulation of HCN4 channel gating. The evidence provided to support the main conclusions is unfortunately incomplete, with discrepancies in the work that reduce the strength of mechanistic insights.

      Thank you for the reviews of our manuscript. We have made a number of changes to clarify our hypotheses in the manuscript and addressed all of the potential discrepancies by revising some of our interpretation. In addition, we have provided additional experimental evidence to support our conclusions. Please see below for a detailed response to each reviewer comment.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors use truncations, fragments, and HCN2/4 chimeras to narrow down the interaction and regulatory domains for LRMP inhibition of cAMP-dependent shifts in the voltage dependence of activation of HCN4 channels. They identify the N-terminal domain of HCN4 as a binding domain for LRMP, and highlight two residues in the C-linker as critical for the regulatory effect. Notably, whereas HCN2 is normally insensitive to LRMP, putting the N-terminus and 5 additional C-linker and S5 residues from HCN4 into HCN2 confers LRMP regulation in HCN2.

      Strengths:

      The work is excellent, the paper well written, and the data convincingly support the conclusions which shed new light on the interaction and mechanism for LRMP regulation of HCN4, as well as identifying critical differences that explain why LRMP does not regulate other isoforms such as HCN2.

      Thank you.

      Reviewer #2 (Public Review):

      Summary:

      HCN-4 isoform is found primarily in the sino-atrial node where it contributes to the pacemaking activity. LRMP is an accessory subunit that prevents cAMP-dependent potentiation of HCN4 isoform but does not have any effect on HCN2 regulation. In this study, the authors combine electrophysiology, FRET with standard molecular genetics to determine the molecular mechanism of LRMP action on HCN4 activity. Their study shows that parts of N- and C-termini along with specific residues in C-linker and S5 of HCN4 are crucial for mediating LRMP action on these channels. Furthermore, they show that the initial 224 residues of LRMP are sufficient to account for most of the activity. In my view, the highlight of this study is Fig. 7 which recapitulates LRMP modulation on HCN2-HCN4 chimera. Overall, this study is an excellent example of using time-tested methods to probe the molecular mechanisms of regulation of channel function by an accessory subunit.

      Weaknesses:

      (1) Figure 5A- I am a bit confused with this figure and perhaps it needs better labeling. When it states Citrine, does it mean just free Citrine, and "LRMP 1-230" means LRMP fused to Citrine which is an "LF" construct? Why not simply call it "LF"? If there is no Citrine fused to "LRMP 1-230", this figure would not make sense to me.

      We have clarified the labelling of this figure and specifically defined all abbreviations used for HCN4 and LRMP fragments in the results section on page 14.

      (2) Related to the above point- Why is there very little FRET between NF and LRMP 1-230? The FRET distance range is 2-8 nm which is quite large. To observe baseline FRET for this construct more explanation is required. Even if one assumes that about 100 amino acids are completely disordered (not extended) polymers, I think you would still expect significant FRET.

      FRET is extremely sensitive to distance: its efficiency falls off with the sixth power of the distance between the fluorophores. The difference in contour length (the maximum length of a peptide if fully extended) between our ~260 aa fragment and our ~130 aa fragments is on the order of 450 Å (45 nm). So, even if the fragments are not extended, it is not hard to imagine that the larger fragments show a weaker FRET signal. In fact, we do see a slightly larger FRET signal than in control (not significant), which is consistent with the idea that the larger fragments simply do not produce a large FRET signal.

      Moreover, this hybridization assay is sensitive to a number of other factors including the affinity between the two fragments, the expression of each fragment, and the orientation of the fluorophores. Any of these factors could also result in reduced FRET.

      We have added a section on the limitations of the FRET 2-hybrid assay in the discussion section on page 20. Our goal with the FRET assay was to provide complementary evidence that shows some of the regions that are important for direct association, and we have edited the text to make sure we are not over-interpreting our results.
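      The distance argument above can be made concrete with a minimal sketch of the standard Förster relation, E = 1 / (1 + (r/R0)^6). The Förster radius of 5 nm and the value of 0.35 nm per residue are assumed round numbers chosen for illustration; they are not measurements from this study.

```python
def fret_efficiency(r_nm, r0_nm):
    """Förster transfer efficiency for donor-acceptor separation r and Förster radius R0."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)

def contour_length_nm(n_residues, nm_per_residue=0.35):
    """Fully extended chain length; 0.35 nm per residue is an assumed round value."""
    return n_residues * nm_per_residue

R0_NM = 5.0  # assumed Förster radius for illustration, not a measured value

# Extra contour length contributed by ~130 additional residues.
extra_length = contour_length_nm(130)

# Efficiency at a few separations: it is 0.5 at r = R0 and drops steeply beyond it.
efficiencies = {r: fret_efficiency(r, R0_NM) for r in (2.5, 5.0, 7.5, 10.0)}
```

      Under these assumptions, doubling the separation from R0 to 2·R0 reduces the efficiency from 50% to under 2%, so even a modest increase in the average fluorophore separation could account for a weak signal.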

      (3) Unless I missed this, have all the Cerulean and Citrine constructs been tested for functional activity?

      All Citrine-tagged LRMP constructs (or close derivatives) were tested functionally by co-expression with HCN (see Table 1 and pages 10-11). Cerulean-tagged HCN4 fragments are of course intrinsically non-functional as they do not include the ion-conducting pore.

      Reviewer #3 (Public Review):

      Summary:

      Using patch clamp electrophysiology and Förster resonance energy transfer (FRET), Peters and co-workers showed that the disordered N-termini of both LRMP and HCN4 are necessary for LRMP to interact with HCN4 and inhibit the cAMP-dependent potentiation of channel opening. Strikingly, they identified two HCN4-specific residues, P545 and T547 in the C-linker of HCN4, that are close in proximity to the cAMP transduction centre (elbow C-linker, S4-S5 linker, HCND) and account for the LRMP effect.

      Strengths:

      Based on these data, the authors propose a mechanism in which LRMP specifically binds to HCN4 via its isotype-specific N-terminal sequence and thus prevents the cAMP transduction mechanism by acting at the interface between the elbow C-linker, the S4-S5 linker, and the HCND.

      Weaknesses:

      Although the work is interesting, there are some discrepancies between data that need to be addressed.

      (1) I suggest inserting in Table 1 and in the text, the Δ shift values (+cAMP; + LRMP; +cAMP/LRMP). This will help readers.

      Thank you, Δ shift values have been added to Tables 1 and 2 as suggested.

      (2) Figure 1 is not clear, the distribution of values is anomalously high. For instance, in 1B the distribution of values of V1/2 in the presence of cAMP goes from - 85 to -115. I agree that in the absence of cAMP, HCN4 in HEK293 cells shows some variability in V1/2 values, that nonetheless cannot be so wide (here the variability spans sometimes even 30 mV) and usually disappears with cAMP (here not).

      With a large N, this is an expected distribution. In 5 previous reports of HCN4 with cAMP in HEK 293 cells from 4 different groups (Fenske et al., 2020; Liao et al., 2012; Peters et al., 2020; Saponaro et al., 2021; Schweizer et al., 2010), the average expected range of the data is 26.6 mV and 39.9 mV for 95% (mean ± 2SD) and 99% (mean ± 3SD) of the data, respectively. As the reviewer mentions, the expected range from these papers is slightly larger in the absence of cAMP. The average SD of HCN4 (with/without cAMP) in these papers is 9.9 mV (Schweizer et al., 2010), 4.4 mV (Saponaro et al., 2021), 7.6 mV (Fenske et al., 2020), 10.0 mV (Liao et al., 2012), and 5.9 mV (Peters et al., 2020). Our SD in this paper is roughly in the middle at 7.6 mV. This is likely because we used an inclusive approach to data so as not to bias our results (see the statistics section of the revised manuscript on page 9). We have removed 2 data points that meet the statistical classification of outliers; no measures of statistical significance were altered by this.
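      As a rough check on the spread expected in this study, the width of a mean ± k·SD interval can be computed directly from the 7.6 mV SD quoted above. The sketch assumes normally distributed midpoints, which is an assumption made here for illustration; it does not reproduce the ranges calculated from the earlier reports.

```python
from statistics import NormalDist

def interval_width(sd, k):
    """Width of the interval mean ± k·SD."""
    return 2 * k * sd

def coverage(k):
    """Fraction of a normal distribution lying within ± k SD of the mean."""
    standard = NormalDist()
    return standard.cdf(k) - standard.cdf(-k)

SD_MV = 7.6  # SD quoted for this study in the response above

width_2sd = interval_width(SD_MV, 2)  # span of mean ± 2SD in mV
width_3sd = interval_width(SD_MV, 3)  # span of mean ± 3SD in mV
cover_2sd = coverage(2)
cover_3sd = coverage(3)
```

      Under the normality assumption, an SD of 7.6 mV implies that about 95% of midpoints fall within a span of roughly 30 mV, which is comparable to the spread the reviewer describes.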

      This problem is spread throughout the manuscript, and the measured mean effects are indeed always at the limit of statistical significance. Why so? Is this a problem with the analysis, or with the recordings?

      The exact P-values are NOT typically at the limit of statistical significance; about two-thirds would meet the stringent P < 0.0001 cut-off. We have clarified in the statistics section (page 10) that any comparison meeting our significance threshold (P < 0.05) or a stricter criterion is treated equally in the figure labelling. Exact P-values are provided in Tables 1-3.

      There are several other problems with Figure 1 and in all figures of the manuscript: the Y scale is very narrow while the mean values are marked with large square boxes. Moreover, the exemplary activation curve of Figure 1A is not representative of the mean values reported in Figure 1B, and the values of 1B are different from those reported in Table 1.

      Y-axis values for mean plots were picked such that all data points are included and are consistent across all figures. They have been expanded slightly (-75 to -145 mV for all HCN4 channels and -65 to -135 mV for all HCN2 channels). The size of the mean value marker has been reduced slightly. Exact midpoints for all data are also found in Tables 1-3.

      The GV curves in Figure 1B (previously Fig. 1A) are averages with the ±SEM error bars smaller than the symbols in many cases owing to relatively high n’s for these datasets. These curves match the midpoints in panel 1C (previously 1B). E.g., the midpoint of the average curve for HCN4 control in panel A is -117.9 mV, in close agreement with the -117.8 mV average for the individual fits in panel B.
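      The agreement between the midpoint of an averaged GV curve and the mean of the individual midpoints can be illustrated with a small sketch. It assumes each cell follows a single Boltzmann function with a common slope factor; the per-cell midpoints and the slope are hypothetical values, not data from the study.

```python
import math

def boltzmann(v, v_half, slope):
    """Fraction of maximal conductance for a hyperpolarization-activated channel."""
    return 1.0 / (1.0 + math.exp((v - v_half) / slope))

def mean_curve(v, midpoints, slope):
    """Average of the individual Boltzmann curves at voltage v."""
    return sum(boltzmann(v, vh, slope) for vh in midpoints) / len(midpoints)

def midpoint_of_mean_curve(midpoints, slope, lo=-200.0, hi=0.0):
    """Bisect for the voltage where the averaged curve crosses 0.5."""
    for _ in range(60):
        mid = (lo + hi) / 2
        # The curve falls as voltage rises, so a value above 0.5 means the
        # crossing lies at a more depolarized voltage.
        if mean_curve(mid, midpoints, slope) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical per-cell midpoints (mV) and slope factor; not data from the study.
cell_midpoints = [-127.9, -117.9, -107.9]
mean_of_fits = sum(cell_midpoints) / len(cell_midpoints)
curve_midpoint = midpoint_of_mean_curve(cell_midpoints, slope=10.0)
```

      The two values coincide exactly only when the midpoints are spread symmetrically and the slopes are equal; with real data a small difference, such as 0.1 mV, would be expected.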

      The text contained an error in the ordering of the tables, carried over from a previous manuscript version. This has now been fixed, so these values should be aligned.

      On these grounds, it is difficult to judge the conclusions, and it would greatly help if exemplary current traces were also shown.

      Exemplary current traces have been added to all figures in the revised manuscript.

      (3) "....HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP. Thus, LRMP appears to regulate HCN4 by altering the interactions between the C-linker, S4-S5 linker, and N-terminus at the cAMP transduction centre."

      Although this is an interesting theory, there are no data supporting it. Indeed, P545 and T547 at the tip of the C-linker elbow (Fig. 6A) are crucial for the LRMP effect, but these two residues are not involved in the cAMP transduction centre (the interface between the HCND, S4-S5 linker, and C-linker elbow), at least according to the data available in the literature to date. The hypothesis that LRMP somehow inhibits the cAMP transduction mechanism of HCN4, based on the fact that the two necessary residues P545 and T547 are close to the cAMP transduction centre, remains to be proven.

      Moreover, I suggest analysing the putative role of P545 and T547 in light of the available HCN4 structures. In particular, T547 (elbow) points towards the underlying shoulder of the adjacent subunit and, therefore, is in a key position for the cAMP transduction mechanism. The presence of bulky hydrophobic residues (very different nature compared to T) in the equivalent position of HCN1 and HCN2 also favours this hypothesis. In this light, it will be also interesting to see whether a single T547F mutation is sufficient to prevent the LRMP effect.

      We agree that testing this hypothesis would be very interesting. However, it is challenging. Any mutation we make that is involved in cAMP transduction makes measuring the LRMP effect on cAMP shifts difficult or impossible.

      Our simple idea, now clarified in the discussion, is that if you look at the regions involved in cAMP transduction (HCND, C-linker, S4-S5), there are very few residues that differ between HCN4 and HCN2. When we mutate the 5 non-conserved residues in the S5 segment and the C-linker, along with the NT, we are able to render HCN2 sensitive to LRMP. Therefore, something about the small sequence differences in this region confers isoform specificity to LRMP. We speculate that this happens because of small structural differences that result from those 5 mutations. If you compare the solved structures of HCN1 and HCN4 (there is no HCN2 structure available), you can see small differences in the distances between key interacting residues in the transduction centre. Also, there is a kink at the bottom of the S4 helix in HCN4 but not HCN1. This points a putatively important residue for cAMP dependence in a different direction in HCN4. We hypothesize in the discussion that this may be how LRMP is isoform specific.

      Moreover, previous work has shown that the HCN4 C-linker is uniquely sensitive to di-cyclic nucleotides and magnesium ions. We are hypothesizing that it is the subtle change in structure that makes this region more prone to regulation in HCN4.

      Reviewing Editor (recommendations for the Authors):

      (1) Exemplar recordings need to be shown, and some explanation is needed for the wide variability in the V-half of activation.

      Exemplar currents are now shown for each channel. See the response to Reviewer 3’s public comment 2.

      (2) The rationale for cut sites in LRMP for the investigation of which parts of the protein are important for blocking the effect of cAMP is not logically presented in light of the modular schematics of domains in the protein (N-term, CCD, post-CCD, etc).

      There is limited structural data on LRMP and the HCN4 N-terminus. The cut sites in this paper were determined empirically. We made fragments that were small enough to work for our FRET hybridization approach and that expressed well in our HEK cell system. The residue numbering of the LRMP modules is based on updated structural predictions using Alphafold, which was released after our fragments were designed. This has been clarified in the methods section on pages 5-6 and the Figure 2 legend of the revised manuscript.

      (3) Role of the HCN4 C-terminus. Truncation of the unstructured HCN4 C-terminus distal to the CNBD (Fig. 4 A, B) partially reverses the impact of LRMP (i.e. there is now a significant increase in cAMP effect compared to full-length HCN4). The manuscript is written in a manner that minimizes the potential role of the C-terminus and it is, therefore, eliminated from consideration in subsequent experiments (e.g. FRET) and the discussion. The model is incomplete without considering the impact of the C-terminus.

      We thank the reviewer for this comment as it was a result that we too readily dismissed. We have added discussion around this point and revised our model to suggest that not only can we not eliminate a role for the distal C-terminus, but our data are consistent with it having a modest role. Our HCN4-2 chimera and HCN4-S719x data both suggest the possibility that the distal C-terminus might be having some effect on LRMP regulation. We have clarified this in the results (pages 12-13) and discussion (page 19).

      (4) For FRET experiments, it is not clear why LF should show an interaction with N2 (residues 125-160) but not NF (residues 1-160). N2 is contained within NF, and given that Citrine and Cerulean are present on the C-terminus of LF and N2/NF, respectively, residues 1-124 in NF should not impact the detection of FRET because of greater separation between the fluorophores as suggested by the authors.

      This is a fair point but FRET is somewhat more complicated. We do not know the structure of these fragments and it’s hard to speculate where the fluorophores are oriented in this type of assay. Moreover, this hybridization assay is sensitive to affinity and expression as well. There are a number of reasons why the larger 1-260 fragment might show reduced FRET compared to 125-260. As mentioned in our response to reviewer 2’s public comment 2, we have added a limitation section that outlines the various caveats of FRET that could explain this.

      (5) For FRET experiments, the choice of using pieces of the channel that do not correlate with the truncations studied in functional electrophysiological experiments limits the holistic interpretation of the data. Also, no explanation or discussion is provided for why LRMP fragments that are capable of binding to the HCN4 N-terminus as determined by FRET (e.g. residues 1-108 and 110-230, respectively) do not have a functional impact on the channel.

      As mentioned in the response to comment 2, the exact fragment design is a function of which fragments expressed well in HEK cells. Importantly, because FRET experiments do not provide atomic resolution (see the caveats listed in the revised limitations section on pages 20-21), small differences in the cut sites do not change the interpretation of these results. For example, the N-terminal 1-125 construct is analogous to experiments with the Δ1-130 HCN4 channel.

      We suspect that residues in both fragments are required and that the interaction involves multiple parts. This is stated in the results: “Thus, the first 227 residues of LRMP are sufficient to regulate HCN4, with residues in both halves of the LRMP N-terminus necessary for the regulation” (page 11). We have also added discussion on this on page 21.

      (6) A striking result was that mutating two residues in the C-linker of HCN4 to amino acids found in HCN channels not affected by LRMP (P545A, T547F) completely eliminated the impact of LRMP on preventing cAMP regulation of channel activation. However, a chimeric channel (HCN4-2), in which the C-linker, the CNBD, and the C-terminus of HCN4 were replaced by those of HCN2, was found to be partially responsive to LRMP. These two results appear inconsistent and are not reconciled in the model proposed by the authors for how LRMP may be working.

      As stated in our answer to your question #3, we have revised our interpretation of these data. If the more distal C-terminus plays some role in the orientation of the C-linker and the transduction centre as a whole, these data can still be viewed as consistent with our model. We have added some discussion of this idea in our discussion section.

      (7) Replacing the HCN2 N-terminus with that from HCN4, along with mutations in the S5 (MCS/VVG) and C-linker (AF/PT) recapitulated LRMP regulation on the HCN2 background. The functional importance of the S5 mutations is not clear as no other experiments are shown to indicate whether they are necessary for the observed effect.

      We have added our experiments on a midpoint HCN2 clone that includes the S5 mutants and the C-linker mutants in the absence of the HCN4 N-terminus (i.e., HCN2 MCSAF/VVGPT) (Fig. 7). We have also discussed our rationale for the S5 mutations, as we believe they may be responsible for the different orientations of the S4-S5 linker in HCN1 and HCN4 structures that are known to impact cAMP regulation.

      Reviewer #1 (Recommendations For The Authors):

      A) Comments:

      (1) Figure 1: Please show some representative current traces.

      Exemplar currents are now shown for each channel in the manuscript.

      (2) Figure 1: There appears to be a huge number of recordings for HCN4 +/- cAMP as compared to those with LRMP 1-479Cit. How was the number of recordings needed for sufficient statistical power decided? This is particularly important because the observed slowing of deactivation by cAMP in Fig. 1C seems like it may be fairly subtle. Perhaps a swarm plot would make the shift more apparent? Also, LRMP 1-479Cit distributions in Fig. 1B-C look like they are more uniform than normal, so please double-check the appropriateness of the statistical test employed.

      We have revised the methods section (page 7) to discuss this. Briefly, we performed regular control experiments throughout this project to ensure that a normal cAMP response was occurring. Our minimum target for sufficient power was 8-10 recordings. We have expanded the statistics section (page 9) to discuss tests of normality and the use of a log scale for deactivation time constants, which is why the shifts in Fig. 1D (revised) are less apparent.

      (3) It would be helpful if the authors could better introduce their logic for the M338V/C341V/S345G mutations in the HCN4-2 VVGPT mutant.

      See response to the reviewing editor’s comment 7.

      B) Minor Comments:

      (1) pg. 9: "We found that LRMP 1-479Cit inhibited HCN4 to an even greater degree than the full-length LRMP, likely because expression of this tagged construct was improved compared to the untagged full-length LRMP, which was detected by co-transfection with GFP." Co-transfection with GFP seems like an extremely poor and a risky measure for LRMP expression.

      We agree that the exact efficiency of co-transfection is contentious although some papers and manufacturer protocols indicate high co-transfection efficiency (Xie et al., 2011). In this paper we used both co-transfection and tagged proteins with similar results.

      (2) pg 9: "LRMP 1-227 construct contains the N-terminus of LRMP with a cut-site near the N-terminus of the predicted coiled-coil sequence". In Figure 2 the graphic shows the coiled-coil domain starting at 191. What was the logic for splitting at 227, which appears to be the middle of the coiled-coil?

      See response to the reviewing editor’s comment 2.

      (3) Figure 5C: Please align the various schematics for HCN4 as was done for LRMP. It makes it much easier to decipher what is what.

      Fig. 5 has been revised as suggested.

      (4) pg 12: I assume that the HCN2 fragment chosen aligns with the HCN4 N2 fragment which shows binding, but this logic should be stated if that is the case. If not, then how was the HCN2 fragment chosen?

      This is correct. This has been explicitly stated in the revised manuscript (page 14).

      (5) Figure 7: Add legend indicating black/gray = HCN4 and blue = HCN2.

      This has been stated in the revised figure legend.

      (6) pg 17: Conservation of P545 and T547 across mammalian species is not shown or cited.

      This sentence is not included in the revised manuscript; however, for the interest of the reviewer, we have provided an alignment of this region across species here.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is not clear whether in the absence of cAMP, LRMP also modestly shifts the voltage-dependent activity of the channels. Please clarify.

      We have clarified that LRMP does not shift the voltage-dependence in the absence of cAMP (page 10). In the absence of cAMP, LRMP does not significantly shift the voltage-dependence of activation in any of the channels we have tested in this paper (or in our prior 2020 paper).

      (2) Resolution of Fig. 8b is low.

      We ultimately decided that the cartoon did not provide any important information for understanding our model and it was removed.

      (3) Please add a supplementary figure showing the amino acid sequence of LRMP to show where the demarcations are made for each fragment as well as where the truncations were made as noted in Fig 3 and Fig 4.

      A new supplementary figure showing the LRMP sequence has been added and cited in the methods section (page 5). Truncation sites have been added to the schematic in Fig. 2A.

      (4) In the cartoon schematic illustration for Fig. 3 and Fig.4, the legend should include that the thick bold lines in the C-Terminal domain represent the CNBD, while the thick bold lines in the N-Terminal domain represent the HCN domain. This was mentioned in Liao 2012, as you referenced when you defined the construct S719X, but it would be nice for the reader to know that the thick bold lines you have drawn in your cartoon indicate that it also highlights the CNBD or the HCN domain.

      This has been added to figure legends for the relevant figures in the revised manuscript.

      (5) On page 12, missing a space between "residues" and "1" in the parenthesis "...LRMP L1 (residues1-108)...".

      Fixed. Thank you.

      (6) Which isoform of LRMP was used? What is the NCBI accession number? Is it the same one from Peters 2020 ("MC228229")?

      This information has been added to the methods (page 5). It is the same as Peters 2020.

      Reviewer #3 (Recommendations For The Authors):

      (1) "Truncation of residues 1-62 led to a partial LRMP effect where cAMP caused a significant depolarizing shift in the presence of LRMP, but the activation in the presence of LRMP and cAMP was hyperpolarized compared to cAMP alone (Fig. 3B, C and 3E; Table 1). In the HCN4Δ1-130 construct, cAMP caused a significant depolarizing shift in the presence of LRMP; however, the midpoint of activation in the presence of LRMP and cAMP showed a non-significant trend towards hyperpolarization compared to cAMP alone (Fig. 3C and 3E; Table 1)".

      This means that sequence 62-185 is necessary and sufficient for the LRMP effect. I suggest a competition assay with this peptide (synthetic, or co-expressed with HCN4 full-length and LRMP to see whether the peptide inhibits the LRMP effect).

      We respectfully disagree with the reviewer’s interpretation. Our results strongly suggest that other regions, such as residues 25-65 (Fig. 3C) and C-terminal residues (Fig. 6), are also necessary. The use of a peptide could be an interesting future experiment; however, it would be very difficult to control the relative expression of a co-expressed peptide. We think that our results in Fig. 7E-F, where this fragment is added to HCN2, are a better-controlled way of validating the importance of this region.

      (2) "Truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation. In the presence of both LRMP and cAMP the activation of HCN4-S719X was still significantly hyperpolarized compared to the presence of cAMP alone (Figs. 4A and 4B; Table 1). And the cAMP-induced shift in HCN4-S719X in the presence of LRMP (~7mV) was less than half the shift in the absence of LRMP (~18 mV)."

      On the basis of the partial effects reported for the truncations of the N-terminus of HCN4 1-62 and 1-130 (Fig 3B and C), I do not think it is possible to conclude that "truncation of the distal C-terminus (of HCN4) did not prevent LRMP regulation". Indeed, cAMP-induced shift in HCN4 Δ1-62 and Δ1-130 in the presence of LRMP were 10.9 and 10.5 mV, respectively, way more than the ~7mV measured for the HCN4-S719X mutant.

      As you rightly stated at the end of the paragraph: "Together, these results show significant LRMP regulation of HCN4 even when the distal C-terminus is truncated, consistent with a minimal role for the C-terminus in the regulatory pathway". I would better discuss this minimal role of the C-terminus. It is true that deletion of the first 185 aa of the HCN4 N-terminus abolishes the LRMP effect, but it is also true that removal of the very C-terminus of HCN4 does affect LRMP. This unstructured C-terminal region of HCN4 contains isotype-specific sequences. Maybe they also play a role in recognizing LRMP. Thus, I would suggest further investigation via truncations, even internal deletions of HCN4-specific sequences.

      Please see the response to the reviewing editor’s comment 3.

      (3) Figure 5: The N-terminus of LRMP FRETs with the N-terminus of HCN4.

      Why didn't you test the same truncations used in Fig. 3? Indeed, based on Fig 3, sequences 1-25 can be removed. I would have considered peptides 26-62 and 63-130 and 131-185 and a fourth (26-185). This set of peptides will help you connect binding with the functional effects of the truncations tested in Fig 3.

      Please see the response to the reviewing editor’s comment 2 and 5.

      Why didn't you test the C-terminus (from 719 till the end) of HCN4? This can help with understanding why truncation of the HCN4 C-terminus does affect LRMP, though partially (Fig. 4A).

      Please see the response to the reviewing editor’s comment 3.

      (4) "We found that a previously described HCN4-2 chimera containing the HCN4 N-terminus and transmembrane domains (residues 1-518) with the HCN2 C-terminus (442-863) (Liao et al., 2012) was partially regulated by LRMP (Fig. 7A and 7B)".

      I do not understand this partial LRMP effect on the HCN4-2 chimera. In Fig. 6 you have shown that the "HCN4-P545A/T547F was insensitive to LRMP (Figs. 6B and 6C; Table 1), indicating that the unique HCN4 C-linker is necessary for regulation by LRMP". How can this be reconciled with the HCN4-2 chimera? HCN4-2, "containing" P545A/T547F mutations, should not perceive LRMP.

      Please see the response to the reviewing editor’s comment 6.

      (5) "we next made a targeted chimera of HCN2 that contains the distal HCN4 N-terminus (residues 1-212) and the HCN2 transmembrane and C-terminal domains with 5 point mutants in non-conserved residues of the S5 segment and C-linker elbow (M338V/C341V/S345G/A467P/F469T)......Importantly, the HCN4-2 VVGPT channel is insensitive to cAMP in the presence of LRMP (Fig. 7C and 7D), indicating that the HCN4 N-terminus and cAMP-transduction centre residues are sufficient to confer LRMP regulation to HCN2".

      Why did you insert also the 3 mutations of S5? Are these mutations somehow involved in the cAMP transduction mechanism?

      You have already shown that in HCN4 only P545 and T547 (C-linker) are necessary for the LRMP effect. I suggest trying, at least, the chimera of HCN2 with only A467P/F469T. They should work without the 3 mutations in S5.

      Please see the response to the reviewing editor’s comment 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Line 144, after eq. (1). Vectors d_i need to be defined. Are these the mapping of vectors e_i due to the active deformation? It would be useful to state then that d_3 is aligned with r'.

      Thank you for your suggestion, and the definition has been added to lines 146-149 for a better understanding of the model.

      • Line 144. Authors state a_i(0,0,Z)=0. Shouldn't this be true also for any angle, i.e., a_i(0,Theta, Z)=0?

      Thank you, we have revised it in line 144.

      • Line 156. G_0 is defined as Diag(1,g_0(t), 1), which seems to be using cylindrical coordinates. Previously, in line 147, vector argument X of \chi is defined with Cartesian coordinates (X,Y,Z). Shouldn't these be also cylindrical?

      We apologize for this error. Our initial configuration is defined with cylindrical coordinates, and we have corrected this in line 151 of the manuscript.
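
      For reference, the corrected definition can be written out explicitly. The sketch below assumes the cylindrical basis is ordered $(\mathbf{e}_R, \mathbf{e}_\Theta, \mathbf{e}_Z)$, so that $g_0(t)$ acts along the circumferential direction; this ordering and the bold-face notation are assumptions made here for illustration and are not stated in the exchange above.

      ```latex
      % Initial-stage active tensor in cylindrical coordinates (basis ordering assumed)
      \[
      \mathbf{X} = (R,\ \Theta,\ Z), \qquad
      \mathbf{G}_0 = \mathrm{Diag}\bigl(1,\ g_0(t),\ 1\bigr)
      \ \text{in}\ (\mathbf{e}_R,\ \mathbf{e}_\Theta,\ \mathbf{e}_Z), \qquad
      \det \mathbf{G}_0 = g_0(t).
      \]
      ```

      Under this reading, material points are labelled by $(R, \Theta, Z)$ rather than $(X, Y, Z)$, and the local volume change associated with $\mathbf{G}_0$ alone is simply $g_0(t)$.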

      • Line 162. "where alpha and beta lie in the range [-pi/2, pi/2]" has already been indicated.

      Thank you for pointing this out; we have deleted the duplicate information in line 166.

      • Line 171. W is defined as the strain energy density, while in equation (2), symbol W is the total energy (which depends on the previous W). Letters for total elastic and strain energy must be distinguished.

      Thank you, we have changed the letter for total energy in Eq.(2).

      • Line 176. "we take advantage of the weakness of" -> "we take advantage of the small value of".

      We have revised it in line 179.

      • Line 177. Why is there a subscript i in p_i? If these do not correspond to penalty p, but to parameters in eqn (3), the latter should have been introduced before this line.

      We have revised this error in line 180.

      • Line 186. "as the overall elongation \zeta". This parameter, axial extension, has not been defined yet.

      Thank you for pointing this out; the definition of \zeta is now given in line 146.

      • Figure 4. Why are the values of g_0 from the elastic model and equations (30)-(32) so non-smooth? Clarify what is being fit and what is the input in the latter equations. Final external radius R_3? Final internal radius R_1'?

      (1) To mimic the embryo, we consider a multi-layered cylindrical body in which the shear modulus of each layer is different. The continuity of both deformations and stresses is imposed (see Eq.(26)-Eq.(30)). This is the usual treatment for complex morpho-elastic systems. $g_0$ originates from the actomyosin cortex, so it appears only in the corresponding layer. Finally, all physical quantities such as deformations and stresses must be continuous.

      (2) The final outer radius is R_3, which represents the outer radius of C. elegans embryos. In addition to R_3, the radii we need to consider in this model are R_1’=0.7, R_1’=0.768, R_2=0.8 and R_2’=0.96; these definitions have been added to the caption of Appendix 2—figure 1.

      • Line 663, equation (19). Parameter mu is multiplying penalisation term with p, while in equation (2) mu is only affecting the elastic part.

      These two different ways of expressing the energy function will ultimately affect the value of p, but the two p are not the same quantities, so they will not affect our results. To avoid misunderstandings, we will replace p in equation (19) with q.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in my public summary, I find the writing really not adequate. I provide here a list of specific points that the authors should in my opinion address. As a general comment, I would delete many instances of 'the'.

      First, there are figures and whole paragraphs that do not seem to bring anything to the understanding of the phenomenon of C. elegans elongation, notably, Figs. 2, 3C-H, 5m, and 6. Figures 6G and 7 are the only figures containing results, it seems. Some elements of the figures are repeated, for example, the illustration of the system's cross-section in Figs 3 and 5.

      Thank you for your suggestion, we have made some adjustments to our images to remove some of the duplicate information.

      Second, and this is my most important criticism: the mechanism of elongation by releasing elastic stress introduced by muscle contraction is not explained in clear terms anywhere in the text. At least, I was unable to understand it. On p 10 you write "This energy exchange causes the torsion-bending energy to convert into elongation energy, (...)" How this is done is not explained. I assume that the reference state is somehow changed through muscle contraction. The new reference state probably has a longer axis than the one before, but this would then be a plastic deformation and not purely elastic as claimed by the authors (ll 76: "This work aims to answer this paradox within the framework of finite elasticity without invoking cell plasticity (...)"). Is torsion important for this process or is it 'just' another way to store elastic energy in the system?

      We do explain most of the exchange of energy between bending, torsion and elongation: we quantify all aspects of this transformation, namely the elastic elongation energy and the dissipation processes, which cost energy. The dissipation evaluated here concerns the rotation of the worm due to the muscle geometry and the viscous friction at the inner surface of the egg. Torsion seems to appear in the late stages and only in some cases. As we show, it comes from a torque induced by the muscles, which are not vertical. Finally, the quantitative predictions of our model recover most of the published experimental results.

      Third, there are a number of strange phrasings and the notation is not helpful in places.

      We apologize for this; the manuscript is now more precise.

      Fourth, the title promises to explain how cyclic muscle contractions reinforce acto-myosin motors. I can't see this done in this work.

      The fact that the acto-myosin is reorganized between two sequences of contraction justifies the title. The complete reorganization of the actomyosin network would require a chemico-mechanical model that is not achieved here, perhaps in future work as data become available.

      In addition:

      We have chosen to respond globally rather than point by point to the referee’s recommendations.

      Typographic errors and vocabulary

      All English corrections and typos are now included in the main text.

      Figures and captions:

      Figures and captions have been improved.

      • Figure 1: Make the caption and the illustration more coherent. For example, only two cell types are distinguished; in the caption, you mention lateral cells, in the sketch seam cells. What is the difference between acto-myosin and muscle contraction? Muscle contraction is also acto-myosin-based.

      (1) The caption for Fig.1 is revised.

      (2) From a mechanical point of view, actomyosin bundles in C. elegans are orthoradial, whereas muscles are essentially parallel to the main axis of the body, so the geometry is completely different and of extreme importance for deformation. Muscle contractions are quasi-periodic, and we do not know the dynamics of the attached myosin molecular motors. So of course, both contain actin and myosin (not exactly the same proteins), but our model is sensitive to more macroscopic properties.

      • Figure 2: I do not find this figure helpful. I might expect such a figure in a grant proposal, but much less in an article.

      Figure 2 shows the strategy of our work. We hope that readers can see at a glance what kind of analysis has been done: since our work is divided into several parts, readers can also use this scheme to follow the logic after reading the whole manuscript. This diagram therefore serves as a guide, and we believe it is helpful.

      • Figure 3: Figure 3 A, right: What is the dashed line? B You indicate fibers, but your model does not contain fibers, does it? How do I get from the cube to the deformed object? What is the relation of C-H with the rest of the work? Furthermore, you mention seam cells in Fig. 1, but they are absent here. Why can you neglect them? Why introduce them in the first place? E What is a plant vine? F-H What rods are you referring to? Plants do not have muscles, right?

      We have modified this figure, and the original Figure 3 now corresponds to Figures 3 and 4.

      (1) The dashed line is the centerline after deformation.

      (2) The referee is wrong: our model represents the fibers by a higher shear modulus for the actomyosin cortex and for the muscles (see Table Appendix 1) and G_1 reflects the activities of the muscle and actin fibers.

      (3) The cube in Figure 3 is a mathematical 3D volume element that is subjected to stresses. Hyperelasticity modelling is based on such a representation.

      (4) C-H (new version: Fig.4 A-F): These images show deformations similar to those in our C. elegans study, namely bending and torsion. These figures indicate that such deformations are quite common in nature, even if the underlying mechanism is different.

      (5) This is a point we have already mentioned: we ignore the difference between the different types of epidermal cells and average their role in the early and second stages of elongation.

      (6) The plant vine is the 'botanical vine', see Goriely's article and book.

      (7) F-H (new version: Fig.4 D-F) do not have fixed rods; we set a curvature and torsion to fit the actual biological behavior.

      (8) Plants do not have muscles, but they grow, and our formalism for growth, pre-strain and material plasticity is very similar to the hyper-elasticity formalism.

      • Figure 4: Fig. 4 A: "The central or inner part (0 < 𝑅 < 𝑅2, shear modulus 𝜇𝑖) except the muscles which are stiffer." I do not understand.

      In the new version, this figure corresponds to Fig.5. The shear modulus of the inner part is very small, but the muscles are stiffer, so we have to consider them separately. We have revised this sentence to avoid misunderstanding.

      • Figure 5: Fig 5 A and D: The schematic of the cross-section has appeared already in the previous figure. No need to repeat it here. The same holds for the schematic of the cylindrical embryo. Caption: "But, the yellow region is not an actual tissue layer and it is simply to define the position of muscles." Why do you introduce the yellow region at all? I do not think that it clarifies anything. "Deformation diagram, when left side muscles M_1 and M_2." Something seems to be missing here. Similarly in the next sentence. "the actin fiber orientation changes from the 'loop' to the 'slope'" Do the rings break up and form a helix?

      In the new version, this figure corresponds to Fig.6.

      (1) We have made revisions to these figures.

      (2) The yellow part can show the accurate location of four muscles, which is important for our model and further calculations.

      (3) We have revised this sentence in the caption of Fig. 6.

      (4) Actin rings do not change to a helix pattern, they will be only sloping.

      • Figure 6: Fig 6 A-C These panels do not go beyond Fig 5B. Fig 6D: what are these images supposed to show? They are not really graphs, but microscopy images. The caption is not helpful to understand, what the reader is supposed to see here. Fig 6F: do you really want to plot a linear curve?

      In the new version, Fig.5 and Fig.6 respectively correspond to Fig.6 and Fig.7.

      (1) Fig.6 shows the simulated images, whereas Fig.7 A-C shows the actual calculation results; they are different.

      (2) Fig.7 D can show the real condition during C. elegans late elongation, here, we would like to show the torsion of the C. elegans.

      (3) Yes, it is our result.

      Discussions concerning the biological referee questions:

      Ll 75: “how the muscle contractions couple to the acto-myosin activity" Again I find this misleading because muscle contraction relies on acto-myosin activity. Probably, you can find a better expression to refer to the activity of the actomyosin network in the epidermis. Do you propose any mechanism for how muscle contraction increases epidermal contractility? This does not seem to be the mechanism that you propose for elongation, is it?

      The actomyosin activity will not stop because of the muscle contraction. Obviously, these two processes cannot be independent. The energy released by a muscle contraction event can and must contribute to the reorganization of the actomyosin network that occurs during the elongation process. Indeed, despite the fact that the embryo elongates, the density of actin cables appears to be maintained, which automatically requires a redistribution of actin monomers. We propose a scenario in which muscle contraction increases actomyosin contractility via energy conversion. We show that after unilateral contraction there is an energy release for this once all dissipation factors are eliminated. We invite the reviewer to re-examine Figure 2 and invite biologists to seriously evaluate the density of molecular motors attached to the circumferential actin cable throughout the stretch process.

      Ll 133: "we decide to simplify the geometrical aspect because of the mechanical complexity" This is hardly a justification. Why is it appropriate?

      Yes, we would like to offer the reader the simplest model, with limited technical complexity and a limited number of unknown parameters.

      L 135: "active strains" Why not active stress?

      The two are equivalent, the choice is dictated by the simplicity of deriving quantitative results for comparison with experiments.

      L 170: "hyperelastic" Please, explain this term.

      It is the elasticity of very soft samples subjected to large deformations. For classic references, see the books of Ogden, Holzapfel and Goriely, all of which are mentioned in our paper.
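
      As a hedged illustration of the term, a standard textbook example is the incompressible neo-Hookean solid combined with the multiplicative decomposition used in morphoelasticity. This is a generic sketch, not necessarily the energy adopted in the manuscript; the symbols $\mathbf{F}$ (geometric deformation gradient), $\mathbf{G}$ (active or growth tensor), $\mathbf{F}_e$ (elastic tensor) and $\mu$ (shear modulus) are notation assumed here.

      ```latex
      % Textbook incompressible neo-Hookean example (illustrative; not taken from the manuscript)
      \[
      \mathbf{F}_e = \mathbf{F}\,\mathbf{G}^{-1}, \qquad
      W(\mathbf{F}_e) = \frac{\mu}{2}\left[\operatorname{tr}\!\left(\mathbf{F}_e^{\mathsf{T}}\mathbf{F}_e\right) - 3\right], \qquad
      \det \mathbf{F}_e = 1.
      \]
      ```

      In such a formulation the strain energy density $W$ vanishes when $\mathbf{F}_e$ is a rotation, and it remains well defined for large deformations, which is what distinguishes hyperelasticity from linear elasticity.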

      Major criticism

      Eq. 3 and Ll 227: "𝑝1 is the ratio between the free available myosin population and the attached ones divided by the time of recruitment" Why is the time of recruitment the same for all motors? "inverse of the debonding time" Is it the same as the unbinding rate? Why use the symbol p_2 for it? What is p_3?

      The model proposed to justify the increase in the activity of the actomyosin motors during the first phase is a mean-field model, so all quantities are averaged. We are not considering the theory of a single molecular motor but a collection of motors in a dynamic environment, so we do not need stochasticity here. Equation (3) concerns the compressive pre-strain, which by definition is a quantity varying between $0$ and $1$, and $X_g=1-G$. ... The debonding time is not the same as the debonding rate. The term $p_3$ indicates saturation and is derived from the law of mass action. The good agreement with the experimental data is shown in Fig.5 (A) and (B). An equivalent model has been developed by Serra et al. (2023).

      Serra M, Serrano Nájera G, Chuai M, et al. A mechanochemical model recapitulates distinct vertebrate gastrulation modes. Science Advances, 2023, 9(49).

      Ll 275: "This energy exchange causes the torsion-bending energy to convert into elongation energy, leading to a length increase during the relaxation phase, as shown in Fig.1 of Appendix 5." You have posed the puzzle of how contraction leads to elongation, and now that you resolve the puzzle, you simply say that torsion and bending energy are converted into elongation. How? Usually, if I deform an elastic object, it will return to its original configuration after releasing the external forces. Why is this not the case here?

      Furthermore, the central result of your work is presented in an Appendix!?

      We agree with the referee that an elastic object will return to its initial configuration by releasing stress, i.e. by giving up its accumulated elastic energy to the environment. But the elastic energy has to go somewhere, such as heat. We do not dare to say that the temperature of the worm increases during the muscle contractions.

      In fact, the referee's comment also assumes that full relaxation of the stresses is possible, so the object is not a multi-layered specimen and/or it is not enclosed in a box. Most living species are under stress, usually called residual stress. Our skin is under stress. Our fingerprints result from an elastic instability of the epidermis occurring during foetal life, as do our brain convolutions and our villi. So, it is clear that stresses are maintained in multilayered living systems. Closer to the case of C. elegans, the existence of stresses has been demonstrated by experiments with laser ablation fractures in the first stage. The fact that the fractures open proves the existence of stress: without stress, there would be no opening and only a straight line.

      Ll 379: "Although a special focus is made on late elongation, its quantitative treatment cannot avoid the influence of the first stage of elongation due to the acto-myosin network, which is responsible for a prestrain of the embryo." This statement is made repeatedly through the manuscript, but I do not understand, why you could not use an initial state without pre-strain.

      This is the basic concept of hyperelasticity. The reference state must be free of stress, so we cannot evaluate the first muscle contraction without treating the first elongation stage.

      Grammar, vocabulary and writing errors

      ll 31: "the influence of mechanical stresses (...) becomes more complex to be identified and quantified" Is the influence of mechanical stress too complex or too difficult to be identified/quantified?

      We have revised it in line 31, “The superposition of mechanical stresses, cellular processes (e.g., division, migration), and tissue organization is often too complex to identify and quantify.”

      Ll 41: "The embryonic elongation of C. elegans represents an attractive model of matter reorganization without a mass increase before hatching." Maybe "Embryonic elongation of C. elegans before hatching represents an attractive model of matter reorganization in the absence of growth.".

      We have revised it in line 41.

      L 42: "It happens after the ventral enclosure (...)" Maybe "It happens after ventral enclosure (...)".

      We have revised it in line 42.

      Ll 52: "The transition is well defined since the muscle participation makes the embryo rather motile impeding any physical experiments such as laser ablation (...)" Ablation of what?

      We have revised it in line 53: “The transition is well defined, because the muscle involvement makes the embryo rather motile, and any physical experiments such as laser fracture ablation of the epidermis, which could be performed and achieved in the first period (\cite{vuong2017interplay}), become difficult.”

      Ll 59: "a hollow cylinder composed of four parts (seam and dorso-ventral cells)" It is not clear, what the four parts are - in the parenthesis, two are mentioned.

      We have revised it in line 59. Fig.1 shows the whole structure, dorsal, ventral and seam cells form four parts of the epidermis.

      L 78: "several important issues at this stage remain unsettled" At which stage?

      It means the late elongation stage, we have added this information in line 78.

      Ll 85: "but how it works at small scales remains a challenge." Maybe "but how it works at small scales remains to be understood.".

      We have revised it in line 86.

      Ll 99: "the osmolarity of the interstitial fluid" This comes out of the blue. Before you only talked about mechanics, why now osmolarity? Also, the interstitial fluid is only mentioned now. It is important for the dissipative effects that you discuss later, right? If yes, then you should probably introduce it earlier.

      For a better understanding, we have changed “osmolarity” to “viscosity” in line 99.

      l 120: "The cortex is composed of three distinct cells" Maybe "distinct cell types".

      Thank you, and we have revised it in line 120.

      L 121: "cytoskeleton organization and actin network configurations" What is the difference between cytoskeleton organization and actin network configuration? Also, either both should be plural or both singular, I guess.

      (1) Cytoskeleton (which involves microtubules) forms the epidermis of C. elegans embryos, and the actin network surrounds the epidermis.

      (2) Thank you for your suggestion, we have revised it in line 121.

      L 130: "which will be introduced hereafter" Maybe "which will be used hereafter".

      We have revised it in line 130.

      Ll 148: "The geometric deformation gradient" You usually denote vectors in bold face, so \chi should be bold, right? Define d_i in Eq.(1).

      Yes, we have added this information in line 147.

      L 172: "auxiliary energy density" Please, explain this term.

      We have changed "auxiliary energy density" into "associated energy density" in line 175. Energy density is the amount of energy stored in a given system or region of space per unit volume, the associated energy density in our manuscript can help us to do some calculations.

      Ll 188: "Similar active matter can be found in biological systems, from animals to plants as illustrated in Fig.3(C)-(E), they have a structure that generates internal stress/strain when growing or activity. (...)" Why such a general statement during the presentation of the results? The second part of the sentence seems to be incomplete.

      We would like to show that our method is general and can be used in many situations. We have revised the incomplete sentence in line 192.

      Ll 243: "a bending deformation occurs on the left for active muscles localized on left" Maybe "bending to the left occurs if muscles on the left are activated".

      Thank you, we have revised it in line 247.

      L 250: "we assume them are perfectly synchronous" Maybe "we assume them to contract simultaneously".

      We have revised it in line 252.

      L 258: "the muscle and acto-myosin activities are assumed to work almost simultaneously." Before it was simultaneously, now only almost!? What does almost mean?

      Sorry, we intended to express the same meaning in these two sentences; we have deleted the word ‘almost’ in line 261.

      Ll 294: "one can hypothesize several scenarios" After that, only one scenario is described it seems.

      Thank you, we have revised this sentence in line 299.

      L 341: "and then is more viscous than water" Maybe "and that is more viscous than water".

      We have revised it in line 345.

      L 373: "before the egg hatch" Maybe "before the embryo (or larva) hatches"?

      We have revised the sentence in line 367.

      L 409: "elephant trunk elongated" maybe "elephant trunk elongation".

      We have revised it in line 412.

      Ll 417: "As one imagines, it is far from triviality (...)" Does this remark help in any way to better understand C. elegans elongation? Also maybe "it is far from trivial".

      We have revised it in line 423.

      Ll 428: "can map the initial stress-free state B_0 to a state B_1, which reflects early elongation process" Maybe: "maps the initial stress-free state B_0 to a state B_1, which describes early elongation".

      We have revised it in line 428.

      L 429: "After in the residually stressed (...)" Maybe "Subsequently, we impose an incremental strain field G_1 that maps the state B_1 to the state B_2, which represents late elongation".

      We have revised it in line 429.

      l 763: "Modelling details of without pre-strain case" Maybe "Case without pre-strain" or "Modelling in the absence of pre-strain" Similarly for l 784.

      We have revised them in line 763 and line 784.

      Some questions of definition and understanding

      Ll 71: "We can imagine that once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side." I do not understand the sentence. Muscle activation leads to contraction, there is nothing to imagine here. Maybe you hypothesize that the muscles are attached to the epidermis such that muscle contraction leads to epidermis deformation?

      Yes, four muscle bands are attached to the epidermis, as shown in Fig.1. During the bending events, the deformation concerns not only the epidermis but the whole embryo. To avoid misunderstanding, we have modified the sentence, which now reads “Once the muscle is activated on one side, it can only contract, and then the contraction forces will be transmitted to the epidermis on this side.” in line 71.

      Ll 110: "However, it is less widely known that its internal striated muscles share similarities with skeletal muscles found in vertebrates in terms of both function and structure" Is it important for what you report, whether this fact is widely known?

      Yes, it is our opinion.

      Ll 112: "the role of the four axial muscles (...) is nearly contra-intuitive" Is it or is it not? If yes, why?

      Yes, it is. Muscles exert contractions, and hence compressive deformations. They are located along the axis of symmetry (up to a small deviation), so they cannot mechanically produce the expected elongation, in contrast to the orthoradial actomyosin network.

      However, elongation of C. elegans is observed experimentally, so yes, we consider the result counterintuitive.

      L 116: "fully heterogeneous cylinder" What is this?

      It means that the C. elegans embryo does not have the same elastic properties in different parts (or layers).

      L 129: "will collaborate to facilitate further elongation" To facilitate or to drive? If the former, what drives elongation?

      The contractions of the muscles and of the actin bundles together drive elongation.

      Ll 141: "the deformation in each section can be quantified since the circular geometry is lost with the contractions" The deformation could also be quantified if the sections remained circular, right?

      Yes. However, circularity is lost during each bending event.

      Ll 151: "we need to evaluate the influence of the C. elegans actin network during the early elongation before studying the deformation at the late stage. So, the deformation gradient can be decomposed into: (...) where (...) is the muscle-actomyosin supplementary active strain in the late period" I thought you were now studying the early stage?

      In this part, we are outlining how we can study the whole elongation (early and late), not just the early elongation stage. To evaluate the deformation induced by the first contraction of the muscles, we need to know the state of stress of the worm prior to this event, so we also need to recover the early period using the same formalism for the same structure.

      L 160: "When considering a filamentary structure with different fiber directions" Which filamentary structure are you talking about?

      Fig.3 B shows this model and the filamentary structure, which contains the actin and muscle fibers.

      Ll 174: "When the cylinder involves several layers with different shear modulus 𝜇 and different active strains, the integral over 𝑆 covers each layer" I do not understand this sentence. Also, you should probably write 'moduli' instead of modulus.

      This implies that when integrating over the whole cross-section S, we need to take into account each layer independently with its own shear modulus and sum the results.
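
      As a minimal sketch of this statement, assuming that the shear modulus is constant within each layer, the integral over the cross-section splits into a sum over layers. The layer index k, the number of layers N, the sub-sections S_k, and the generic integrand f_k (the integrand evaluated with the active strain of layer k) are hypothetical notation introduced here for illustration and are not taken from the manuscript:

```latex
\int_{S} \mu \, f \,\mathrm{d}S
  \;=\; \sum_{k=1}^{N} \mu_k \int_{S_k} f_k \,\mathrm{d}S ,
\qquad S = \bigcup_{k=1}^{N} S_k .
```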

      L 176: "weakness of 𝜀" Do you mean \epsilon << 1?

      Yes

      Ll 178: "Given that the Euler-Lagrange equations and the boundary conditions are satisfied at each order, we can obtain solutions for the elastic strains at zero order 𝐚(𝟎) and at first order 𝐚(𝟏)." Are you thinking about different orders in an \epsilon expansion or the early and the late stages of elongation?

      The different orders refer to the expansion in \epsilon and are considered only for the late elongation; the early elongation is treated exactly and therefore does not need a correction in \epsilon.
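
      Schematically, and assuming a regular expansion in the small parameter (the form of the remainder term is an assumption made here for illustration), the elastic strains in the late elongation would be written as:

```latex
\mathbf{a} \;=\; \mathbf{a}^{(0)} + \varepsilon\, \mathbf{a}^{(1)} + O(\varepsilon^{2}),
\qquad \varepsilon \ll 1 .
```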

      L 197: "fracture ablation" Please, define.

      This is an experiment in which a laser is used to make a cut in a small-scale object of study, and the internal stresses are then inferred from the morphology of the cut; please see the reference ‘Assessing the contribution of active and passive stresses in C. elegans elongation’. We have added this definition in line 200.

      Ll 203: What motivated your choice of notations for the radii R_2'? The inner part of the cylinder is fluid? But above you wrote about a solid cylinder. Why should the inner part be compressible?

      (1) We need to define the location of actin cables, which concentrate at the outer periphery.

      (2) Our model is a hollow cylinder, and the inner part of the cylinder contains internal organs, tissues, fluids, and so on, so we consider it to be a compressible extremely soft material (Line 213).

      Ll 212: "𝑟(𝑅) is the radius after early elongation." And during?

      R is a variable; r(R) depends on R but also on time t. It represents the radius of the C. elegans embryo after the onset of elongation, i.e., after acto-myosin and muscle activities begin.

      L 232: \tau_p is probably t_p?

      Yes.

      L 240: "quite simultaneously" Please, be precise.

      In practice, it is difficult to define simultaneous occurrence unless rigorous experimental data demonstrate it. All we can obtain from the reference ‘Remodelage des jonctions sous stress mécanique’ is that the two occur almost simultaneously, which we describe as "quite simultaneously".

      Ll 246: "a short period" What does short mean? Why is it relevant?

      From the experimental observations and data, we know that each contraction occurs very rapidly, within a few seconds, so we define a short period for one contraction.

      L 263: "the bending of the model will be increased" Is it really the model that is bent?

      Yes, we mean the bending deformation predicted by the model; we have revised this in line 266.

      Ll 265: "we observed a consistent torsional deformation (Fig.6(E)) that agrees with the patterns seen in the video" In which sense do these configurations agree? I do not see any similarity between panels D and E.

      Both show a torsion deformation.

      L 267: "torsion as the default of symmetry of the muscle axis" I do not understand.

      We discuss two cases in this work: one where the muscles follow the axis of the C. elegans embryo in the initial configuration, and one where the muscles are deflected by a slight angle from that axis. We have added more information in the manuscript (line 270).

      Ll 274: "Each contraction of a pair increases the energy of the system under investigation, which is then rapidly released to the body." Do you mean the elastic energy stored in the epidermis and central part of the embryo?

      Yes, the whole body.

      Ll 284: "The activation of actin fibers 𝑔𝑎1 after muscle relaxation can be calculated and determined by our model." Have you done it?

      Yes, we can obtain the value of g_a1, and then calculate the elongation.

      Ll 286 I do not understand, why you write about mutants at this place. Am I supposed to have already understood the basic mechanism of elongation? Why do you now write about the first stage?

      We would like to show that our formalism can model both wild-type and mutant C. elegans, and the comparison results are good.

      L 302: "The result is significantly higher than our actual size 210𝜇𝑚." How was significance assessed? Your actual size is probably more than 210µm.

      Here, we have considered two situations. In the first, the accumulated energy is entirely applied to the elongation, so that the calculated length is much larger than the experimental result of 210 µm. In the second, we take energy dissipation into account, which leads to 210 µm.

      L 433: "where 𝜆 is the axial extension due to the pre-strained" Maybe "where 𝜆 is the axial extension due to the pre-stress".

      In our manuscript, we define the pre-strain, not the pre-stress.

      L 438: "active filamentary tensor" Please, define.

      The active filamentary tensor is the tensor that represents the activities of a cylindrical model composed of fibers with different orientations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study convincingly shows that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a noncanonical substrate for sodium-coupled monocarboxylate transporter SMCTs. With a multihierarchical approach, this important study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption carried out, in part, by ASCT2.

      Public Reviews:

      Reviewer #1 (Public Review):

      Most amino acids are stereoisomers in the L-enantiomer, but natural D-serine has also been detected in mammals and its levels shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and as a non-canonical substrate for the sodium-coupled monocarboxylate transporter SMCTs. Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:

      The paper combines proteomics, animal models, ex vivo transport analyses, and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:

      In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed on the basis of other, circumstantial evidence. Although it is clear that SMCTs can transport D-serine, their physiological role is less obvious than that of ASCT2.

      We greatly value the reviewer's efforts and feedback in reviewing our manuscript. We acknowledge the reviewer's observation that the changes indicated by our proteomic results are not markedly pronounced. To reinforce our findings, we have incorporated an analysis of gene alterations at the single-cell level (snRNA-seq) from the publicly accessible IRI mouse model data (Figure supplement 7). The snRNA-seq data align with our proteomic data in terms of the general trend of gene/protein alterations, but reveal more substantial changes in both ASCT2 and SMCTs. These discrepancies might stem from the different quantification methods used, suggesting a possible underestimation in our label-free proteomic quantification. The differences we see between the functional changes in transporters and their quantification in proteomics can be explained by the unique challenges posed by membrane proteins. Post-translational modifications and the complex nature of multiple transmembrane domains often impact the accurate measurement of these proteins in proteomic studies. This complexity can lead to a mismatch between the actual functional changes occurring in the transporters and their perceived abundance or alterations as detected by proteomic methods (Figure 4A) (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). However, this label-free quantitative proteomics approach is well-suited for our study, given its screening efficiency, compatibility with animal models, and the absence of a labeling requirement. We may consider incorporating alternative quantitative proteomic methods in future for a more thorough comparison. We have included these considerations in lines 351-356 of the revised manuscript.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      Regarding the roles of ASCT2 and SMCTs in renal D-serine transport, snRNA-seq showed that ASCT2 expression in the controls is less than 10% of the cell population. We suggest that ASCT2 contributes to D-serine reabsorption because of its high affinity and SMCTs (SMCT1 and SMCT2) would play a role in D-serine reabsorption in the cells without ASCT2 expression. In addition, we included other factors (the turnover rate and the presence of local canonical substrates) that may determine the capability of D-serine reabsorption. We have included this suggestion in the Discussion lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript, which focused first on the proteomic analysis of apical membrane isolated from mouse kidney with early Ischemia-Reperfusion Injury (IRI), a well-known acute kidney injury (AKI) model. In the second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:

      A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1, and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying Asct2, Smct1, and Smct2 transporters more clear. In addition, the authors used 2D-HPLC to measure plasma and urinary enantiomers of 20 amino acids in plasma and urine samples from sham and Ischemia-Reperfusion Injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

      Reviewer #3 (Public Review):

      Summary:

      The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:

      With a multi-hierarchical approach, the work shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. In this way, the authors revealed that SMCT1 also transports D-serine.

      The experimental approach and the identification of D-serine as a new substrate for SMCT1 merit publication in eLife.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of Ischemia-Reperfusion Injury.

      Weaknesses:

      It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under Ischemia-Reperfusion Injury (IRI).

      It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

      We sincerely appreciate the reviewer’s comment on the manuscript. Several factors contribute to the alteration of D-/L-serine ratios, including protein expression levels at both the apical and basolateral sides, the properties of the transporters (e.g. transport affinities, substrate stereoselectivities), and the expression of DAAO (D-amino acid oxidase), which selectively degrades D-amino acids. Moreover, the mechanism becomes more complicated when the transport systems of L- and D-enantiomers are different and have distinct stereoselectivities, as in the case of serine. Future studies are required to fully establish the mechanism. However, we would like to explore the mechanism based on the current knowledge.

      From this study, we identified ASCT2 and SMCTs (SMCT1 and SMCT2) as D-serine transport systems. We showed that SMCT1 prefers D-serine. Although we did not analyze ASCT2 stereoselectivity, previous studies indicate that ASCT2 recognizes both D- and L-serine with high affinities and slightly prefers the L-enantiomer: Km of 18.4 µM for L-serine (oocyte expression system; Utsunomiya-Tate et al. J Biol Chem 1996) and 167 µM for D-serine (oocyte expression system; Foster et al. PLOS ONE 2016), and IC50 of 0.7 mM for L-serine and 4.9 mM for D-serine (HEK293 expression system; Foster et al. PLOS ONE 2016). The proteomics showed an increase of ASCT2 (1.6-fold increase) and a decrease of SMCTs (1.7-fold decrease in SMCT1, and 1.3-fold decrease in SMCT2) in IRI conditions. The table below summarizes D-serine transport by ASCT2 and SMCTs.

      In the case of L-serine, ASCT2 and B0ATs (in particular B0AT3) have been revealed as L-serine transport systems in the kidneys (Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Proteomics showed that B0ATs have higher expression levels than ASCT2 supporting the idea that B0ATs are the main L-serine transport system (Table S1: Abundance of B0AT1 = 1.34E+09, B0AT3 = 2.13E+08, ASCT2 = 1.46E+07). In IRI conditions, B0AT3 decreased 1.8 fold and B0AT1 decreased 1.1 fold. From these results, we included the contribution of B0ATs in L-serine transport in Author response table 1.

      Author response table 1.

      Taken together, we suggest that high ratios of D-/L-serine in IRI conditions are the combined result of 1) an increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction and 2) a decrease of L-serine reabsorption by B0ATs. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratio, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomic analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”
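
      The response above reports changes both as "x-fold decrease" (e.g. 1.8-fold for B0AT3) and as IRI/sham ratios (e.g. 0.56 fold). As a small sketch of how the two conventions relate, assuming that an "x-fold decrease" means sham/IRI = x (this reading, and the helper name below, are assumptions made here for illustration):

```python
def ratio_from_fold_decrease(fold_decrease: float) -> float:
    """Convert an 'x-fold decrease' into an IRI/sham ratio.

    Assumes 'x-fold decrease' means sham / IRI = x, so IRI / sham = 1 / x.
    """
    if fold_decrease <= 0:
        raise ValueError("fold_decrease must be positive")
    return 1.0 / fold_decrease


if __name__ == "__main__":
    # Fold decreases quoted in the response (proteomics, IRI vs sham).
    quoted = {"B0AT3": 1.8, "B0AT1": 1.1, "SMCT1": 1.7, "SMCT2": 1.3}
    for name, fold in quoted.items():
        print(f"{name}: {fold}-fold decrease ~ IRI/sham = "
              f"{ratio_from_fold_decrease(fold):.2f}")
```

      Under this reading, the 1.8-fold decrease of B0AT3 corresponds to an IRI/sham ratio of about 0.56, which matches the 4h value quoted from Table S1.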

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a thorough study that was reviewed previously under the old system. I think the authors have strengthened their findings and have no further suggestions.

      We thank Reviewer 1 for their effort and comments, which greatly contributed to improving this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The experiments seem to me to have been well performed and the data are readily available.

      Weaknesses:

      Rather than weaknesses, I would speak of discussion points: I have a few suggestions that may help to make the paper more accessible to a general audience.

      (1) In the Introduction, when the authors introduce the term "micromolecules", it would be beneficial to provide a precise definition or clarification of what they mean by this term. Adding a brief explanation may help the reader to better understand the context.

      Following the reviewer’s comment, we have included the explanation of the micromolecule and membrane transport proteins in lines 41-43.

      Manuscript lines 41-43

      “Membrane transport proteins function to transport micromolecules such as nutrients, ions, and metabolites across membranes, thereby playing a pivotal role in the regulation of micromolecular homeostasis.”

      (2) In line 91, I suggest specifying that this is a renal IRI model.

      Following the reviewer’s comment, we have added the information that it is a renal IRI model of AKI (lines 90-92).

      Manuscript lines 90-92

      “We applied 2D-HPLC to quantify the plasma and urinary enantiomers of 20 amino acids of renal ischemia-reperfusion injury (IRI) mice, a model of AKI and AKI-to-CKD transition (Sasabe et al., 2014; Fu et al., 2018).”

      (3) Lines 167-168 state that Asct2 is localised to the apical side of the renal proximal tubules. Is there any expression of Asct2 in other nephron segments?

      To our knowledge, there is no report of ASCT2 expression in other nephron segments. Our immunofluorescent data of the ASCT2 staining in the whole kidney at the low magnification and another region of Figure 3 (below) as well as immunohistochemistry from Human Protein Atlas (update: Jun 9th, 2023) did not show a strong signal of ASCT2 expression in other regions besides the proximal tubules. Thus, we conclude that ASCT2 is mainly expressed in proximal tubules, but not in other nephron regions.

      Author response image 1.

      (4) Lines 225-226: Have the authors expressed the candidate genes in HEK293 cells with ASCT2 knockdown?

      This experiment was done by expressing the candidate genes in the presence of endogenous ASCT2. We have added the information in lines 225-227 to emphasize this process.

      Manuscript lines 225-227

      “Based on this finding, we utilized cell growth determination assay as the screening method even in the presence of endogenous ASCT2 expression. HEK293 cells were transfected with human candidate genes without ASCT2 knockdown.”

      (5) Lines 254-255: Why was D-serine transport enhanced by ASCT2 knockdown in FlpInTR-SMCT1 or 2 cells?

      We thank the reviewer for pointing out this result, and we apologize for the confusion in the text. The total amount of D-serine uptake in the cells was not enhanced; rather, the net uptake (total uptake minus background) increased. This increase results from the lower background caused by ASCT2 knockdown. We have revised the text and explained this result in more detail (lines 256-258).

      Manuscript lines 256-258

      “In the cells with ASCT2 knockdown, the background level was lower, thereby enhancing the D-[3H]serine transport contributed by both SMCT1 and SMCT2 (the net uptake after subtracted with background) (Figure 5C).”

      (6) Line 265: The low affinity of SMCT1 for D-serine alone makes it an unlikely transporter for urinary D-serine.

      We acknowledge the reviewer’s concern about the low affinity of SMCT1. However, a Km in the mM range is widely accepted for several low-affinity amino acid transporters such as proton-coupled amino acid transporter PAT1 (Km = 2 – 5 mM; Miyauchi et al. Biochem J 2010), cationic amino acid transporter CAT2A (Km = 3 – 4 mM; Closs et al. Biochem 1997), and large-neutral amino acid transporter LAT4 (Km = 17 mM; Bodoy et al. J Biol Chem 2005). In the kidneys, many compounds are well known to be reabsorbed by low-affinity but high-capacity (high-expression) transporters. Similarly, D-serine was reported to be reabsorbed by a low-affinity transporter (Kragh-Hansen and Sheikh, J Physiol 1984; Shimomura et al. BBA 1988; Silbernagl et al. Am J Physiol Renal Physiol 1999). Moreover, the amino acid profile showed urinary D-serine in the range of 100 – 200 µM (Figure supplement 2). This concentration range could drive SMCT1 function (Figure 5). Combined with the high and ubiquitous expression of SMCT1, we propose that SMCT1 is a low-affinity but high-capacity D-serine transporter in the kidneys.

      snRNA-seq is a method that can directly compare the expression levels between different genes within the same cells. From Figure supplement 7, expression of SMCT1 is much more abundant than that of ASCT2. ASCT2 was present in less than 10% of the cell population. It is possible that the 90% of cells that do not express ASCT2 use SMCT1 to reabsorb D-serine.

      We have revised the Discussion regarding this comment (lines 386-404).

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
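
      To make the affinity argument concrete, the following is a minimal sketch that evaluates the fraction of maximal transport rate reached by each transporter at the urinary D-serine concentrations mentioned above (100 – 200 µM). It assumes simple single-substrate Michaelis-Menten kinetics with no competing substrates, which is an assumption made here for illustration; the function and variable names are hypothetical. It says nothing about absolute flux, which also depends on transporter abundance and turnover rate.

```python
def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """Michaelis-Menten fraction of Vmax: v / Vmax = S / (S + Km).

    Both arguments are concentrations in micromolar.
    """
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_um / (substrate_um + km_um)


# Km values for D-serine quoted in the response (converted to micromolar).
KM_UM = {"ASCT2": 167.0, "SMCT1": 3390.0}

if __name__ == "__main__":
    for s_um in (100.0, 200.0):  # urinary D-serine range quoted above
        for name, km_um in KM_UM.items():
            frac = fractional_saturation(s_um, km_um)
            print(f"{name} at {s_um:.0f} uM D-serine: {frac:.3f} of Vmax")
```

      In this simplified picture, ASCT2 operates at a much larger fraction of its maximal rate than SMCT1 over this concentration range, so any substantial contribution from SMCT1 would have to come from its higher abundance or turnover, in line with the low-affinity but high-capacity proposal.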

      (7) Line 316: The authors state that there is a high tubular D-serine reabsorption in IRI and in line 424 there is an inactivation of DAAO during the pathology. This suggests that there is a reabsorption of D-serine mediated by a transport system in the basolateral membrane domain of proximal tubular cells. Do the authors have any information about this transporter?

      We agree with the reviewer that transporters at the basolateral membrane are important to complete the D-serine reabsorption in the kidney, and have included this issue in the original manuscript. We stated that transport systems at the basolateral side are necessary to be analyzed in order to complete the picture of D-serine transport systems in the kidney (lines 481-483 of the revised manuscript). However, we did not have any strong candidates for basolateral D-serine transport systems. Because we analyzed the proteome of BBMV, which concentrates on the apical membrane proteins, the analysis did not detect several transporters at the basolateral side.

      (8) In lines 462-463, the authors state: "It is suggested that PAT1 is less active at the apical membrane where the luminal pH is neutral". However, the pH of urine in the proximal tubules is normally acidic due to the high activity of NH3. I suggest rewording this sentence.

      Thank you for your comment. The proximal tubule (PT) is the first and main region that maintains acid-base homeostasis in the kidney. In PT cells, NH3 secretes H+ to titrate luminal HCO3-, generating CO2 that is absorbed into PT cells and produces "new intracellular HCO3-", which is subsequently reabsorbed into the blood. Although ion fluxes in the PT serve to maintain pH homeostasis, pH regulation in both the lumen and the interior of PT cells is highly dynamic. We fully agree with the reviewer and have accordingly revised the text to emphasize the pH around the PT segments rather than the final urine pH, leaving the discussion open to a possible role for PAT1 in the PT of normal kidneys (lines 474-481).

      Manuscript lines 474-481

      “PAT1, a low-affinity proton-coupled amino acid transporter (Km in mM range), has been found at both sub-apical membranes of the S1 segment and inside of the epithelia (The Human Protein Atlas: https://www.proteinatlas.org; updated on Dec 7th, 2022) (Sagné et al., 2001; Vanslambrouck et al., 2010). PAT1 exhibits optimum function at pH 5 - 6 but very low activity at pH 7 (Miyauchi et al., 2005; Bröer, 2008b). Future research is required to address the significance of PAT1 on D-serine transport in the proximal tubule segments where pH regulation is known to be highly dynamic (Boron, 2006; Nakanishi et al., 2012; Bouchard and Mehta, 2022; Imenez Silva and Mohebbi, 2022).”

      Reviewer #3 (Recommendations For The Authors):

      The authors proposed that the increased expression of ASCT2, even together with the decreased expression of SMCT1/2, causes the increased renal reabsorption of D-serine that occurs in IRI. In the discussion, the main argument to sustain this hypothesis is the higher apparent affinity for D-serine of ASCT2 (<200 uM Km) versus SMCT1 (3.4 mM Km). In the Discussion section (page 18, 1st complete paragraph), the authors indicate that the Mass Spec intensities of SMCT1 and 2 are two and one order of magnitude higher, respectively, than that of ASCT2. This suggests that SMCT1 is clearly more expressed than ASCT2 in control conditions. IRI increases ASCT2 protein expression in kidney brush-border membrane vesicles 1.6-fold and decreases that of SMCT1 0.6-fold. How would these fold changes, even taking into account the lower Km of ASCT2 versus SMCT1, explain the dramatic changes in the D-/L-serine ratios in plasma and urine in IRI? The authors might discuss whether other transport characteristics, even unknown (e.g., a higher turnover rate of ASCT2 vs SMCT1), would also contribute to the higher D-serine reabsorption in IRI.

      SMCT1 shows some enantiomer selectivity for D- vs L-serine. At 50 uM concentration the transport is almost double for D- vs L-serine, but is ASCT2 stereoselective between the two enantiomers of serine? Some of the authors of this manuscript showed in a previous paper that the basolateral transporter Asc1 also participates in the accumulation of D-serine in serum caused by renal tubular damage. (Serum D-serine accumulation after proximal renal tubular damage involves neutral amino acid transporter Asc-1. Suzuki M et al. Sci Rep. 2019 Nov 13;9(1):16705 (PMID: 31723194)). Asc1 shows no stereoselectivity between L- and D-serine. Can the authors discuss possible mechanisms resulting in greater renal reabsorption of D-serine than L-serine in IRI with the participation of transporters with modest stereoselectivity for D- vs L-serine?

      We appreciate the reviewer’s comments on the degree of protein alteration in proteomics, the functional contributions of ASCT2 and SMCTs, and the alteration of D/L ratios. We have included the possibilities of the technical concerns and the discussion on the roles of ASCT2 and SMCTs as follows.

      • Regarding the expression levels, proteomics and snRNA-seq showed the same tendency: ASCT2 increases and SMCTs decrease in IRI conditions. However, the alterations are more pronounced in snRNA-seq. This may be due to the difference in quantification methods and probably reflects an underestimation of membrane transport proteins in label-free proteomics. The accuracy of protein quantification in label-free proteomics is often affected by post-translational modifications and multiple transmembrane domains, as is the case for membrane transport proteins (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). Alternative methods of quantitative proteomics may be added in the future for a more thorough comparison. We have added this issue in lines 351-356 of the revised version.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      • For the functional contributions of ASCT2 and SMCTs in the kidney, we acknowledge the reviewer’s concern about the low affinity of SMCT1. Following the reviewer’s comment, we have included other factors besides transport affinities, e.g. expression levels and turnover rates of the transporters. From the results of both proteomics and snRNA-seq, ASCT2 expression is significantly lower than that of SMCTs under normal conditions. snRNA-seq showed that ASCT2 was present in less than 10% of the cell population (Figure supplement 7). We propose that most of the cells that do not express ASCT2 may use SMCT1 to reabsorb D-serine. This topic was included in the revised manuscript lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10⁹ AU) and Smct2 (1.6 × 10⁸ AU) were significantly higher than that of Asct2 (1.5 × 10⁷ AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
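
      To illustrate how affinity and abundance combine, the following minimal sketch evaluates a Michaelis-Menten term, abundance × [S] / (Km + [S]), for ASCT2 and SMCT1 using the Km values and sham intensities quoted above. It assumes that Vmax scales with the mass-spectrometry intensity and that both transporters share the same turnover rate; both are simplifications, since intensities are only a rough proxy for protein amount and Kcat values are not reported here. The 50 µM substrate concentration and the function name relative_flux are hypothetical choices made for this example.

```python
# Illustrative Michaelis-Menten comparison; a sketch, not the authors' analysis.

KM_UM = {"ASCT2": 167.0, "SMCT1": 3390.0}         # Km values quoted above, in µM
ABUNDANCE_AU = {"ASCT2": 1.5e7, "SMCT1": 2.9e9}   # sham intensities quoted above


def relative_flux(transporter, substrate_um, fold_change=1.0):
    """Return abundance * S / (Km + S) in arbitrary units.

    ASSUMPTION: Vmax is proportional to MS intensity and the turnover
    rate is identical for both transporters.
    """
    km = KM_UM[transporter]
    vmax_proxy = ABUNDANCE_AU[transporter] * fold_change
    return vmax_proxy * substrate_um / (km + substrate_um)


substrate_um = 50.0  # hypothetical luminal D-serine concentration, in µM
for name in KM_UM:
    print(name, f"{relative_flux(name, substrate_um):.3g}")
```

      Under these assumptions the SMCT1 term exceeds the ASCT2 term at 50 µM, in line with the suggestion that abundance can offset low affinity; a different turnover rate would scale the ratio proportionally.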

      • As for the dramatic alterations of D/L-serine ratios juxtaposed with minimal changes in ASCT2 and SMCTs expression levels, we refrain from drawing a definitive conclusion regarding the entire mechanism. Such a conclusion would require a comprehensive elucidation of both the L-serine and the D-serine transport systems at both the apical and basolateral membranes. Nevertheless, we would like to suggest a mechanism at the apical membrane based on current knowledge.

      For D-serine transport systems, we found ASCT2 and SMCTs contributions in this study. Meanwhile, L-serine was previously reported to be mediated mainly by the neutral amino acid transporters B0AT3 (in particular B0AT3; Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Hence, the mechanism behind the alterations of D/L-serine ratios should include B0AT3 functions as well. In IRI conditions, B0AT3 decreased 1.8 fold. We suggest that high ratios of D-/L-serine in IRI conditions are a combined outcome of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction, and 2) decrease of L-serine reabsorption by B0AT3. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested the differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomics analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”

      • In the case of Asc-1, it was reported to be a D-serine transporter in the brain (Rosenberg et al. J Neurosci 2013). Suzuki et al. 2019 showed the increase of Asc-1 in cisplatin-induced tubular injury. Notably, the mRNA of Asc-1 is predominantly found in Henle’s loop, distal tubules, and collecting ducts but not in proximal tubules, and its protein expression level is dramatically low in the kidney (Human Protein Atlas: update on Jun 19, 2023). Furthermore, in this study, Asc-1 expression was not detected in the brush border membrane proteome. Consequently, we have decided not to include Asc-1 in the Discussion of this study, which primarily focuses on the proximal tubules.
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We would like to express our gratitude to the reviewers for their comments, which helped us to improve the quality of our manuscript. Below are the responses to each comment. We hope that these responses will satisfy the reviewers.

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary: The nonsense-mediated mRNA decay (NMD) is an RNA quality pathway that eliminates mRNAs containing premature termination codons. Its mechanism has been studied for several decades but despite enormous progress we still don't have a satisfactory model that would explain most of the published observations. In particular, the mechanism has been proposed to differ substantially between yeast and metazoa. Yeast Nmd4 protein was previously shown to be involved in NMD, to interact with UPF1 and exhibit similarities with metazoan SMG6 and SMG5/7, which are normally believed to be specific for metazoan NMD (Dehecq et al., EMBO J, 2018). Barbarin-Bocahu et al now describe the crystal structure of the complex between the yeast UPF1 RNA helicase and Nmd4. Importantly, the authors show that this interaction is required for NMD activity and increases the ATPase activity of UPF1. Barbarin-Bocahu et al equally show that this interaction and its role in NMD are conserved in the human UPF1-SMG6 complex, thus providing additional novel evidence for universal conservation of the NMD mechanism in eukaryotes. The manuscript carefully combines biochemistry and biophysics with functional in vivo studies. In my opinion, all the experiments are very well executed, generally convincing and interpretations appear correct, so the manuscript is certainly suitable for publication. I have included some suggestions below that I believe could strengthen the manuscript and enhance our confidence in the findings.

      We are grateful for the useful suggestions that have enabled us to improve our manuscript.

      Major comments:

      Page 7 - "Since the D1353A mutation completely abolishes the enzymatic activity of SMG6 (34), this strongly suggests that the PIN domain of Nmd4 is not endowed with endonucleolytic activity." Could/was the endonucleolytic activity of NMD4 be tested?

      We agree with this important point. Our statement is based on previous site directed mutagenesis experiments on the PIN domain of human SMG6 (Galvan et al; 2006; EMBO Journal; PMID : 17053788 / Eberle et al; 2008; Nat. Struct. Mol. Biol.; PMID : 19060897), which showed that D1353 is the critical residue of SMG6 active site involved in the endonuclease enzymatic activity. Given that in yeast Nmd4 proteins, the corresponding residue is hydrophobic (Leu112 in S. cerevisiae Nmd4 and Phe114 in Kluyveromyces lactis Nmd4) and therefore cannot participate directly in catalysis, we assume that yeast Nmd4 proteins have no endonucleolytic activity.

      Furthermore, despite decades of research in this field, no endonucleolytic activity has been described as being involved in the NMD pathway of S. cerevisiae (the model system in which the NMD mechanism was discovered in the 1970's), whereas it has been well characterized in the NMD pathway of metazoans for more than twenty years (Gatfield and Izaurralde; Nature; 2004; PMID : 15175755 / Huntzinger et al; RNA; 2008; PMID : 18974281 / Eberle et al; Nat. Struct. Mol. Biol.; 2009; PMID : 19060897 / Lykke-Andersen et al; Genes Dev.; 2014; PMID : 25403180). Our attempts to demonstrate an endonucleolytic activity of purified Nmd4 in vitro were not successful. This negative result could be due to many reasons, including loss of enzymatic activity in the tested buffer, the absence of an important cofactor or the choice of the tested RNA. For these reasons, we prefer not to include this type of negative result in the current manuscript.

      We hope that, on the basis of the above information, the reviewer will agree that further substantial efforts to demonstrate a hypothetical endonucleolytic activity of Nmd4 are unlikely to be fruitful. Moreover, we believe that even if yeast Nmd4 turns out to behave as an endonuclease, this would not change the main message of the manuscript.

      Page 10 - The two proteins bind RNA with reasonable affinity. The complex binds polyU RNA with a Kd of 0.44 μM. The authors suggest, based on structure superpositions, that RNA fragments bound to the PIN domain and Upf1-HD have opposite orientations. But since they have the complex ready to crystallize, did they attempt to determine the structure of the complex with RNA? The complex is quite small (~100 kDa with RNA) but it could even be visible by cryo-EM. I don't insist that such a structure needs to be included but it might perhaps be easy to do and would surely strengthen the story. If it is too difficult, it could at least be mentioned that it was tried?

      We agree that it would be interesting to determine the crystal structure of the complex with a short RNA fragment. Unfortunately, despite extensive efforts, we could not obtain crystals of the complex in the presence of RNA. This is probably due to the large movements of the RecA2 and 1B domains relative to the RecA1 domain observed in former studies upon RNA binding to Upf1. We have mentioned that we tried to crystallize this complex in the absence or the presence of a short oligonucleotide in our revised manuscript.

      As far as single-particle cryo-EM is concerned, we are aware that recent advances in this field should make it possible to determine the structure of the Nmd4-Upf1-RNA complex, but we do not yet have the necessary expertise in this technique. Despite the interesting information that such a structure could provide, we therefore consider that this would require a very significant investment and that it is beyond the scope of this manuscript.

      I think it is important to demonstrate that the structure-based mutants don't significantly impact the overall structure of the proteins (e.g. glycine residues are mutated within helices). At least gel filtration profiles with gels of the WT and mutated proteins should be shown in SI.

      Thank you very much for highlighting this point. We fully agree that it is important to demonstrate that the Upf1 and Nmd4 mutants used in the in vitro experiments (pull-down and ATPase assays) are not affected in their overall folding. As suggested by the reviewer, we have included gel filtration chromatograms for WT and mutant proteins (Figures S2A for Upf1-HD proteins and S2B for His6-ZZ-Nmd4 proteins). These chromatograms clearly show that the different mutants behave very similarly to the WT proteins during purification, demonstrating that the overall structures of the mutants are very similar to those of the wild-type proteins. We have also included the Coomassie blue stained SDS-PAGE analysis of the proteins present in the main peak to show the purity of the final proteins.

      Perhaps the main finding of this manuscript is the conservation of the UPF1-Nmd4 interaction in human UPF1-SMG6. But the interaction is only demonstrated by co-IP with ectopically expressed human proteins in human cells that contain all the other human proteins as well. It would probably be more convincing to demonstrate the interaction in pull-downs with purified proteins as done for the yeast complex.

      Thank you for highlighting what we consider to be one of the most interesting findings presented in our manuscript. We agree that pull-down experiments using pure protein fragments expressed in E. coli would have been ideal to further confirm our co-IP results and to validate that mutations do not affect the overall structure of SMG6. Unfortunately, despite considerable efforts, we were unable to express sufficient quantities of the SMG6-[207-580] fragment or shorter versions as soluble proteins in E. coli. Indeed, Elena Conti's laboratory had the same experience according to a statement in a paper on SMG6 (Chakrabarti et al; 2014 Nucleic Acids Research; PMID: 25013172), indicating that this region of the protein is very difficult to work with. As we have not yet set up protein over-expression techniques in human cells or baculovirus-infected insect cells in our laboratory, we have not been able to try these expression systems to express these SMG6 domains. These are the reasons why we decided to demonstrate this interaction by co-IP experiments using ectopically expressed tagged proteins in human cells and all appropriate controls.

      In addition, using purified proteins would enable testing whether the mutations in SMG6 don't affect the overall structure of the mutants compared to the WT.

      We agree that this is an important issue. Several bioinformatics tools, including AlphaFold2 (identifier: AF-Q86US8-F1), predict that the human SMG6-[207-580] fragment is largely unstructured (see panel A of figure below). Furthermore, the pLDDT values or confidence scores for this region in the AlphaFold2 model are very low (below 50), indicating that the structure of this region is poorly predicted (see panel B of figure below). Therefore, biophysical techniques to assess that the overall structure of this fragment is not affected by the introduced mutations are very limited. However, we did not observe reduced levels of SMG6 mutants compared with WT in human cells expressing these variants (Fig. 4B and S4), so we believe that these mutants behave similarly to the wild-type fragment, as is often postulated by scientists for in cellulo studies. Furthermore, if these mutants drastically affect the overall structure of SMG6, we would expect NMD to be strongly affected, resulting in a notable accumulation of NMD RNA substrates in our in cellulo experiments when the effect of the double mutant (M2) is compared to that of the SMG6 WT protein (Fig. 4C). This was not the case. On the basis of all these elements, we assume that the overall structure of the SMG6 protein is not affected by these mutations.

      Figure for reviewing purpose : Model of the three-dimensional structure of human SMG6 protein generated by AlphaFold2.

      A. Model of human SMG6 protein (green) with the region 207-580 used in our study colored in red.

      B. Model of human SMG6 protein (green) colored according to the pLDDT values. Orange : pLDDT 90.
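
      AlphaFold model files store the per-residue pLDDT in the B-factor column of each ATOM record, so the confidence over a region such as 207-580 can be checked directly from the PDB text. The sketch below is a minimal illustration, assuming a standard fixed-column PDB file; the function names mean_plddt and make_ca_line and the three-residue demo record are hypothetical and are not data from the AF-Q86US8-F1 model.

```python
# Sketch: average pLDDT over a residue range of an AlphaFold PDB file.
# ASSUMPTION: fixed-column PDB format with pLDDT in the B-factor field.


def mean_plddt(pdb_text, start, end):
    """Mean pLDDT over residues start..end (inclusive), one value per CA atom."""
    scores = []
    for line in pdb_text.splitlines():
        # Keep only C-alpha atoms so each residue is counted once.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        resnum = int(line[22:26])               # residue number, columns 23-26
        if start <= resnum <= end:
            scores.append(float(line[60:66]))   # B-factor, columns 61-66
    if not scores:
        raise ValueError("no CA atoms found in the requested range")
    return sum(scores) / len(scores)


def make_ca_line(serial, resnum, plddt):
    """Build a synthetic CA record (hypothetical helper, demo data only)."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resnum:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Made-up three-residue model used only to exercise the function.
demo = "\n".join([
    make_ca_line(1, 1, 40.0),
    make_ca_line(2, 2, 45.0),
    make_ca_line(3, 3, 95.0),
])
print(mean_plddt(demo, 1, 2))
```

      With a real model file, the same call with start=207 and end=580 would give the average confidence for the fragment discussed above.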

      Since the detected similarity to Nmd4 is only in a region covering residues 440-470, why is the tested construct much larger (207-580), including extra, large disordered regions?

      For in cellulo studies, it has previously been shown that the SMG6-[207-580] fragment is expressed as a stable protein in human cells and is responsible for the phospho-independent interaction between UPF1 and SMG6 (Chakrabarti et al; 2014; Nucleic Acids Research; PMID: 25013172). As our aim was not to reduce this SMG6 region to a shorter peptide but to conduct an amino acid-level analysis by site-directed mutagenesis, we decided to perform our experiments using the same SMG6 domain as Conti's laboratory and to mutate conserved residues on this fragment.

      Finally, the most convincing way to show and characterize the human UPF1-SMG6 interaction would be an X-ray structure. It might be feasible to crystallize human UPF1 HD domain with a SMG6 peptide. Or at least an Alphafold model could be included? I had a quick try just with the Colabfold and using the HD domain and the SMG6 peptide, Alphafold can predict convincingly the binding of the region around W456 and in some models even around R448. I think that this would strengthen the conclusions in this part of the manuscript.

      We agree that determination of the crystal structure of human UPF1 HD linked to this region of SMG6 protein interaction would have further supported our conclusions on the conservation of UPF1-Nmd4 interaction in human UPF1-SMG6. However, due to the SMG6 expression problems mentioned above, we were unable to reconstitute the human complex in vitro, which precluded crystallization assays.

      Based on this suggestion, we generated a model of human UPF1-HD bound to the 421-480 region of human SMG6 using AlphaFold2 Colabfold. Of the various models proposed (25 in total), most are very similar and show that the side chains of R448 and W456 of SMG6 bind to regions of human UPF1 corresponding to the region of the yeast protein that interacts with R210 and W216 of Nmd4. This model is consistent with our hypothesis and we have decided to include it in the revised manuscript as suggested (Fig. EV6). We thank the reviewer for this constructive comment.

      We have added the following text to mention this model : « Based on this observation, we generated a model of the complex between human UPF1-HD and the region 421-480 of SMG6 using AlphaFold2 software (1,2). In this model, the SMG6 fragment binds to the same region of UPF1-HD as the Nmd4 « arm » (Fig. EV6). In particular, the R448 and W456 side chains of SMG6 match almost perfectly with R210 and W216 side chains of S. cerevisiae Nmd4, suggesting that this conserved region from SMG6 is involved in the interaction between the SMG6 and UPF1-HD proteins. »

      Does the SMG6 addition also increase the ATPase activity of UPF1?

      This is a very good point and we agree that the results of such an experiment may have further supported our conclusions about the conservation of the Upf1-Nmd4 interaction in human UPF1-SMG6. Unfortunately, due to the SMG6 protein expression problems mentioned above, we could not perform these in vitro experiments.

      Minor comments: Examples of electron density omit maps of the key interaction interfaces should be shown in Supplementary Information for the reader to be able to judge the crystallography data quality.

      Following this suggestion, we have added two panels showing electron density omit maps of residues at the interface in Fig. S1. We hope that this will convince the reader of the quality of our crystallographic data. We have also added the following sentence to the main text : « The overall quality of the electron density map allowed us to unambiguously identify the residues of the two proteins involved in the formation of the complex (Fig. S1A-B). »

      I suggest to add the Kd values to ITC panels for clarity in main and EV figures.

      We have taken this suggestion into account for figures 2A and EV5.

      On page 10: What experiment is this referring to : "This is in agreement with our ITC experiments (carried out in the absence of a non-hydrolyzable ATP analog), which revealed no major synergistic effect between the two proteins for RNA binding." Results in EV4A? Or some other not shown data? The results in EV4A do show an increase in RNA binding when both proteins are in a complex.

      Thank you for your comment. We realize that this sentence was not clear. We refer to the ITC data for the interaction of Upf1-HD, Nmd4 or the complex with RNA (Fig. EV5A). These data show a 2.3-fold increase in the affinity of Upf1 for RNA in the presence of Nmd4, which we consider to be a notable effect but not a major one. Based on the second reviewer's comments that our comparison between Nob1 and the PIN domain of Nmd4 is not convincing, we have decided to delete this speculative section, which did not address an important point in our current study. We will address this point using more direct and sophisticated methods in future work.
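
      A fold change in Kd can be translated into a binding free-energy difference with ΔΔG = -RT ln(fold change), which gives a sense of scale for the 2.3-fold increase in affinity mentioned above. The sketch below is a minimal illustration, assuming a temperature of 298.15 K (the ITC temperature is not stated here); the function name ddg_from_fold_change is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold_increase_in_affinity, temperature_k=298.15):
    """Return ΔΔG in kcal/mol; negative values mean tighter binding.

    ASSUMPTION: the fold change is a ratio of dissociation constants
    measured at the same temperature.
    """
    return -R_KCAL * temperature_k * math.log(fold_increase_in_affinity)


print(round(ddg_from_fold_change(2.3), 2))  # about -0.49 kcal/mol at 298.15 K
```

      At this assumed temperature the difference is roughly half a kcal/mol, a modest stabilization that fits the description of the effect as notable but not major.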

      On page 16, "organsms" should be" organisms"

      Typo corrected.

      In certain figure legends the panel labels (A,B,C..) are missing (e.g. Fig 3, EV1, EV5).

      We apologize for this problem, which was due to a conversion error when preparing the PDF file of the submitted article. This problem has now been corrected.

      The PIN domain structure was solved only to determine the structure of the complex? I only found it mentioned in the methods and no other mention of this structure in the main text. Maybe one sentence could be added to the results to explain why this structure was solved and how it compares to the complex structure.

      We agree that we forgot to explain why we solved the structure of the PIN domain of Nmd4. The point was to help in the determination of the structure of the complex. We have added the following sentence to the main text to explain this point: « We also determined the 1.8 Å resolution crystal structure of the PIN domain of Nmd4 (residues 1 to 167) to help us determine the structure of the Nmd4/Upf1-HD complex. As this structure is virtually identical to the structure of the PIN domain of Nmd4 in the complex (rmsd of 0.5 Å over 163 C𝛼 atoms between the two structures), we will only describe the structure of this domain in the Upf1-Nmd4 complex. »
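
      For reference, an rmsd such as the one quoted above is the square root of the mean squared distance between paired atoms. The sketch below is a minimal illustration that assumes the two sets of Cα coordinates are already superposed and listed in the same order; it does not perform the superposition itself, and the coordinates in the example are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) tuples.

    ASSUMPTION: the structures are already superposed and atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Two made-up atoms, each displaced by 0.5 Å along z.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)]
print(rmsd(a, b))
```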

      Significance

      This is an important study, providing detailed insight into the function of Nmd4, SMG6 and UPF1 in NMD. The results also point towards a conserved mechanism of NMD between yeast and human. I would like to highlight the quality of the experiments. This study will be of great interest to people working on NMD but also more broadly to scientists working on RNA and helicases, and to structural biologists.

      We are very grateful for the reviewer's comments about the broad interest and overall quality of our work.

      Reviewer #2

      Evidence, reproducibility and clarity

      In this study, the authors solved the crystal structure of the UPF1 helicase domain in complex with Nmd4. Through the structure and biochemical studies, they uncovered a region responsible for Nmd4 binding to UPF1, also important for their function in NMD. In the end, the authors also extended their findings to the human SMG6, proposing a conserved mechanism for Nmd4 and SMG6.

      The mechanism of UPF1 functioning during NMD is a long-existing question. For decades, people have been trying to find out the roles of all the NMD factors during this process. This study visualized the first direct connection between UPF1 and the putative SMG6 homolog, Nmd4. Undoubtedly, it will aid our understanding of how the whole process works.

      One of the limitations of this study is the conservation between Nmd4 and SMG6. Although they both have a PIN domain, Nmd4 is inactive while SMG6 is active. During NMD, SMG6 is thought to work to cut the mRNA, thus promoting the degradation of the non-functional mRNA. Therefore, Nmd4 and SMG6 may only share a similar binding mode with UPF1, however, they do not share similar functions. This study might only apply to yeast study.

      We respectfully disagree with this comment. The role of SMG6 in NMD cannot be attributed solely to the endonuclease activity of the SMG6 PIN domain alone. Indeed, recruitment of the SMG6 PIN domain alone to an mRNA is not sufficient to destabilize it (Nicholson et al; 2014; Nucleic Acids Research; PMID: 25053839). This clearly indicates that other regions of SMG6 are critical for NMD. In our manuscript, we unveil the conservation of the Upf1-Nmd4 interaction in human UPF1-SMG6 (and probably more generally in metazoans) and show that this interaction plays a role in the optimal removal of NMD substrates. We strongly believe that our results are not only applicable to the study of yeast, but will fuel future studies in human cells aimed at describing the mechanistic details of the human NMD pathway.

      Comments: the study is written in a very clear way, and most of the experiments are clear and sound. I do not have any major comments. I only have a few minor comments, listed below:

      We are very grateful for the reviewer's comments about the overall quality of our manuscript and of the experimental work.

      1:The authors also solved the PIN domain of the SMG6. This is a result worth showing in the main figure.

      In our study, we did not solve the structure of the human SMG6 PIN domain. This was done by Dr. Conti's group in 2006 (Galvan et al; 2006; EMBO Journal; PMID : 17053788). This is the reason why we do not include this in the main figure. However, we have solved the crystal structure of the Nmd4 PIN domain alone to help us determine the structure of the complex. Since it is very similar to the structure of the Nmd4 PIN domain in the complex with Upf1, we do not describe this structure in detail. Following up on the suggestion from another reviewer, we have included the following sentence in the main text, mentioning that we have also determined the structure of the Nmd4 PIN domain : « We also determined the 1.8 Å resolution crystal structure of the PIN domain of Nmd4 (residues 1 to 167) to help us determine the structure of the Nmd4/Upf1-HD complex. As this structure is virtually identical to the structure of the PIN domain of Nmd4 in the complex (rmsd of 0.5 Å over 163 C𝛼 atoms between the two structures), we will only describe the structure of this domain in the Upf1-Nmd4 complex. »

      2:It would be easier to read if the authors could add all the binding constants directly into the ITC panels.

      We have taken this suggestion into account for figures 2A and EV5.

      3:I am confused with His6-ZZ. Is ZZ a protein tag?

      The ZZ protein is a tag consisting of a tandem of the Z-domain from Staphylococcus aureus protein A. This domain binds to the Fc region of IgG and has been shown to improve expression levels and stability of recombinant proteins. In our case, it proved crucial to obtain mg amounts of the yeast Nmd4 protein and to considerably enhance its stability. We have added the following sentence in the « Materials and methods » section of the manuscript: « The ZZ-tag consists of a tandem of the Z-domain from Staphylococcus aureus protein A and was used as an enhancer of protein expression and stability. »

      4:The comparison between Nob1 and the PIN domain of Nmd4 is not convincing for me. Since the PIN domain is not required for the binding between Nmd4 and UPF1, the conformation of the PIN domain could be a result of the crystal packing. Thus, it is still possible that Nmd4 and UPF1 bind to the same RNA. To this end, I challenge the conclusion the authors have made on the mRNA binding part.

      We agree with your comment. Since this comparison is purely speculative and is not a major focus of our study, we decided to remove this section. We will address this point using more direct and sophisticated methods in future work aimed at elucidating this aspect.

      5: "Showing that Nmd4 stabilizes Upf1-HD on RNA in the absence of ATP and that Upf1 is the main RNA binding factor in the Nmd4/Upf1-HD complex." As mentioned above, I don't think one can make the conclusion UPF1 is the main RNA binding factor; there shouldn't be a main and minor. Meanwhile, what will happen if you add ATP in? Or AMPPNP? Or ADP?

      We agree with your comment that our current data do not allow us to draw a precise conclusion about the role of Upf1 as the major RNA-binding factor. We have replaced this sentence with the following one: « Whether this increase in affinity is due to a synergistic effect between both proteins or to an allosteric stimulation of one partner on the RNA binding property of the second partner remains to be clarified. »

      Regarding the role of nucleotides in the RNA binding properties of the Upf1 helicase domain or the complex, we faced precipitation problems when mixing high concentrations of Upf1 and nucleotides for ITC experiments, making it difficult to determine Kd values for the interaction between Upf1 and RNA in the presence of nucleotides. However, in a previous study (Dehecq et al; 2018; EMBO J; PMID: 30275269), we observed that AMPPNP did not affect the amount of Nmd4 and Upf1-HD co-precipitated by an RNA oligonucleotide, indicating that the nucleotide does not significantly affect the interaction of the complex with RNA.

      6: "But also that a physical interaction between Upf1-HD and the PIN domain exists in vitro, although we were unable to detect it using our various interaction assays." This also confused me, since one cannot detect the interaction in any assay, how could you be so confident there is a physical interaction? Have you tested assays which are good for weak binding?

      We understand that this sentence may be confusing. The tests we have used to determine whether there is a physical interaction between the PIN domain of Nmd4 and Upf1-HD are ITC and pull-down. These are excellent methods for detecting stable interactions with dissociation constants (Kd) in the nanomolar to tens of micromolar range. These two methods did not indicate any direct interaction between the PIN domain of Nmd4 and Upf1-HD. However, we observed that the PIN domain of Nmd4 stimulates the ATPase activity of Upf1-HD to the same extent as the « arm » of Nmd4. This is an indirect indication that the Nmd4 PIN may interact with Upf1-HD, otherwise a stimulatory effect would not be expected. Our radioactivity-based ATPase assay is very sensitive, allowing the detection of a stimulatory effect due to a transient interaction between the PIN domain of Nmd4 and Upf1-HD, which, as indicated above, could not be detected with the interaction assays used. We would also like to point out that in our ATPase conditions, Upf1-HD (0.156 µM) is incubated with a 20-fold molar excess (3.12 µM) of its partners (Nmd4-FL, Nmd4 « arm » or Nmd4 PIN). Such an excess cannot be used in our interaction tests. This could explain the stimulatory effect detected for the PIN domain of Nmd4 in our ATPase assay.
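      To illustrate the argument about molar excess, the sketch below computes the equilibrium fraction of Upf1-HD engaged in a complex for a simple 1:1 binding model at the concentrations quoted above (0.156 µM Upf1-HD, 3.12 µM partner). The 1:1 model and the Kd values are assumptions chosen for illustration; no Kd was measured for the PIN domain.

```python
import math

def fraction_bound(p_total, l_total, kd):
    """Fraction of protein P in complex at equilibrium for 1:1 binding.
    All concentrations and Kd share one unit (µM here). Uses the exact
    quadratic solution, so ligand depletion is taken into account."""
    s = p_total + l_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0
    return complex_conc / p_total

UPF1_HD = 0.156          # µM, concentration stated in the response
PARTNER = 20 * UPF1_HD   # µM, 20-fold molar excess (3.12 µM)

# Hypothetical dissociation constants (µM), from tight to very weak binding.
for kd in (1.0, 10.0, 100.0, 1000.0):
    occupancy = fraction_bound(UPF1_HD, PARTNER, kd)
    print(f"Kd = {kd:7.1f} µM -> fraction of Upf1-HD bound = {occupancy:.4f}")
```

      Under these assumptions, a weak interaction still leaves a small but non-zero fraction of Upf1-HD bound at any given time, which a sensitive enzymatic read-out might report while pull-down or ITC would not.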

      We have clarified this section by adding the following sentences: « We were unable to detect such an interaction using our different interaction assays (pull-down and ITC), which are optimal for studying interactions with dissociation constants (Kd) in the nanomolar to tens of micromolar range. We therefore assume that a transient low-affinity interaction (high Kd value not detected by our binding assays) exists between Upf1-HD and PIN Nmd4 and can only be detected by highly sensitive assays such as our radioactivity-based ATPase assay, which was performed with a 20-fold molar excess of PIN Nmd4 domain over Upf1-HD. »

      7: Figure 4B should be done in the context of the full length of SMG6 and UPF1.

      **Referees cross-commenting**

      *This session contains comments from both Rev1 and Rev2*

      Rev1:

      There seems to be a contradiction in comments on Figure 4B. I agree with Reviewer 2 that using FL proteins will be informative to see whether the FL proteins indeed interact (or not in the case of the mutants).

      If one wants to use this experiment to map the interacting regions, then I think that the UPF1 HD domain and the short conserved region of SMG6 should be used. The long fragment SMG6 207-580 is not ideal for either. The short constructs would be more suited for pull-down experiments (like those done for the yeast proteins).

      Rev2

      Response to reviewer #1, It is necessary to use the full-length protein (FL protein) to map the interface unless they have pre-existing information to support mapping down to short fragments.

      In addition, performing further structural work would be beyond the scope of this study. Given the additional time and effort required, I do not recommend doing so for this study.

      Rev1:

      As I said, I agree with using the FL proteins. The pre-existing information supporting the mapping comes from sequence alignments with the yeast structure and the mutagenesis. This is further confirmed by AlphaFold modeling, which in my opinion should be included. As I mentioned in my review, I don't insist on further structural work.

      Thank you very much for this comment and the discussions between reviewers, which show that we didn't explain our experimental strategy clearly. Human UPF1 has been shown to interact with SMG6 in both phospho-dependent and phospho-independent modes. In our manuscript, we focus on characterizing the phospho-independent interaction. For this reason, we cannot perform this experiment using the full-length versions of SMG6 and UPF1; otherwise, the effects of our point mutants on the UPF1-SMG6 interaction could be masked by the phospho-dependent interaction occurring between the 14-3-3 domain of SMG6 and the C-terminus of UPF1. To circumvent this problem, we were inspired by former in cellulo studies, which have shown that the SMG6-[207-580] fragment is expressed as a stable protein in human cells and is responsible for the phospho-independent interaction between UPF1 and SMG6 (Chakrabarti et al; 2014; Nucleic Acids Research; PMID: 25013172). Similarly, the helicase domain of UPF1 was found to be sufficient for this phospho-independent interaction with human SMG6 (Nicholson et al; 2014; Nucleic Acids Research; PMID: 25053839). These are the reasons why we decided to use these protein domains in our in cellulo studies to test the effect of our point mutants on the interaction. As indicated above in our answer to a comment from reviewer #1, our aim was not to reduce this SMG6 region to a shorter peptide but to conduct an amino acid-level analysis by site-directed mutagenesis. This is also why we decided to perform our experiments using the same SMG6 domain as Conti's laboratory and to mutate conserved residues on this fragment. We have also included the AlphaFold2 model of the complex between human UPF1 and SMG6 in our revised version.

      To clarify this point, we have amended the relevant section as follows: « To determine whether this motif might be involved in the interaction between SMG6 and UPF1-HD proteins, we ectopically expressed the region comprising residues 207-580 of human SMG6 fused to a C-terminal HA tag (SMG6-[207-580]-HA) and human UPF1-HD (residues 295-921 fused to a C-terminal Flag tag; UPF1-HD-Flag) in human HEK293T cells, as these regions have previously been shown to be responsible for the phosphorylation-independent interaction between these two proteins. Compared to the full-length UPF1 and SMG6 proteins, these constructs also preclude our findings of any interference from the phosphorylation-dependent interaction occurring between the C-terminus of UPF1 and the 14-3-3 domain of SMG6. »

      8: "The NMD mechanism not only targets mRNAs but also small nucleolar RNAs (snoRNAs) and long noncoding RNAs (lncRNAs) harboring bona fide stop codons but in a specific context such as short upstream open reading frame (uORF), long 3'-UTRs, low translational efficiency or exon-exon junction located downstream of a stop codon." "First, for mRNAs with long 3'-UTRs, the 3'-faux UTR model posits that a long 3 spatial distance between a stop codon and the mRNA poly(A) tail destabilizes NMD substrates by preventing the interaction between the eRF1-eRF3 translation termination complex bound to the A- site of a ribosome recognizing a stop codon and the poly(A)-binding protein (Pab1 or PABP in S. cerevisiae and human, respectively)." These are difficult to read.

      Thank you for this suggestion to improve the clarity of our manuscript. We have tried to make these sentences easier to read as follows:

      « The NMD mechanism also targets mRNAs, small nucleolar RNAs (snoRNAs) and long noncoding RNAs (lncRNAs) carrying normal stop codons located in a specific context (short upstream open reading frame or uORF, long 3'-UTRs, low translational efficiency or exon-exon junction located downstream of a stop codon (3-11)). »

      « The first model, the 3'-faux UTR model posits that for mRNAs with long 3'-UTRs, a long spatial distance between a stop codon and the mRNA poly(A) tail destabilizes NMD substrates. Indeed, it would prevent the physical interaction between the eRF1-eRF3 translation termination complex recognizing a stop codon in the A-site of the ribosome and the poly(A)-binding protein (Pab1 or PABP in S. cerevisiae and human, respectively) bound to the 3' poly(A) tail (12-14). »

      9: please add the Ramachandran plot values.

      Thank you for pointing out this omission. These values have been included in Table EV1.

      Significance

      NMD is one of the major topics in the field of gene translational regulation research. This study will be of interest to a broad audience. I am an expert in the structural study of translation. However, I have limited experience in the in vivo study of NMD substrates.

      We are very grateful for the reviewer's comments about the broad interest and the overall quality of our work.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      In this study, the authors solved the crystal structure of the UPF1 helicase domain in complex with Nmd4. Through the structure and biochemical studies, they uncovered a region responsible for Nmd4 binding to UPF1, also important for their function in NMD. In the end, the authors also extended their findings to the human SMG6, proposing a conserved mechanism for Nmd4 and SMG6.

      The mechanism of UPF1 functioning during NMD is a long-existing question. For decades, people have been trying to find out the roles of all the NMD factors during this process. This study visualized the first direct connection between UPF1 and the putative SMG6 homolog, Nmd4. Undoubtedly, it will aid our understanding of how the whole process works.

      One of the limitations of this study is the conservation between Nmd4 and SMG6. Although they both have a PIN domain, Nmd4 is inactive while SMG6 is active. During NMD, SMG6 is thought to work to cut the mRNA, thus promoting the degradation of the non-functional mRNA. Therefore, Nmd4 and SMG6 may only share a similar binding mode with UPF1, however, they do not share similar functions. This study might only apply to yeast study.

      Comments: The study is written in a very clear way, and most of the experiments are clear and sound. I do not have any major comments. I only have a few minor comments, listed below:

      1:The authors also solved the PIN domain of the SMG6. This is a result worth showing in the main figure.

      2:It would be easier to read if the authors could add all the binding constants directly into the ITC panels.

      3:I am confused with His6-ZZ. Is ZZ a protein tag?

      4:The comparison between Nob1 and the PIN domain of Nmd4 is not convincing for me. Since the PIN domain is not required for the binding between Nmd4 and UPF1, the conformation of the PIN domain could be a result of the crystal packing. Thus, it is still possible that Nmd4 and UPF1 bind to the same RNA. To this end, I challenge the conclusion the authors have made on the mRNA binding part.

      5: "Showing that Nmd4 stabilizes Upf1-HD on RNA in the absence of ATP and that Upf1 is the main RNA binding factor in the Nmd4/Upf1-HD complex." As mentioned above, I don't think one can make the conclusion UPF1 is the main RNA binding factor; there shouldn't be a main and minor. Meanwhile, what will happen if you add ATP in? Or AMPPNP? Or ADP?

      6: "But also that a physical interaction between Upf1-HD and the PIN domain exists in vitro, although we were unable to detect it using our various interaction assays." This also confused me, since one cannot detect the interaction in any assay, how could you be so confident there is a physical interaction? Have you tested assays which are good for weak binding?

      7: Figure 4B should be done in the context of the full length of SMG6 and UPF1.

      8: "The NMD mechanism not only targets mRNAs but also small nucleolar RNAs (snoRNAs) and long noncoding RNAs (lncRNAs) harboring bona fide stop codons but in a specific context such as short upstream open reading frame (uORF), long 3'-UTRs, low translational efficiency or exon-exon junction located downstream of a stop codon." "First, for mRNAs with long 3'-UTRs, the 3'-faux UTR model posits that a long 3 spatial distance between a stop codon and the mRNA poly(A) tail destabilizes NMD substrates by preventing the interaction between the eRF1-eRF3 translation termination complex bound to the A- site of a ribosome recognizing a stop codon and the poly(A)-binding protein (Pab1 or PABP in S. cerevisiae and human, respectively)." These are difficult to read.

      9: please add the Ramachandran plot values.

      Significance

      NMD is one of the major topics in the field of gene translational regulation research. This study will be of interest to a broad audience. I am an expert in the structural study of translation. However, I have limited experience in the in vivo study of NMD substrates.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors report a molecular mechanism for recruiting syntaxin 17 (Syn17) to closed autophagosomes through the charge interaction between enriched PI4P and the C-terminal region of Syn17. Precise control of the location and conformation of proteins is critical for maintaining autophagic flux. In particular, how Syn17 is recruited to autophagosomes remains unclear. In this paper, the authors describe a simple lipid-protein interaction model that goes beyond previous studies focusing on protein-protein interactions. This represents a conceptual advance.

      We would like to thank Reviewer #1 for the positive evaluation of our study.

      Reviewer #2 (Public Review):

      Summary:

      Syntaxin17 (STX17) is a SNARE protein that is recruited to mature (i.e., closed) autophagosomes, but not to immature (i.e., unclosed) ones, and mediates autophagosome-lysosome fusion. How STX17 recognizes the mature autophagosome is an interesting unresolved question in the autophagy field. Shinoda and colleagues set out to answer this question by focusing on the C-terminal domain of STX17 and found that PI4P is a strong candidate for the factor that recruits STX17 to the autophagosome.

      Strengths:

      The main findings are: 1) Rich positive charges in the C-terminal domain of STX17 are sufficient for recruitment to the mature autophagosome; 2) Fluorescence charge sensors of different strengths suggest that autophagic membranes have negative charges and that the charge increases as they mature; 3) Among a battery of fluorescence biosensors, only PI4P-binding biosensors distribute to the mature autophagosome; 4) STX17 bound to isolated autophagosomes is released by treatment with Sac1 phosphatase; 5) By molecular dynamics simulation, STX17 TM is shown to be inserted into a membrane containing PI4P but not into a membrane without it. These results indicate that PI4P is a strong candidate that STX17 binds to in the autophagosome.

      We would like to thank Reviewer #2 for pointing out these strengths.

      Weaknesses:

      • It was not answered whether PI4P is crucial for the STX17 recruitment in cells because manipulation of the PI4P content in autophagic membranes was not successful for unknown reasons.

      As we explained in the initial submission, we tried to deplete PI4P in autophagosomes by multiple methods but did not succeed. In this revised manuscript, we added the result of an experiment using the PI 4-kinase inhibitor NC03 (Figure 4―figure supplement 1), which shows no significant effect on the autophagosomal PI4P level and STX17 recruitment.

      Author response image 1.

      The PI 4-kinase inhibitor NC03 failed to suppress autophagosomal PI4P accumulation and STX17 recruitment. HEK293T cells stably expressing mRuby3–STX17TM (A) or mRuby3–CERT(PHD) (B) and Halotag-LC3 were cultured in starvation medium for 1 h and then treated with and without 10 μM NC03 for 10 min. Representative confocal images are shown. STX17TM- or CERT(PHD)-positive rates of LC3 structures per cell (n > 30 cells) are shown in the graphs. Solid horizontal lines indicate medians, boxes indicate the interquartile ranges (25th to 75th percentiles), and whiskers indicate the 5th to 95th percentiles. Differences were statistically analyzed by Welch’s t-test. Scale bars, 10 μm (main), 1 μm (inset).
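      The summary statistics named in this legend (median, interquartile range, 5th to 95th percentiles, Welch's t-test) can be sketched as follows. This is a minimal standard-library illustration, not the authors' analysis pipeline; the per-cell values are invented, and the linear-interpolation percentile rule is an assumption, since the legend does not say which convention was used.

```python
import math
import statistics

def percentile(values, q):
    """Percentile q (0..100) by linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def box_summary(values):
    """Quantities drawn in the box plot: median, box (25th-75th), whiskers (5th-95th)."""
    return {
        "median": statistics.median(values),
        "q25": percentile(values, 25),
        "q75": percentile(values, 75),
        "p5": percentile(values, 5),
        "p95": percentile(values, 95),
    }

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented per-cell positive rates (%), for illustration only.
untreated = [62.0, 70.5, 58.0, 66.0, 73.5, 61.0, 68.0]
treated = [60.0, 71.0, 57.5, 69.0, 72.0, 63.5, 65.0]
print(box_summary(untreated))
print(welch_t(untreated, treated))
```

      Converting the t statistic into a p-value requires the Student t distribution, which the standard library does not provide; a statistics package would normally handle that step.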

      • The molecular simulation study did not show whether PI4P is necessary for the STX17 TM insertion or whether other negatively charged lipids can play a similar role.

      As the reviewer suggested, we performed the molecular dynamics simulation using membranes with phosphatidylinositol, a negatively charged lipid. STX17 TM approached the PI-containing membrane but was not inserted into the membrane within a time scale of 100 ns in simulations of all five structures. This data suggests that PI4P, which is more negatively charged than PI, is required for STX17 insertion. Thus, we have included these data in Figure 5E and F and added the following text to Lines 242–244. “Moreover, if the membrane contained phosphatidylinositol (PI) instead of PI4P, STX17 approached the PI-containing membrane but was not inserted into the membrane (Figure 5E, F, Video 3)."

      Author response image 2.

      (E) An example of a time series of simulated results of STX17TM insertion into a membrane consisting of 70% phosphatidylcholine (PC), 20% phosphatidylethanolamine (PE), and 10% phosphatidylinositol (PI). STX17TM is shown in blue. Phosphorus in PC, PE and PI are indicated by yellow, cyan, and orange, respectively. Short-tailed lipids are represented as green sticks. The time evolution series are shown in Video 3. (F) Time evolution of the z-coordinate of the center of mass (z_cm) of the transmembrane helices of STX17TM in the case of membranes with PI. Five independent simulation results are represented by solid lines of different colors. The gray dashed lines indicate the locations of the lipid heads. A scale bar indicates 5 nm.
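      The quantity plotted in panel F, the z-coordinate of the center of mass (z_cm), is a mass-weighted average of atomic z positions. The sketch below shows that calculation for a single frame. The atom masses, coordinates, head-group positions, and the rule that "inserted" means z_cm lies between the two head-group planes are all assumptions made for illustration, not details of the authors' simulation setup.

```python
def center_of_mass_z(atoms):
    """Mass-weighted mean z for an iterable of (mass, z) pairs."""
    atoms = list(atoms)
    total_mass = sum(mass for mass, _ in atoms)
    return sum(mass * z for mass, z in atoms) / total_mass

def is_inserted(z_cm, z_lower_heads, z_upper_heads):
    """Assumed criterion: the helices count as inserted when z_cm lies
    between the two lipid head-group planes."""
    return z_lower_heads < z_cm < z_upper_heads

# Hypothetical frames: each is a list of (mass in Da, z in nm) for helix atoms.
frames = [
    [(12.0, 5.1), (14.0, 5.4), (16.0, 4.9)],   # helices above the membrane
    [(12.0, 1.2), (14.0, 0.8), (16.0, 1.0)],   # helices inside the bilayer
]
Z_LOWER, Z_UPPER = -2.0, 2.0   # assumed head-group planes (nm)

for index, frame in enumerate(frames):
    z_cm = center_of_mass_z(frame)
    print(f"frame {index}: z_cm = {z_cm:.2f} nm, inserted = {is_inserted(z_cm, Z_LOWER, Z_UPPER)}")
```

      Applying the same function to every frame of a trajectory gives a time series like the curves in panel F.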

      • The question that the authors posed in the beginning, i.e., why is STX17 recruited to the mature (closed) autophagosome but not to immature autophagic membranes, was not answered. The authors speculate that the seemingly gradual increase of negative charges in autophagic membranes is caused by an increase in PI4P. However, this was not supported by the PI4P fluorescence biosensor experiment that showed their distribution to the mature autophagosome only. Here, there are at least two possibilities: 1) The increase of negative charges in immature autophagic membranes is derived from PI4P. However the fluorescence biosensors do not bind there for some reason; for example, they are not sensitive enough to recognize PI4P until it reaches a certain level, or simply, their binding does not occur in a quantitative manner. 2) The negative charge in immature membranes is not derived from PI4P, and PI4P is generated abundantly only after autophagosomes are closed. In either case, it is not easy to explain why STX17 is recruited to the mature autophagosome only. For the first scenario, it is not clear how the PI4P synthesis is regulated so that it reaches a sufficient level only after the membrane closure. In the second case, the mechanism that produces PI4P only after the autophagosome closure needs to be elucidated (so, in this case, the question of the temporal regulation issue remains the same).

      We thank the reviewers for pointing this out. While the probe for weakly negative charges (1K8Q) labeled both immature and mature autophagosomes, the probes for intermediate charges (5K4Q and 3K6Q) and PI4P labeled only mature autophagosomes (Figure 2F, Figure 2–figure supplement 1B). Thus, we think that the autophagosomal membrane rapidly and drastically becomes negatively charged, and at the same time, PI4P is enriched. Although immature membranes may have weak negative charges, we did not examine which lipids contribute to the negative charges. Thus, we have added the following sentences to the Discussion part.

      “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283) “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to address the question of how the SNARE protein Syntaxin 17 senses autophagosome maturation by being recruited to autophagosomal membranes only once autophagosome formation and sealing is complete. The authors discover that the C-terminal region of Syntaxin 17 is essential for its sensing mechanism, which involves two transmembrane domains and a positively charged region. The authors discover that the lipid PI4P is highly enriched in mature autophagosomes and that electrostatic interaction of Syntaxin 17's positively charged region with PI4P drives recruitment specifically to mature autophagosomes. The temporal basis for PI4P enrichment and Syntaxin 17 recruitment to ensure that unsealed autophagosomes do not fuse with lysosomes is a very interesting and important discovery. Overall, the data are clear and convincing, with the study providing important mechanistic insights that will be of broad interest to the autophagy field, and also to cell biologists interested in phosphoinositide lipid biology. The authors' discovery also provides an opportunity for future research in which Syntaxin 17's C-terminal region could be used to target factors of interest to mature autophagosomes.

      Strengths:

      The study combines clear and convincing cell biology data with in vitro approaches to show how Syntaxin 17 is recruited to mature autophagosomes. The authors take a methodical approach to narrow down the critical regions within Syntaxin 17 required for recruitment and use a variety of biosensors to show that PI4P is enriched on mature autophagosomes.

      We would like to thank Reviewer #3 for the positive comments.

      Weaknesses:

      There are no major weaknesses, overall the work is highly convincing. It would have been beneficial if the authors could have shown whether altering PI4P levels would affect Syntaxin 17 recruitment. However, this is understandably a challenging experiment to undertake and the authors outlined their various attempts to tackle this question.

      We thank Reviewer #3 for pointing this out. Please see our above response to Reviewer #2 (Public Review).

      In addition, clear statements within the figure legends on the number of independent experimental repeats that were conducted for experiments that were quantitated are not currently present in the manuscript.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      Reviewer #1 (Recommendations For The Authors):

      This paper is well written and all experiments were conducted with a high standard. Several minor issues should be addressed before final publication.

      (1) To further confirm the charge interaction, a charge screening experiment should be performed for Fig. 2A.

      We have asked Reviewer #1 through the editor what this experiment meant and understood that it was to see the effects of high salt concentrations. We monitored the association of GFP-STX17TM with liposomes in the presence or absence of 1 M NaCl and found that it was blocked in a high ionic buffer. This data supports the electrostatic interaction of STX17 with membranes. We have included this data in Figure 2B and added the following sentences to Lines 124–126.

      “The association of STX17TM with PI4P-containing membranes was abolished in the presence of 1 M NaCl (Figure 2B). These data suggest that STX17 can be recruited to negatively charged membranes via electrostatic interaction independent of the specific lipid species.”

      Author response image 3.

      GFP–STX17TM translated in vitro was incubated with rhodamine-labeled liposomes containing 70% PC, 20% PE and 10% PI4P in the presence of 1 M NaCl or 1.2 M sucrose. GFP intensities of liposomes were quantified and shown as in Figure 1C (n > 30).
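      One way to see why 1 M NaCl would be expected to abolish an electrostatic interaction is the Debye screening length, which for a 1:1 electrolyte in water at 25 °C is approximately 0.304/√I nm (I in mol/L). The sketch below evaluates this textbook approximation. The 0.15 M comparison value is an assumed physiological reference, since the ionic strength of the assay buffer is not stated here.

```python
import math

def debye_length_nm(ionic_strength_molar):
    """Approximate Debye length (nm) for a 1:1 electrolyte in water at 25 °C,
    using the textbook relation 0.304 / sqrt(I), with I in mol/L."""
    if ionic_strength_molar <= 0:
        raise ValueError("ionic strength must be positive")
    return 0.304 / math.sqrt(ionic_strength_molar)

# 0.15 M is an assumed physiological reference; 1.0 M is the NaCl condition above.
for ionic_strength in (0.15, 1.0):
    print(f"I = {ionic_strength:.2f} M -> Debye length = {debye_length_nm(ionic_strength):.3f} nm")
```

      At 1 M salt the screening length drops to roughly the size of a hydrated ion, so charge-charge attraction between a basic protein segment and an acidic membrane surface is strongly damped.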

      (2) The authors claim that "Autophagosomes become negatively charged during maturation", based on experiments using membrane charge probes. Since it's mainly about the membrane, it's better to refine the claim to "The membrane of autophagosomes becomes...", which would be more precise and closer to the topic of this paper.

      We would like to thank the reviewer for pointing this out. This point is valid. As recommended, we have corrected the phrase “Autophagosomes become negatively charged during maturation” to “The membrane of autophagosomes becomes negatively charged during maturation” (Lines 72, 118, 262, 969 (title of Figure 2), 1068 (title of Figure 2–figure supplement 1)).

      (3) The authors should add more discussion regarding the "specificity" for recruiting Syn17 through the charge interaction. Particularly, how Syn17 could be maintained before the closure of autophagosomes? For the MD simulations in Fig. 5, the current results don't add much to the manuscript. The cell biology experiments have demonstrated the conclusion. The authors could try to find more details about the insertion by analyzing the simulation movies. Do membrane packing defects play a role during the insertion process? A similar analysis was conducted for alpha-synuclein (https://pubmed.ncbi.nlm.nih.gov/33437978/).

      Regarding the mechanism of STX17 maintenance in the cytosol, we do not think that other molecules, such as chaperones, are essential because the purified recombinant mGFP-STX17TM used in this study is soluble. However, this does not rule out such a mechanism, which should be addressed in a future study.

      In the paper by Liu et al. (PMID: 33437978), small liposomes with diameters of 25–50 nm are used. Therefore, there are packing defects in the highly curved membranes, into which alpha-synuclein helices are inserted in a curvature-dependent manner. On the other hand, autophagosomes are much larger (~1 μm in diameter) and almost flat for STX17 molecules, so we think it is unlikely that STX17 recognizes packing defects.
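      The size argument can be made concrete by comparing membrane curvature, taken here as 1/R for a sphere of radius R. This is a rough geometric sketch using the diameters quoted above; it treats both vesicles as ideal spheres and says nothing about lipid packing itself.

```python
def curvature_per_nm(diameter_nm):
    """Curvature 1/R (in nm^-1) of an ideal sphere with the given diameter."""
    if diameter_nm <= 0:
        raise ValueError("diameter must be positive")
    return 2.0 / diameter_nm

# Diameters quoted in the response: 25-50 nm liposomes, ~1 μm (1000 nm) autophagosome.
small_liposome = curvature_per_nm(25.0)
large_liposome = curvature_per_nm(50.0)
autophagosome = curvature_per_nm(1000.0)

print(f"25 nm liposome:      {small_liposome:.4f} nm^-1")
print(f"50 nm liposome:      {large_liposome:.4f} nm^-1")
print(f"1 μm autophagosome:  {autophagosome:.4f} nm^-1")
print(f"fold difference (25 nm vs 1 μm): {small_liposome / autophagosome:.0f}")
```

      On this estimate the autophagosomal membrane is 20 to 40 times less curved than the liposomes used in the alpha-synuclein study.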

      Reviewer #2 (Recommendations For The Authors):

      • The two (and other) possibilities with regards to the interpretation of the negative charge/PI4P result in autophagic membranes are hoped to be discussed.

      As mentioned above, we have added the following sentences to the Discussion section. “Our data of the 1K8Q probe suggest that immature autophagosomal membranes may also have slight negative charges (Figure 2E). Although the source of the negative charge of immature autophagosomes is currently unknown, it may be derived from low levels of PI4P, which is undetectable by the PI4P probes and/or other negatively charged lipids such as PI and PS (Schmitt et al., EMBO Rep, 2022).” (Lines 279–283)

      “In any case, it would be important to elucidate how PI 4-kinase activity or PI4P synthesis is upregulated during autophagosome maturation.” (Lines 302–303)

      • Fluorescence biosensors are convenient to give an overview of the intracellular distribution of various lipids, but some of them show false-negative results. For example, evectin-2-PH for PS binds to endosomes but not to the plasma membrane, even though the latter contains abundant PS. With regards to PI4P, some biosensors illuminate both the Golgi and autophagosome, while others do not appear to bind the Golgi. Moreover, fluorescence biosensors for PI(3,5)P2 and PI(3,4)P2, which are also candidates for the STX17 insertion issue, are less reliable than others (e.g., those for PI3P and PI(4,5)P2). These problems need to be considered.

      We agree with Reviewer #2 that fluorescence biosensors are not perfect for detecting specific lipids. Based on the Reviewer’s suggestion, we have included a comment on this in the Discussion section as follows (Lines 265–268).

      “Given the possibility that fluorescence lipid probes may give false-negative results, a more comprehensive biochemical analysis, such as lipidomics analysis of mature autophagosomes, would be imperative to elucidate the potential involvement of other negatively charged lipids.”

      • A negative control for the PI4P biosensor, i.e., a mutant lacking the PI4P binding ability, is better to be tested to confirm the presence of PI4P in autophagosomes.

      We would like to thank the Reviewer for this comment. We conducted the suggested experiment and confirmed that the CERT(PHD)(W33A) mutant, which is deficient for PI4P binding (Sugiki et al., JBC. 2012), was diffusely present in the cytosol and did not localize to STX17-positive autophagosomes. This data supports our conclusion that PI4P is indeed present in autophagosomes. We have included this data in Figure 3–figure supplement 2A and explained it in the text (Lines 164–166).

      Author response image 4.

      Mouse embryonic fibroblasts (MEFs) stably expressing GFP–CERT(PHD)(W33A) and mRuby3–STX17TM were cultured in starvation medium for 1 h. Bars indicate 10 μm (main images) and 1 μm (insets).

      • As a control for the molecular dynamics simulation study, STX17 TM insertion into a membrane containing other negatively charged lipids, especially PI, needs to be tested. PI is a negatively charged lipid that is likely to exist in autophagic membranes (as suggested by the authors' past study).

      We thank the Reviewer for this suggestion. As mentioned above (Reviewer #2, Public Review), we performed the molecular dynamics simulation using membranes containing PI and added the results to Figure 5E and F and Video 3.

      • If the putative role of PI4P could be shown in the cellular context, the authors' conclusion would be much strengthened. I wonder if overexpression of PI4P fluorescence biosensors, especially those that appear to bind to the autophagosome almost exclusively, may suppress the recruitment of STX17 there.

      We would like to thank the Reviewer for asking this question. In MEFs stably overexpressing PI4P probes driven by the CMV promoter, STX17 recruitment was not affected. Thus, simple overexpression of PI4P probes does not appear to be effective in masking PI4P in autophagosomes.

      Another idea is to use an appropriate molecule (e.g., WIPI2, ATG5) and to recruit Sac1 to autophagic membranes by using the FRB-FKBP system or the like. I hope these and other possibilities will be tested to confirm the importance of PI4P in the temporal regulation of STX17 recruitment.

      We tried the FRB-FKBP system using the phosphatase domain of yeast Sac1 fused to FKBP and LC3 fused to FRB, but unfortunately, this system failed to deplete PI4P from the autophagosomal membrane.

      Reviewer #3 (Recommendations For The Authors):

      A few areas for suggested improvement are:

      (1) It would be helpful if the authors could clarify for all figures how many independent experiments were conducted for all experiments, particularly those that have quantitation and statistical analyses.

      As pointed out by Reviewer #3, we have added the number of independent experimental repeats in the figure legends.

      The authors made several attempts to modulate PI4P levels on autophagosomes although understandably this proved to be challenging. A couple of suggestions are provided to address this area:

      (2) Given the reported role of GABARAPs in PI4K2a recruitment and PI4P production on autophagosomes, as well as autophagosome-lysosome fusion (Nguyen et al. (2016) J Cell Biol), it would be worthwhile to assess whether GABARAP TKO cells have reduced PI4P and reduced Stx17 recruitment.

      According to the Reviewer’s suggestion, we examined the localization of STX17 TM and the PI4P probe CERT(PHD) in ATG8 family (LC3/GABARAP) hexa KO HeLa cells that were established by the Lazarou lab (Nguyen et al., JCB 2016). As in WT cells, STX17 TM and CERT(PHD) were still colocalized with each other in hexa KO cells, suggesting that neither STX17 recruitment nor PI4P enrichment depends on ATG8 family proteins (note: the size of autophagosomes in HeLa cells is smaller than in MEFs, making it difficult to observe autophagosomes as ring-shaped structures). We have included this result in Figure 3–figure supplement 2(F) and explained it in the text (Lines 194–196, 198).

      Author response image 5.

      (F) WT and ATG8 hexa KO HeLa cells stably expressing GFP–STX17TM and transiently expressing mRuby3–CERT(PHD) were cultured in starvation medium. Bars indicate 10 μm (main images) and 1 μm (insets).

      (3) Can the authors try fusing Sac1 to one of the PI4P probes (CERT(PHD)) that were used, or alternatively to the c-terminus of Syntaxin 17? This approach would help to recruit Sac1 only to mature autophagosomes and could therefore prevent the autophagosome formation defect observed when fused to LC3B that targeted Sac1 to autophagosomes as they were forming. Understandably, this approach might seem a bit counterintuitive since the phosphatase is removing PI4P which is what is recruiting it but it could be a viable approach to keep PI4P levels low enough on mature autophagosomes so that Syntaxin 17 is no longer recruited. A Sac1 phosphatase mutant might be needed as a control.

      We would like to thank the Reviewer for these suggestions. We tried the phosphatase domain of yeast Sac1 or human SAC1 fused with STX17TM, but unfortunately, these fusion proteins did not deplete PI4P from autophagosomes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the roles of the axon growth regulator Sema7a in the formation of peripheral sensory circuits in the lateral line system of zebrafish. The evidence supporting the claims of the authors is solid, although further work directly testing the roles of different sema7a isoforms would strengthen the analysis. The work will be of interest to developmental neuroscientists studying circuit formation.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, Dasgupta et al. have dissected the role of Sema7a in fine tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They attempt to also outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, an autocrine role for Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes.

      Finally, critical controls are absent from the overexpression paradigm.

      Comments: Thank you for your valuable comments. We have analyzed the hair cell scRNA transcriptome data of zebrafish neuromasts from published works and have not identified known expression of receptors of the Sema7A protein, particularly PlexinC1 and Integrin β1 molecules (reference 4 and 15) in hair cells. This result suggests that the Sema7A protein molecule, either secreted or membrane-bound, does not possess its cognate receptor to elicit an autocrine function on the hair cells. Moreover, the GPI-anchored Sema7A lacks a cytosolic domain. So it is unlikely that Sema7A signaling directly induces the formation of presynaptic ribbons. We propose that the decrease in average number and area of synaptic aggregates likely reflects decreased stability of the synaptic structures owing to lack of contact between the sensory axons and the hair cells, which has been identified in zebrafish neuromasts (reference 38).

      Thank you for pointing out the missing critical control experiments. Additional control experiments (lines 333-346) with a new figure (Figure 5) have been added.

      These issues weaken the claims made by the authors including the statement that they have identified differential roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization respectively.

      Comments: We have rephrased our statement and argue in lines 428-430 that our experiments “suggest a potential mechanism for hair cell innervation in which a local Sema7Asec diffusive cue likely consolidates the sensory arbors at the hair cell cluster and the membrane-anchored Sema7A-GPI molecule guides microcircuit topology and synapse assembly.”

      The manuscript itself would benefit from the inclusion of details in the text to help the reader interpret the figures, tools, data, and analysis.

      Comments: We have made significant revisions to the text and figures to improve clarity and consistency of the manuscript.

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of a peripheral sensory circuit in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it allows the authors to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below).

      Comments: Thank you for making this point. To investigate the presence of both sema7a transcripts in the hair cells of the lateral-line neuromasts, we used the Tg(myo6b:actb1EGFP) transgenic fish to capture the labeled hair cells by fluorescence-activated cell sorting (FACS) and isolated total RNA. Using transcript-specific DNA oligonucleotide primers, we have identified the presence of both sema7a transcript variants in the hair cells of the neuromast. Even though we have not developed transcript-specific knockout animals, we speculate that the presence of both transcript variants in the hair cells implies that they function in distinct ways. We have changed our interpretation in lines 32-34 to “Our findings propose that Sema7A likely functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development.”

      In the future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us determine the role of the membrane-bound Sema7AGPI molecule as well as the Sema7Asec in sensory arborization and synaptic assembly.

      In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

      Comments: Thank you for this insightful comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron micrograph (SBFSEM) technique to characterize the connectome of the neuromast microcircuit where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory-axon collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axons associate with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capture collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Reviewer #3 (Public Review):

      Summary:

      This study demonstrates that the axon guidance molecule Sema7a patterns the innervation of hair cells in the neuromasts of the zebrafish lateral line, as revealed by quantifying gain- and loss-of-function effects on the three-dimensional topology of sensory axon arbors over developmental time. Alternative splicing can produce either a diffusible or membrane-bound form of Sema7a, which is increasingly localized to the basolateral pole of hair cells as they develop (Figure 1). In sema7a mutant zebrafish, sensory axon arbors still grow to the neuromast, but they do not form the same arborization patterns as in controls, with many arbors overextending, curving less, and forming fewer loops even as they lengthen (Figure 2,3). These phenotypes only become significant later in development, indicating that Sema7a functions to pattern local microcircuitry, not the gross wiring pattern. Further, upon ectopic expression of the diffusible form of Sema7a, sensory axons grow towards the Sema7a source (Figure 4). The data also show changes in the synapses that form when mutant terminals contact hair cells, evidenced by significantly smaller pre- and post-synaptic punctae (Figure 5). Finally, by replotting single cell RNA-sequencing data (Figure 6), the authors show that several other potential cues are also produced by hair cells and might explain why the sema7a phenotype does not reflect a change in growth towards the neuromast. In summary, the data strongly indicate that Sema7a plays a role in shaping connectivity within the neuromast.

      Strengths:

      The main strength of this study is the sophisticated analysis that was used to demonstrate fine-level effects on connectivity. Rather than asking "did the axon reach its target?", the authors asked "how does the axon behave within the target?". This type of deep analysis is much more powerful than what is typical for the field and should be done more often. The breadth of analysis is also impressive, in that axon arborization patterns and synaptic connectivity were examined at 3 stages of development and in three-dimensions.

      Weaknesses:

      The main weakness is that the data do not cleanly distinguish between activities for the secreted and membrane-bound forms of Sema7a, which the authors speculate may influence axon growth and synapse formation respectively. The authors do not overstate the claims, but it would have been nice to see some additional experimentation along these lines, such as the effects of overexpressing the membrane-bound form,

      Comments: We have accepted this useful suggestion. In lines 333-346 and in Figure 5 we have demonstrated the impact of overexpressing the membrane-bound transcript variant on arborization pattern of the sensory axons.

      Some analysis of the distance over which the "diffusible" form of Sema7a might act (many secreted ligands are not in fact all that diffusible), or

      Comments: We have reported this in lines 311-317 and in Figure 4F,G.

      Some live-imaging of axons before they reach the target (predicted to be the same in control and mutants) and then within the target (predicted to be different).

      Comments: We have accepted this useful suggestion. We demonstrate the dynamics of the sensory arbors that are attracted to an ectopic Sema7Asec source in lines 325-332, Figure 4I,J; Figure 4—figure supplement 2A, and Videos 13-16.

      Clearly, although the gain-of-function studies show that Sema7a can act at a distance, other cues are sufficient. Although the lack of a phenotype could be due to compensation, it is also possible that Sema7a does not actually act in a diffusible manner within its natural context. Overall, the data support the authors' carefully worded conclusions. While certain ideas are put forward as possibilities, the authors recognize that more work is needed. The main shortcoming is that the study does not actually distinguish between the effects of the two forms of Sema7a, which are predicted but not actually shown to be either diffusible or membrane linked (the membrane linkage can be cleaved). Although the study starts by presenting the splice forms, there is no description of when and where each splice form is transcribed.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript-specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead, Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the absence of a signal.

      We therefore utilized fluorescence-activated cell sorting (FACS) to capture the labeled hair cells and isolated total RNA to perform RT-PCR using transcript-specific DNA oligonucleotide primers. We identified the presence of both the secreted and the membrane-bound transcripts in four-day-old neuromasts (lines 80-84, Figure 1B-D).

      Additionally, since the mutants are predicted to disrupt both forms, it is a bit difficult to disentangle the synaptic phenotype from the earlier changes in circuit topology - perhaps the change at the level of the synapse is secondary to the change in topology.

      Comments: Thank you for the insightful suggestion. We have analyzed the relationship between the sensory arbor network topology and the distribution of postsynaptic structures (lines 384-403, Figure 6G-I). We identified that the distribution of the postsynaptic aggregates is closely associated with the topological attributes of the sensory circuit. We further clarify the potential origin of disrupted synaptic assemblies in sema7a-/- mutants in lines 380-382 and lines 417-420.

      Further, the authors do not provide any data supporting the idea that the membrane bound form of Sema7a acts only locally. Without these kinds of data, the authors are unable to attribute activities to either form.

      Comments: We have accepted this useful suggestion and have prepared Figure 5 with the necessary details.

      The main impact on the field will be the nature of the analysis. The field of axon guidance benefits from this kind of robust quantification of growing axon trajectories, versus their ability to actually reach a target. This study highlights the value of more careful analysis and as a result, makes the point that circuit assembly is not just a matter of painting out paths using chemoattractants and repellants, but is also about how axons respond to local cues. The study also points to the likely importance of alternative splice forms and to the complex functions that can be achieved using different forms of the same ligand.

      Reviewer #4 (Public Review):

      Summary:

      The work by Dasgupta et al. identifies Sema7a as a novel guidance molecule in hair cell sensory systems. The authors use both the genetic and imaging power of the zebrafish lateral-line system for their research. Based on expression data and immunohistochemistry experiments, the authors demonstrate that Sema7a is present in lateral line hair cells. The authors then examine a sema7a mutant. In this mutant, Sema7a protein levels are nearly eliminated. Importantly, the authors show that when Sema7a is absent, afferent terminals show aberrant projections and fewer contacts with hair cells. Lastly the authors show that ectopic expression of the secreted form of Sema7a is sufficient to recruit aberrant terminals to non-hair cell targets. The sema7a innervation defects are well quantified. Overall, the paper is extremely well written and easy to follow.

      Strengths:

      (1) The axon guidance phenotypes in sema7a mutants are novel, striking and thoroughly quantified.

      (2) By combining both loss of function sema7a mutants and ectopic expression of the secreted form of Sema7a the authors demonstrate that Sema7a is both necessary and sufficient to guide sensory axons.

      Weaknesses:

      (1) Control. There should be an uninjected heatshock control to ensure that heatshock itself does not cause sensory afferents to form aberrant arbors. This control would help support the hypothesis that exogenously expressed Sema7a (via a heatshock driven promoter) is sufficient to attract afferent arbors.

      Comments: Thank you for the suggestion. We have added the uninjected heatshock control experiment in Figure 5 and described experimental details in the text, lines 343-345.

      (2) Synapse labeling. The numbers obtained for postsynaptic labeling in controls do not match up with the published literature - they are quite low. Although there are clear differences in postsynaptic counts between sema7a mutants and controls, it is worrying that the numbers are so low in controls. In addition, the authors do not stain for complete synapses (pre- and post-synapses together). This staining is critical to understand how Sema7a impacts synapse formation.

      Comments: Thank you for raising this issue. We believe the low average numbers of the postsynaptic punctae in control neuromasts arise from lack of formation of postsynaptic aggregates beneath the immature hair cells, which are abundant in early stages of neuromast maturation. We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes during neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We do not think analyzing the complete synapse structure will add much to our understanding of how Sema7A influences synapse formation and maintenance.

      (3) Hair cell counts. The authors need to provide quantification of hair cell counts per neuromast in mutant and control animals. If the counts are different, certain quantification may need to be normalized.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).
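
      As an illustration only, the per-neuromast normalization described above can be sketched as follows. The record layout, the function name per_hair_cell, and all counts are hypothetical placeholders chosen for the example; they are not the authors' analysis code or data.

```python
from dataclasses import dataclass


@dataclass
class Neuromast:
    """One imaged neuromast (all values below are made-up placeholders)."""
    label: str
    hair_cells: int   # number of hair cells counted in this neuromast
    loops: int        # a topological feature count, e.g. arbor loops


def per_hair_cell(count: float, hair_cells: int) -> float:
    """Normalize a topological count by the hair cell number of the same neuromast."""
    if hair_cells <= 0:
        raise ValueError("hair cell count must be positive")
    return count / hair_cells


# Hypothetical example values, not measurements from the study.
examples = [
    Neuromast("control", hair_cells=10, loops=5),
    Neuromast("mutant", hair_cells=8, loops=2),
]

# Each neuromast is normalized by its own hair cell count before comparison.
normalized = {n.label: per_hair_cell(n.loops, n.hair_cells) for n in examples}
```

      Dividing each measurement by the hair cell count of the same neuromast, rather than by a group average, keeps the comparison valid when mutant and control neuromasts differ slightly in hair cell number.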

      (4) Developmental delay. It is possible that loss of Sema7a simply delays development. The latest stage examined was 4 dpf, an age that is not quite mature in control animals. The authors could look at a later age, such as 6 dpf to see if the phenotypes persist or recover.

      Comments: The homozygous sema7a-/- mutants are not viable and die at 6 dpf. We therefore restricted our analysis to stages up to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons extend long projections that move progressively farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). If the phenotypes in the sema7a-/- mutants were due to developmental delays, we would expect to see a recovery of the disrupted arborization patterns over time. Instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but are a specific outcome of the loss of Sema7A protein.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      Issue 1: One of the most interesting conclusions in this manuscript is the function of the GPI-anchored vs. secreted form of Sema7a in axon structure and synapse formation. In lines 357-360 of the discussion (for example) the authors state that they have shown that the GPI-anchored form of Sema7a is responsible for contact-mediated synapse formation while the secreted form functions as a chemoattractant for axon arbor structure. "We have discovered dual modes of Sema7A function in vivo: the chemoattractive diffusible form is sufficient to guide the sensory arbors toward their target, whereas the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses." However, the data do not support this conclusion. Specifically, no analysis is done showing unique expression of either isoform in hair cells and no functional analysis is done to conclusively determine which isoform is important for either phenotype.

      Comments: We have shown that both sema7a transcripts are expressed in the hair cells of four-day-old neuromasts (lines 78-84, Figure 1C,D). Ectopic expression of the sema7asec transcript variant robustly attracts the lateral-line sensory arbors toward itself, whereas ectopic expression of the sema7aGPI variant fails to impart sensory guidance from a distance, suggesting that the membrane-bound form likely participates in contact-mediated neural guidance. These experiments decisively show, for the first time in zebrafish, the dual modes of Sema7A function in vivo. However, we agree that the sema7aGPI transcript-specific knockout animal would be essential to conclusively prove that the membrane-attached form is primarily involved in forming accurate neural circuitry and contact-mediated formation and maintenance of synapses. Hence, we have very carefully stated in lines 427-428 that “the membrane-attached form likely participates in sculpting accurate neural circuitry to facilitate contact-mediated formation and maintenance of synapses”. We will follow up on this suggestion in our upcoming manuscript that will incorporate transcript-specific genetic ablations.

      Though the authors present RT-PCR analysis of sema7a isoforms, it is not interpretable. The second reverse primer will also recognize the full-length transcript (from what I can gather) so it does not simply show the presence of the secreted form. Is there a unique 3'UTR for the short transcript that can be used? Additionally, for the GPI-anchored version can you use a forward primer that is not present in the short isoform? This would shed some light on the respective levels of both transcripts.

      Comments: The C-termini of the two transcript variants are distinct and we have designed distinct primers that will selectively bind to each transcript (lines 503-511). Since we have not performed quantitative polymerase chain reaction (qPCR), relative levels of each transcript are hard to determine.

      Alternatively, and perhaps of more use, in situ hybridization using unique probes for each isoform would allow you to determine which are actually present in hair cells.

      Comments: We have tried this approach and explained the point earlier (refer to lines 203-212 of this response letter).

      To decisively state that these isoforms have unique functions in axon terminal structure and synapse formation, other experiments are also essential. For example, RNA-mediated rescue analyses using both isoforms would tell you which can rescue the axonal structure and synapse size/number phenotypes. Overexpression of the GPI-anchored form, like the secreted form in Figure 4, would allow you to determine if only the secreted form can cause abnormal axon extension phenotypes. Expression of both forms in hair cells (using a myo6b promoter for example) would allow assessment of their role in presynapse formation.

      Comments: We have ectopically expressed the sema7aGPI transcript variant near the sensory arbor network and observed that Sema7A-GPI fails to impart sensory axon guidance from a distance.

      Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      For the overexpression experiments, expression of mKate alone (with and without heat shock) is also a critical control to include.

      Comments: We have incorporated two control experiments: (1) larvae injected with the hsp70:sema7asec-mKate2 plasmid that were not heat shocked and (2) uninjected larvae that were heat shocked. We think these two controls are sufficient to demonstrate that the abnormal arborization patterns are not artifacts generated by plasmid injection or heat shock.

      Issue 2: A second concern is the lack of data showing support cell and hair cell formation and function is unaffected. Analysis of support and hair cell number with loss of Sema7a as well as simple analyses of mechanotransduction (FM4-64) would help alleviate concerns that phenotypes are due to disrupted neuromast formation and basic hair cell function rather than a specific role for Sema7a in this process.

      Comments: We have measured the hair cell numbers in both control and sema7a-/- mutants across developmental stages. We have added this to our submitted raw data.

      We have utilized the styryl fluorophore FM4-64 to test the mechanotransduction function of the hair cells in sema7a-/- mutants. We have detailed our findings in lines 137-141 and in Figure 2—figure supplement 1C,D.

      Expression analysis of Sema7a receptors would also help strengthen the argument for a specific effect on lateral line afferent axons.

      Comments: Thank you for this suggestion. Currently, we do not possess an RNA transcriptome dataset for the lateral line ganglion. This deficit limits a systematic screen for lateral-line sensory neuronal gene expression either through antibody stains or via HCR-mediated in situ techniques. In the future we plan to develop an RNA transcriptome for the lateral-line ganglion and identify potential binding partners for Sema7A.

      Issue 3: The manuscript could also be improved to include more detail in some areas and less in others. In general, each section has a fairly long lead up but lacks important experimental details that would help the reader interpret the data. For example:

      Figure 1: What is the label for the lateral line axons? Is it a specific transgenic? The legend states that 3 asterisks indicate p<0.0001. What about the other asterisk combinations?

      Comments: We have clarified these issues in lines 118-121 and in lines 906-907.

      Figure 2: For the network analysis, are the traces for all axons that branch to innervate the neuromast?

      Comments: Yes, we have traced the entire arbor containing all the axons that branched from the lateral line nerve and extended toward the clustered hair cells. The three-dimensional traces depict a skeletonized representation of the arbor network.

      Can the tracing method distinguish individual axons?

      Comments: No, our goal is to understand how the axon-collective behaves around the clustered hair cells during development.

      How do you know where an end is versus continued looping?

      Comments: We have categorically defined the topological attributes in lines 187-191 and in Figure 3A.

      Also, are all neuromasts similarly affected or is there a divergence based on which organ you are imaging? What neuromast was imaged in this and other figures?

      Comments: Yes, all the neuromasts in the trunk and tail regions were affected similarly by the sema7a mutation. We did not observe any region-specific phenotypic outcome. We consistently imaged the trunk neuromasts, particularly the second, third, and fourth neuromasts.

      Discussion: The short discussion failed to put these findings into context or to discuss how this unique topological arrangement of axon terminals impacts function.

      Comments: We have added a new segment, lines 432-448, in the discussion section which mentions the potential role of the topological features in arranging the distribution pattern of the postsynaptic densities and thereby potentially influencing the network’s ability to gather sensory inputs through properly placed postsynaptic aggregates.

      Can you speculate on how the looping structure may alter number of synaptic contacts per axon for instance? For this, it would be useful to know if normally the synapses form on loops versus bare terminals.

      Comments: Thank you for this insightful suggestion. We have performed detailed analysis, as mentioned in lines 384-397, to characterize the distribution of the postsynaptic densities between the two topological attributes.

      Does this looping facilitate single axons contacting more hair cells of the same polarity? Would that be beneficial?

      Comments: Looping behaviors indeed facilitate the contact between the axons and the hair cells. As we have observed, the primary topological attribute that the sensory arbor network underneath the clustered hair cells adopts is a loop. The bare terminals are predominantly projected transverse to the clustered hair cells and lack contact with them. Whether a single axon, being part of a loop, preferentially contacts hair cells of same polarity is yet to be determined. We can address this question by mosaic labeling a single axon in the arbor network and determine its association with the hair cells. We intend to do these experiments in our upcoming manuscript.

      Minor concerns:

      (1) For the stacked charts quantifying topological features, I found interpreting them challenging. Is it possible to put these into overlapping histograms or line graphs to better compare wild type to mutant directly?

      Comments: Thank you for your suggestion. We tried several ways to represent our data and found that the stacked charts optimally signify our analysis and depict the characteristic phenological differences between the control and the sema7a-/- mutants.

      (2) There are numerous strong statements throughout not directly supported by the data, e.g. lines 110-113; 206-208; 357-360 and others. These should be tempered.

      Comments: For lines 110-113, we have updated this section with new experiments and the new segment is represented in lines 115-126.

      For lines 206-208, we have updated the statement to “This result suggests that the stereotypical circuit topology observed in the mature organ may emerge through transition of individual arbors from forming bare terminals to forming closed loops encircling topological holes” in lines 225-227.

      Reviewer #2 (Recommendations For The Authors):

      The authors should be careful about making any assumptions about which form of sema7a is active in NMs. Their RT-PCR demonstrates presence of both isoforms in a whole animal; however, whether they are similarly present in HCs is not investigated here.

      Comments: We have addressed this concern and have updated the manuscript with new experiments, detailed in lines 78-84.

      Also, there is an issue of translation and trafficking to the membrane with subsequent secretion. An important experiment that would address this question is expressing two sema7a isoforms in mutant HCs and asking whether this can suppress the mutant phenotype.

      Comments: Thank you for suggesting the rescue experiments. We are in the process of generating CRISPR/Cas9-mediated transcript-specific knockout animals. We are currently preparing another manuscript that incorporates the above-mentioned rescue experiments to dissect the role of each transcript in regulating arbor topology and synapse formation.

      Presumably, sema7a is trafficked to the membrane during HC maturation. This is consistent with the authors' observation that sema7a localization is changing as NM mature. However, actin-sema7a co-labeling does not actually show whether sema7a is on the membrane. Labeling HCs with a membrane marker (transgene) would be much more convincing. Alternatively, can the authors show sema7a localization actually correlates with the presence of sensory axon terminals? They already have immunos that label both. Thus, this should be pretty straightforward.

      Comments: Thank you for these suggestions. We have addressed these issues in lines 112-114, and in lines 119-126.

      Figure 2 should have a control panel, so the reduced sema7a staining can be compared to the control side-by-side.

      Comments: We have depicted Sema7A staining in control neuromasts in multiple images, including Figure 1E, Figure 1H, and in Figure 2—figure supplement 1B. We have kept the control panel in the supplementary figure due to space restrictions in Figure 2.

      Arborization topology: While I appreciate the very careful characterization of the topology for wild-type and mutant NMs, I think it would be much more informative to mark individual axons and then analyze their topology. The main reason is that the authors cannot really distinguish whether some aspects of topology they describe are really due to the densely packed overlapping terminals of multiple axons or whether these are really characteristic, higher-order organization of individual axons. Because of this, they cannot be certain what is really happening with sema7a mutant terminals. Related to the point above: while it is clear that the overall topology is abnormal in the mutant, the authors should be careful in concluding that sema7a regulates specific aspects of it. The overall structure is probably highly interconnected; perturbing one parameter would likely affect all the others.

      Comments: Thank you for this comment. In a previous eLife publication from our laboratory, we utilized the serial blockface scanning electron micrograph (SBFSEM) technique to characterize the connectome of the neuromast microcircuit where patterns of innervation of all the individual axons can be delineated in five-day-old larvae (reference number 8). However, the collective behavior of all the sensory axons that build the innervation network remained enigmatic, especially in a living animal during development. In this paper we addressed how the sensory axon-collective behaves around the clustered hair cells and builds the innervation network in living animals during diverse developmental stages. Our analyses have not only identified how the axon-collective associates itself with the hair cell cluster as the organ matures, but also discovered distinct topological features in the arbor network that emerge during organ maturation, which may influence assembly of postsynaptic aggregates (lines 384-403, Figure 6G-I). We believe that our quantitative approach to capture collective axonal behaviors and their topological attributes during circuit formation has highlighted the importance of understanding network assembly during sensory organ development.

      Experiments with the secreted sema7a isoform would be much more informative if they were compared/contrasted to the GPI anchored isoform.

      Comments: We added a new section, lines 338-351, and a new Figure 5 to address this issue.

      The phenotype of ectopic projections in sema7a overexpression experiments is pretty dramatic, especially given the fact that these were performed in wild-type animals. Does this mean that the phenotype would be even more dramatic in sema7a mutants, as they have more bare axon terminals according to the authors' analysis? Have the authors attempted this type of experiment?

      Comments: That is an interesting suggestion. We have not tested that yet. Our guess is that in the sema7a-/- mutants, the abundant bare terminals will be far more sensitive to an ectopic source of Sema7A. But even in the sema7a-/- mutants, other chemotropic cues are still functional, which may impart certain restrictions on how many bare terminals are allowed to leave the neuromast region.

      Reviewer #3 (Recommendations For The Authors):

      (1) No raw data are shown, such that it is difficult to assess variability across animals or within animals, just the overall trends within the whole dataset. Raw data need to be shown for every measurement, at least in supplemental figures. It would also be useful to reliably show control next to mutant in the same plot, as it is a bit hard to compare across panels, which occurs in several figures.

      Comments: We have uploaded all the raw data related to each experiment.

      (2) Given the focus on the two possible forms of Sema7a, the authors should use HCR or another form of reliable in situ hybridization to show the spatiotemporal pattern of expression of each isoform.

      Comments: We have utilized the HCR™ RNA-FISH Technology to generate transcript-specific probes. To generate transcript-specific HCR probes to distinctly detect the sema7aGPI (NM_001328508) and the sema7asec (NM_001114885) transcripts, Molecular Instruments could design only 11 probes against the sema7aGPI transcript and only one probe against the sema7asec transcript (personal correspondence with Mike Liu, PhD, Head of Operations and Product Development Lead, Molecular Instruments, Inc.). The HCR probe against the sema7aGPI transcript showed a very faint signal. Unfortunately, the HCR probe against the sema7asec transcript failed to detect the presence of any transcript. For robust detection of transcripts, the protocol demands a minimum of 20 probes. We believe that the very low number of probes against our transcripts is the primary reason for the lack of a signal.

      (3) The authors should explain the criteria used to select the 22 embryos used to analyze the effects of expressing diffusible Sema7a.

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      (4) Although arbors were imaged in live embryos, time is never presented as a variable, so I cannot tell whether axon topology was changing as the images were collected. This needs to be clarified.

      Comments: We imaged the trunk neuromasts of both control and sema7a-/- mutant live zebrafish larvae at 2, 3, and 4 dpf. We imaged the control and the sema7a-/- mutants of each developmental stage in parallel, within a span of two hours, and repeated these experiments multiple times to gather almost a hundred larvae from each genotype. Even though the sensory arbor network is dynamic, we believe that imaging both genotypes in parallel and within a span of two hours, and averaging almost a hundred larvae from each genotype, minimize the temporal variability observed in the arbor architecture.

      (5) Ideally, the authors should use CRISPR/cas-9 to create a mutation in the C-terminus that would prevent production of the GPI-anchored form and not of the diffusible form. I understand if this is too much work to do in a short time, and would be satisfied with another experiment that could distinguish roles for at least one isoform more clearly. For instance, it would be interesting to see an analysis of how far an axon can be from a source to detect diffusible Sema7a (live imaging would be ideal for this) and then to show that the effect is different when the membrane bound form is expressed.

      Comments: Thank you for this comment. We are currently working on generating transcript-specific knockout animals.

      We have added live timelapse video microscopy data in lines 330-337, Figure 4H-J, Figure 4—figure supplement 2, and Videos 15 and 16.

      We have added a new segment analyzing the membrane-bound transcript variant in lines 338-351.

      Reviewer #4 (Recommendations For The Authors):

      Feedback to authors

      Overall, this is a very important and novel study. Currently the manuscript does need revision.

      Major concerns:

      (1) Controls. For the ectopic expression of Sema7a, injection of a construct expressing Sema7a under a heatshock promoter is used to drive ectopic expression. No heatshock (injected) animals are used as a control. In many systems heatshock can impact neuron morphology. And heatshock proteins are required for normal neurite and synapse formation. Please examine sensory axons in uninjected wildtype animals with heatshock.

      Comments: We have added this control experiment in a new segment, explained in detail in lines 348-350 and Figure 5.

      (2) Synapse staining - regarding Figure 5 and related supplement

      Understanding whether guidance defects ultimately impact synapse formation is an important aspect of this paper. Therefore, it is necessary to have accurate measurements of the number of complete synapses, and the overall numbers of pre- and postsynaptic components. Currently the data plotted in Figure 5 is extensive, but the way the data is laid out, the relevant comparisons are challenging to make. Perhaps include this quantification in the supplement, and move the data from the supplement to the main figure? The quantifications in the supplement are easier to follow and easier to compare between genotypes.

      Comments: We have performed exhaustive analysis on the formation of pre- and postsynaptic structures and have identified how their distribution changes along neuromast development in control larvae. We have further analyzed how such distribution is perturbed in the sema7a-/- mutants. We believe that showing only the average numbers will not reveal the changes in the distribution of the synaptic structures during development and across genotypes.

      Looking at the data itself, there seem to be some discrepancies with the synaptic counts compared to published work. While the CTBP numbers seem in order, the Maguk numbers do not. In both mutant and control there are many hair cells without any Maguk puncta/aggregates, leading to 0.75-1 postsynapses per hair cell (Figure 5 supplement H-I). Typically, the numbers should be more comparable to what was obtained for CTBP, 3-4 puncta per cell (Figure 5 supplement B-C), especially by 3-4 dpf. The figure of 3-4 CTBP or Maguk puncta per cell is based on previously published immunostaining and EM work.

      The Maguk immunostaining, especially at early stages (2-3 dpf) is challenging. To compound a challenging immunostain, around 2019 Neuromab began to outsource the purification of their Maguk antibody. After this outsourcing our lab was no longer able to get reliable label with the Maguk antibody from Neuromab.

      Millipore sells the same monoclonal antibody and it works well: https://www.emdmillipore.com/US/en/product/Anti-pan-MAGUK-Antibody-clone-K2886,MM_NF-MABN72

      I would recommend this source.

      Comments: Thank you for suggesting the new MAGUK antibody. We have utilized this new MAGUK antibody from Millipore and added a new segment in lines 389-408. In future publications we will utilize this antibody to capture the postsynaptic densities in the sensory arbors.

      The discrepancies in the postsynaptic punctae number in our control larvae may arise due to the reliability of the Neuromab MAGUK antibody. We have utilized this same antibody to stain the sema7a-/- mutants and have observed a significant decrease in MAGUK punctae number and area. On grounds of keeping parity between the control and the sema7a-/- mutants, we have decided to keep our experimental results in the manuscript.

      In addition to a more accurate Maguk label, a combined pre- and post-synaptic label is essential to understand whether synapses pair properly in the sema7a mutants. This can be accomplished using subtype specific antibodies using goat anti-mouse IgG1/Maguk and goat anti-mouse IgG2a/CTBP secondaries.

      Comments: Thank you for suggesting this. We are preparing another manuscript in which we will utilize this technique along with other suggestions to tease apart the role of distinct transcript variants in regulating neural guidance and synapse formation.

      (3) Does sema7a lesion impact the number of hair cells per neuromast? If hair cell numbers are reduced several of the quantifications could be impacted.

      Comments: We have added the raw data with the hair cell counts in both control and sema7a-/- mutants across developmental stages. The homozygous sema7a-/- mutants have slightly fewer hair cells and we have normalized all our topological analyses by the corresponding hair cell numbers for each neuromast in each experiment (lines 669-675).

      (4) Could innervation just be developmentally delayed in sema7a mutants? At 4 dpf the sensory system is just starting to come online and could still be in the process of refinement. Did you look at slightly older ages, after the sensory system is functional behaviorally, for example, 6 dpf? Do the core phenotypes (synapse defects and excess arbors) persist at 6 dpf in the sema7a mutants?

      Comments: The homozygous sema7a-/- mutants are unviable and start to die at 6 dpf. We therefore restricted our analysis to stages up to 4 dpf. The association of the sensory arbors with the clustered hair cells gradually decreases as the neuromasts mature from 2 dpf to 4 dpf in the sema7a-/- mutants (lines 174-176, Figure 2I). Moreover, in the sema7a-/- mutants the sensory axons throw long projections that keep getting farther away from the clustered hair cells as the neuromast matures from 2 dpf to 4 dpf (lines 166-168, Figure 2H; Figure 2—figure supplement 1K,L). These observations suggest that if the phenotypes in the sema7a-/- mutants were due to developmental delays, then we should have seen a recovery of disrupted arborization patterns over time. But instead, we observe a further deterioration of the arborization patterns and other architectural assemblies. These findings confirm that the observed phenotypes in the sema7a-/- mutants are not due to delayed development of the larvae, but a specific outcome of the loss of Sema7A protein.

      Minor comments to address:

      Results

      Page 4 lines 89-91. For the readers, explain why you examined levels of Sema7a in rostral and caudal hair cells. Also, this sentence is, in general, a little bit misleading; it initially reads as though there is no difference in Sema7a at 1.5-4 dpf.

      Comments: In lines 44-48, we explain that the hair cells in the neuromast contain mechanoreceptive hair cells of opposing polarities that help them detect water currents from opposing directions. In lines 93-106, we tested whether the Sema7A level varies between the two polarities. We observed that the Sema7A level is similar between the two polarities of hair cells, but the average Sema7A intensity increases significantly over the developmental period of 2 dpf to 4 dpf in both rostrally and caudally polarized hair cells.

      Page 10-11 Lines 263-270. What was the frequency of these 2 outcomes- out of the 22 cases with ectopic expression?

      Comments: We have explained this in lines 291-292. We identified 22 mosaic sema7asec-mKate2 integration events, in which a single mosaic ectopic integration had occurred near the network of sensory arbors, from a total of almost 100 integrations. We rejected events where the sema7asec-mKate2 integration occurred either farther away from the sensory arbor network or had happened in multiple neighboring cells.

      Discussion

      Page 14 Lines 359-360. There is not enough evidence provided in this work to suggest that the membrane attached form of Sema7a is playing a role. Both the secreted and membrane form are gone in the sema7a mutants. If the membrane attached form was specifically lesioned, and resulted in a phenotype, then there would be sufficient evidence. Currently there is strong evidence for a distinct role for the secreted form. Although the authors qualify the outlined statement with the word 'likely', stating this possibility in the discussion take-home is misleading.

      Comments: In the future we will utilize the CRISPR/Cas9 technique to target the unique C-terminal domain of the GPI-anchored sema7a transcript variant. We believe that this will only perturb the formation of the full-length Sema7A protein and help us differentiate between the roles of the membrane-bound Sema7AGPI molecule and the secreted Sema7Asec in sensory arborization and synaptic assembly.

      It might be interesting in either the intro or discussion to reference the role Sema3F in axon guidance in the mouse auditory epithelium. https://elifesciences.org/articles/07830

      Comments: We have added this reference in lines 61-64.

      Figures

      Please indicate on one of your Figures where the mutation is (roughly) in the sema7a mutant (in addition to stating it in the results).

      Comments: We have added this information in Figure 2—figure supplement 1A.

      Either state or indicate in a Figure where the epitope used to make the Sema7a antibody is located, to show that the antibody is predicted to recognize both isoforms.

      Comments: We have stated the details of the epitope in lines 528-529.

      Figure 2-S1 what is the scale in panel A, is it different between mutant and wildtype?

      Comments: We have updated the images. New images are depicted in Figure 2—figure supplement 1A.

      Methods

      What were the methods used to quantify synapse number and area?

      Comments: We have added a new section in lines 702-708 to explain the measurement techniques.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This article presents important results describing how the gathering, integration, and broadcasting of information in the brain changes when consciousness is lost either through anesthesia or injury. They provide convincing evidence to support their conclusions, although the paper relies on a single analysis tool (partial information decomposition) and could benefit from a clearer explication of its conceptual basis, methodology, and results. The work will be of interest to both neuroscientists and clinicians interested in fundamental and clinical aspects of consciousness.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Luppi et al. apply the recently developed integrated information decomposition to the question of how the architecture of information processing changes when consciousness is lost. They explore fMRI data from two different populations: healthy volunteers undergoing reversible anesthesia, as well as patients who have long-term disorders of consciousness. They show that, in both populations, synergistic integration of information is disrupted in common ways. These results are interpreted in the context of the SAPHIRE model (recently proposed by this same group), which describes information processing in the brain as being composed of several distinct steps: 1) gatekeeping (where gateway regions introduce sensory information to the global synergistic workspace), where 2) it is integrated or "processed" before 3) being broadcast back to the brain.

      I think that this paper is an excellent addition to the literature on information theory in neuroscience, and consciousness science specifically. The writing is clear, the figures are informative, and the authors do a good job of engaging with existing literature. While I do have some questions about the interpretations of the various information-theoretic measures, all in all, I think this is a significant piece of science that I am glad to see added to the literature.

      One specific question I have is that I am still a little unsure about what "synergy" really is in this context. From the methods, it is defined as that part of the joint mutual information that is greater than the maximum marginal mutual information. While this is a perfectly fine mathematical measure, it is not clear to me what that means for a squishy organ like the brain. What should these results mean to a neuro-biologist or clinician?

      Right now the discussion is very high level, equating synergy to "information processing" or "integrated information", but it might be helpful for readers not steeped in multivariate information theory to have some kind of toy model that gets worked out in detail. On page 15, the logical XOR is presented in the context of the single-target PID, but 1) the XOR is discrete, while the data analyzed here are continuous BOLD signals w/ Gaussian assumptions and 2) the XOR gate is a single-target system, while the power of the Phi-ID approach is the multi-target generality. Is there a Gaussian analog of the single-target XOR gate that could be presented? Or some multi-target, Gaussian toy model with enough synergy to be interesting? I think this would go a long way to making this work more accessible to the kind of interdisciplinary readership that this kind of article will inevitably attract.

      We appreciate this observation. We now clarify that:

      “redundancy between two units occurs when their future spontaneous evolution is predicted equally well by the past of either unit. Synergy instead occurs when considering the two units together increases the mutual information between the units’ past and their future – suggesting that the future of each is shaped by its interactions with the other. At the microscale (e.g., for spiking neurons) this phenomenon has been suggested as reflecting “information modification” 36,40,47. Synergy can also be viewed as reflecting the joint contribution of parts of the system to the whole, that is not driven by common input48.”

      In the Methods, we have also added the following example to provide additional intuition about synergy in the case of continuous rather than discrete variables:

      “As another example for the case of Gaussian variables (as employed here), consider a 2-node coupled autoregressive process with two parameters: a noise correlation c and a coupling parameter a. As c increases, the system is flooded by “common noise”, making the system increasingly redundant because the common noise “swamps” the signal of each node. As a increases, each node has a stronger influence both on the other and on the system as a whole, and we expect synergy to increase. Therefore, synergy reflects the joint contribution of parts of the system to the whole that is not driven by common noise. This has been demonstrated through computational modelling (Mediano et al 2019 Entropy).”

      See below for the relevant parts of Figures 1 and 2 from Mediano et al (2019 Entropy), where Psi refers to the total synergy in the system.

      Author response image 1.
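
      The two-node autoregressive example above can be sketched numerically. This is only an illustrative sketch under stated assumptions, not the analysis used in the paper: it assumes the process X_t = A X_{t-1} + noise with every entry of A equal to a, unit-variance noise with correlation c, and a simplified single-target decomposition in which redundancy is the minimum marginal mutual information and synergy is the joint mutual information minus the maximum marginal mutual information. The function names are hypothetical.

```python
import numpy as np


def stationary_joint_cov(a, c):
    """Covariance of (X_{t-1}, X_t) for the assumed model X_t = A X_{t-1} + noise.

    Assumptions: A = a * ones((2, 2)); noise has unit variance and correlation c.
    """
    if not (abs(a) < 0.5 and abs(c) < 1.0):
        raise ValueError("need |a| < 0.5 (stationarity) and |c| < 1")
    A = a * np.ones((2, 2))
    Q = np.array([[1.0, c], [c, 1.0]])
    # Stationary covariance solves Sigma = A Sigma A^T + Q (vectorised form).
    sigma = np.linalg.solve(np.eye(4) - np.kron(A, A), Q.reshape(-1)).reshape(2, 2)
    lagged = sigma @ A.T  # Cov(X_{t-1}, X_t)
    return np.block([[sigma, lagged], [lagged.T, sigma]])


def gaussian_mi(cov, src, tgt):
    """Mutual information (nats) between two index sets of a Gaussian vector."""
    src, tgt = list(src), list(tgt)

    def logdet(idx):
        return np.linalg.slogdet(cov[np.ix_(idx, idx)])[1]

    return 0.5 * (logdet(src) + logdet(tgt) - logdet(src + tgt))


def mmi_decomposition(a, c):
    """Redundancy and synergy of the two past nodes about the joint future."""
    cov = stationary_joint_cov(a, c)
    future = [2, 3]
    joint = gaussian_mi(cov, [0, 1], future)
    marginals = [gaussian_mi(cov, [0], future), gaussian_mi(cov, [1], future)]
    return {"redundancy": min(marginals), "synergy": joint - max(marginals)}


if __name__ == "__main__":
    for a, c in [(0.2, 0.0), (0.4, 0.0), (0.4, 0.8)]:
        print(a, c, mmi_decomposition(a, c))
```

      In this simplified setting, raising the coupling a at fixed c raises the synergy term, and raising the noise correlation c at fixed a raises the redundancy term. The full integrated information decomposition additionally separates multi-target atoms, which this sketch does not attempt.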

      Strengths

      The authors have a very strong collection of datasets with which to explore their topic of interest. By comparing fMRI scans from patients with disorders of consciousness, healthy resting state, and various stages of propofol anesthesia, the authors have a very robust sample of the various ways consciousness can be perturbed, or lost. Consequently, it is difficult to imagine that the observed effects are merely a quirk of some biophysical effect of propofol specifically, or a particular consequence of long-term brain injury, but do in fact reflect some global property related to consciousness. The data and analyses themselves are well-described, have been previously validated, and are generally strong. I have no reason to doubt the technical validity of the presented results.

      The discussion and interpretation of these results is also very nice, bringing together ideas from the two leading neurocognitive theories of consciousness (Global Workspace and Integrated Information Theory) in a way that feels natural. The SAPHIRE model seems plausible and amenable to future research. The authors discuss this in the paper, but I think that future work on less radical interventions (e.g. movie watching, cognitive tasks, etc) could be very helpful in refining the SAPHIRE approach.

      Finally, the analogy between the PID terms and the information provided by each eye redundantly, uniquely, and synergistically is superb. I will definitely be referencing this intuition pump in future discussions of multivariate information sharing.

      We are very grateful for these positive comments, and for the feedback on our eye metaphor.

      Weaknesses

      I have some concerns about the way "information processing" is used in this study. The data analyzed, fMRI BOLD data, is extremely coarse, both in spatial and temporal terms. I am not sure I am convinced that this is the natural scale at which to talk about information "processing" or "integration" in the brain. In contrast to measures like sample entropy or Lempel-Ziv complexity (which just describe the statistics of BOLD activity), synergy and Phi are presented here as quasi-causal measures: as if they "cause" or "represent" phenomenological consciousness. While the theoretical arguments linking integration to consciousness are compelling, is this the right data set to explore them in? For example, in the work by Newman, Beggs, and Sherrill (née Faber), synergy is associated with "computation" performed in individual neurons: the information about the future state of a target neuron that is only accessible when knowing both inputs (analogous to the synergy in computing the sum of two dice). Whether one thinks that this is a good approach to neural computation or not, it fits within the commonly accepted causal model of neural spiking activity: neurons receive inputs from multiple upstream neurons, integrate those inputs and change their firing behavior accordingly.

      In contrast, here, we are looking at BOLD data, which is a proxy measure for gross-scale regional neural activity, which itself is a coarse-graining of millions of individual neurons to a uni-dimensional spectrum that runs from "inactive to active." It feels as though a lot of inferences are being made from very coarse data.

      We appreciate the opportunity to clarify this point. It is not our intention to claim that Phi-R and synergy, as measured at the level of regional BOLD signals, represent a direct cause of consciousness, or are identical to it. Rather, our work is intended to use these measures similarly to the use of sample entropy and LZC for BOLD signals: as theoretically grounded macroscale indicators, whose empirical relationship to consciousness may reveal the relevant underlying phenomena. In other words, while our results do show that BOLD-derived Phi-R tracks the loss and recovery of consciousness, we do not claim that they are the cause of it: only that an empirical relationship exists, which is in line with what we might expect on theoretical grounds. We have now clarified this in the Limitations section of our revised manuscript, as well as revising our language accordingly in the rest of the manuscript.

      We also clarify that the meaning of “information processing” that we adopt pertains to “intrinsic” information that is present in the system’s spontaneous dynamics, rather than extrinsic information about a task:

      “Information decomposition can be applied to neural data from different scales, from electrophysiology to functional MRI, with or without reference to behaviour 34. When behavioural data are taken into account, information decomposition can shed light on the processing of “extrinsic” information, understood as the translation of sensory signals into behavioural choices across neurons or regions 41,43,45,47. However, information decomposition can also be applied to investigate the “intrinsic” information that is present in the brain’s spontaneous dynamics in the absence of any tasks, in the same vein as resting-state “functional connectivity” and methods from statistical causal inference such as Granger causality 49. In this context, information processing should be understood in terms of the dynamics of information: where and how information is stored, transferred, and modified 34.”

      References:

      (1) Newman, E. L., Varley, T. F., Parakkattu, V. K., Sherrill, S. P. & Beggs, J. M. Revealing the Dynamics of Neural Information Processing with Multivariate Information Decomposition. Entropy 24, 930 (2022).

      Reviewer #2 (Public Review):

      The authors analysed functional MRI recordings of brain activity at rest, using state-of-the-art methods that reveal the diverse ways in which information can be integrated in the brain. In this way, they found brain areas that act as (synergistic) gateways for the 'global workspace', where conscious access to information or cognition would occur, and brain areas that serve as (redundant) broadcasters from the global workspace to the rest of the brain. The results are compelling and consistent with the already assumed role of several networks and areas within the Global Neuronal Workspace framework. Thus, in a way, this work comes to stress the role of synergy and redundancy as complementary information processing modes, which fulfill different roles in the big context of information integration.

      In addition, to prove that the identified high-order interactions are relevant to the phenomenon of consciousness, the same analysis was performed in subjects under anesthesia or with disorders of consciousness (DOC), showing that indeed the loss of consciousness is associated with a deficient integration of information within the gateway regions.

      However, there is something confusing in the redundancy and synergy matrices shown in Figure 2. These are pair-wise matrices, where the PID was applied to identify high-order interactions between pairs of brain regions. I understand that synergy and redundancy are assessed in the way the brain areas integrate information in time, but it is still a little contradictory to speak about high-order in pairs of areas. When talking about a "synergistic core", one expects that all or most of the areas belonging to that core are simultaneously involved in some (synergistic) information processing, and I do not see this being assessed with the currently presented methodology. Similarly, if redundancy is assessed only in pairs of areas, it may be due to simple correlations between them, so it is not a high-order interaction. Perhaps it is a matter of language, or about the expectations that the word 'synergy' evokes, so a clarification about this issue is needed. Moreover, as the rest of the work is based on these 'pair-wise' redundancy and synergy matrices, it becomes a significant issue.

      We are grateful for the opportunity to clarify this point. We should highlight that PhiID is in fact assessing four variables: the past of region X, the past of region Y, the future of region X, and the future of region Y. Since X and Y each feature both in the past and in the future, we can re-conceptualise the PhiID outputs as reflecting the temporal evolution of how X and Y jointly convey information: the persistent redundancy that we consider corresponds to information that is always present in both X and Y; whereas the persistent synergy is information that X and Y always convey synergistically. In contrast, information transfer would correspond to the phenomenon whereby information was conveyed by one variable in the past, and by the other in the future (see Luppi et al., 2024 TICS; and Mediano et al., 2021 arXiv for more thorough discussions on this point). We have now added this clarification in our Introduction and Results, as well as adding the new Figure 2 to clarify the meaning of PhiID terms.

      We would also like to clarify that all the edges that we identify as significantly changing are indeed simultaneously involved in the difference between consciousness and unconsciousness. This is because the Network-Based Statistic, unlike other ways of identifying edges that differ significantly between two groups or conditions, does not consider edges in isolation, but only as part of a single connected component.
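      To illustrate what "part of a single connected component" means in practice, the sketch below shows only the component-forming step that underlies the Network-Based Statistic: edges whose test statistic exceeds a primary threshold are grouped into connected components, and each component is summarised by its number of edges. This is a simplified illustration under stated assumptions: the full procedure additionally compares component sizes against a permutation null distribution to obtain corrected p-values, which is omitted here, and the function name, threshold, and toy values are hypothetical.

```python
import numpy as np


def suprathreshold_components(stat, threshold):
    """Group supra-threshold edges into connected components.

    stat: symmetric (n x n) array of edge-wise test statistics.
    Returns a list of (nodes, n_edges) tuples, largest component first.
    """
    n = stat.shape[0]
    adj = np.triu(stat > threshold, k=1)
    adj = adj | adj.T
    seen = np.zeros(n, dtype=bool)
    components = []
    for start in range(n):
        # Skip nodes already assigned or with no supra-threshold edge.
        if seen[start] or not adj[start].any():
            continue
        stack, nodes = [start], []
        seen[start] = True
        while stack:  # depth-first search over supra-threshold edges
            node = stack.pop()
            nodes.append(int(node))
            for nb in np.flatnonzero(adj[node]):
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(int(nb))
        nodes = sorted(nodes)
        n_edges = int(adj[np.ix_(nodes, nodes)].sum() // 2)
        components.append((nodes, n_edges))
    return sorted(components, key=lambda c: c[1], reverse=True)


# Toy example with six regions and made-up edge statistics.
stat = np.zeros((6, 6))
for i, j, t in [(0, 1, 4.0), (1, 2, 3.5), (0, 2, 3.2), (4, 5, 3.1), (2, 3, 1.0)]:
    stat[i, j] = stat[j, i] = t

components = suprathreshold_components(stat, threshold=3.0)
print(components)
```

      In this toy case, inference would be made on each component as a whole rather than on any individual edge within it.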

      Reviewer #3 (Public Review):

      The work proposes a model of neural information processing based on a 'synergistic global workspace,' which processes information in three principal steps: a gatekeeping step (information gathering), an information integration step, and finally, a broadcasting step. The authors determined the synergistic global workspace based on previous work and extended the role of its elements using 100 fMRI recordings of the resting state of healthy participants of the HCP. The authors then applied network analysis and two different measures of information integration to examine changes in reduced states of consciousness (such as anesthesia and after-coma disorders of consciousness). They provided an interpretation of the results in terms of the proposed model of brain information processing, which could be helpful to be implemented in other states of consciousness and related to perturbative approaches. Overall, I found the manuscript to be well-organized, and the results are interesting and could be informative for a broad range of literature, suggesting interesting new ideas for the field to explore. However, there are some points that the authors could clarify to strengthen the paper. Key points include:

      (1) The work strongly relies on the identification of the regions belonging to the synergistic global workspace, which was primarily proposed and computed in a previous paper by the authors. It would be great if this computation could be included in a more explicit way in this manuscript to make it self-contained. Maybe include some table or figure being explicit in the Gradient of redundancy-to-synergy relative importance results and procedure.

      We have now added the new Supplementary Figure 1 to clarify how the synergistic workspace is identified, as per Luppi et al (2022 Nature Neuroscience).

      (2) It would be beneficial if the authors could provide further explanation regarding the differences in the procedure for selecting the workspace and its role within the proposed architecture. For instance, why does one case use the strength of the nodes while the other case uses the participation coefficient? It would be interesting to explore what would happen if the workspace was defined directly using the participation coefficient instead of the strength. Additionally, what impact would it have on the procedure if a different selection of modules was used? For example, instead of using the RSN, other criteria, such as modularity algorithms, PCA, Hidden Markov Models, Variational Autoencoders, etc., could be considered. The main point of my question is that the RSNs are probably quite redundant networks, whereas other methods, such as PCA, generate independent networks. It would be helpful if the authors could offer some comments on their intuition regarding these points without necessarily requiring additional computations.

      We appreciate the opportunity to clarify this point. Our rationale for the procedure used to identify the workspace is to find regions where synergy is especially prominent. This is due to the close mathematical relationship between synergistic information and integration of information (see also Luppi et al., 2024 TICS), which we view as the core function of the global workspace. This identification is based on the strength ranking, as per Luppi et al (2022 Nature Neuroscience), which demonstrated that regions where synergy predominates (i.e., our proposed workspace) are also involved with high-level cognitive functions and anatomically coincide with transmodal association cortices at the confluence of multiple information streams. This is what we should expect of a global workspace, which is why we use the strength of synergistic interactions to identify it, rather than the participation coefficient. Subsequently, to discern broadcasters from gateways within the synergistic workspace, we seek to encapsulate the meaning of a “broadcaster” in information terms. We argue that this corresponds with making the same information available to multiple modules. Sameness of information corresponds to redundancy, and multiplicity of modules can be reflected in the network-theoretic notion of participation coefficient. Thus, a broadcaster is a region in the synergistic workspace (i.e., a region with strong synergistic interactions) that in addition has a high participation coefficient for its redundant interactions.

      Pertaining specifically to the use of resting-state networks as modules, indeed our own (Luppi et al., 2022 Nature Neuroscience) and others’ research has shown that each RSN entertains primarily redundant interactions among its constituent regions. This is not surprising, since RSNs are functionally defined: their constituent elements need to process the same information (e.g., pertaining to a visual task in case of the visual network). We used the RSNs as our definition of modules, because they are widely understood to reflect the intrinsic organisation of brain activity into functional units; for example, Smith et al., (2009 PNAS) and Cole et al (2014 Neuron) both showed that RSNs reflect task-related co-activation of regions, whether directly quantified from fMRI in individuals performing multiple tasks, or inferred from meta-analysis of the neuroimaging literature. This is the aspect of a “module” that matters from the global workspace perspective: modules are units with distinct function, and RSNs capture this well. This is therefore why we use the RSNs as modules when defining the participation coefficient: they provide an a-priori division into units with functionally distinct roles.

      Nonetheless, we also note that RSN organisation is robustly recovered using many different methods, including seed-based correlation from specific regions-of-interest, or Independent Components Analysis, or community detection on the network of inter-regional correlations - demonstrating that they are not merely a function of the specific method used to identify them. In fact, we show significant correlation between participation coefficient defined in terms of RSNs, and in terms of modules identified in a purely data-driven manner from Louvain consensus clustering (Figure S4).
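      For readers unfamiliar with the participation coefficient discussed above, the following sketch computes it with the standard network-theoretic formula P_i = 1 - sum_m (k_im / k_i)^2, where k_i is the total strength of node i and k_im is its strength towards module m. It is a minimal illustration, not the authors' pipeline: the toy weight matrix stands in for a matrix of redundant interactions, the module labels stand in for RSN assignments, and the function name is hypothetical.

```python
import numpy as np


def participation_coefficient(weights, modules):
    """P_i = 1 - sum_m (k_im / k_i)^2 for a non-negative weighted network."""
    weights = np.asarray(weights, dtype=float)
    modules = np.asarray(modules)
    strength = weights.sum(axis=1)
    ratio_sq = np.zeros_like(strength)
    for m in np.unique(modules):
        # Strength of each node towards the members of module m.
        k_im = weights[:, modules == m].sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio_sq += np.where(strength > 0, (k_im / strength) ** 2, 0.0)
    # Nodes without any interaction are assigned a coefficient of zero.
    return np.where(strength > 0, 1.0 - ratio_sq, 0.0)


# Toy network: five regions in three modules (hypothetical labels).
modules = [0, 0, 1, 1, 2]
weights = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (0, 4), (2, 3)]:
    weights[i, j] = weights[j, i] = 1.0

pc = participation_coefficient(weights, modules)
print(np.round(pc, 3))
```

      Region 0 spreads its interactions evenly over all three modules and so obtains the highest value, which is the property used above to characterise a broadcaster; regions whose interactions stay within a single module obtain zero.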

      (3) The authors acknowledged the potential relevance of perturbative approaches in terms of PCI and quantification of consciousness. It would be valuable if the authors could also discuss perturbative approaches in relation to inducing transitions between brain states. In other words, since the authors investigate disorders of consciousness where interventions could provide insights into treatment, as suggested by computational and experimental works, it would be interesting to explore the relationship between the synergistic workspace and its modifications from this perspective as well.

      We thank the Reviewer for bringing this up: we now cite several studies that in recent years have applied perturbative approaches to induce transitions between states of consciousness.

      “The PCI is used as a means of assessing the brain’s current state, but stimulation protocols can also be adopted to directly induce transitions between states of consciousness. In rodents, carbachol administration to frontal cortex awakens rats from sevoflurane anaesthesia 120, and optogenetic stimulation was used to identify a role of central thalamus neurons in controlling transitions between states of responsiveness 121,122. Additionally, several studies in non-human primates have now shown that electrical stimulation of the central thalamus can reliably induce awakening from anaesthesia, accompanied by the reversal of electrophysiological and fMRI markers of anaesthesia 123–128. Finally, in human patients suffering from disorders of consciousness, stimulation of intra-laminar central thalamic nuclei was reported to induce behavioural improvement 129, and ultrasonic stimulation 130,131 and deep-brain stimulation are among potential therapies being considered for DOC patients 132,133. It will be of considerable interest to determine whether our corrected measure of integrated information and topography of the synergistic workspace are also restored by these causal interventions.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would appreciate it if the authors could revisit the figures and make sure that:

      (1) All fonts are large enough to be readable for people with visual impairments (for ex. the ranges on the colorbars in Fig. 2 are unreadably small).

      Thank you: we have increased font sizes.

      (2) The colormaps are scaled to show meaningful differences (Fig. 2A)

      We have changed the color scale in Figure 2A and 2B.

      Also, the authors may want to revisit the references section: some of the papers that were pre-prints at one point have now been published and should be updated.

      Thank you: we have updated our references.

      Minor comments:

      • In Eqs. 2 and 3, the unique information term uses the bar notation ( | ) that is typically indicative of "conditioned on." Perhaps the authors could use a slash notation (e.g. Unq(X ; Z / Y)) to avoid this ambiguity? My understanding of the Unique information is that it is not necessarily "conditioned on", so much as it is "in the context of".

      Indeed, the “|” sign of “conditioning” could be misleading; however, the “/” sign could also be misleading, if interpreted as division. Therefore, we have opted for the “\” sign of “set difference”, in Eq 2 and 3, which is conceptually more appropriate in this context.

      • The font on the figures is a little bit small - for readers with poor eyes, it might be helpful to increase the wording size.

      We have increased font sizes in the figures where relevant.

      • I don't quite understand what is happening in Fig. 2A - perhaps it is a colormap issue, but it seems as though it's just a big white square? It looks like redundancy is broadly correlated with FC (just based on the look of the adjacency matrices), but I have no real sense of what the synergistic matrix looks like, other than "flat."

      We have now changed the color scale in Figure 2.

      Reviewer #2 (Recommendations For The Authors):

      Besides the issues mentioned in the Public review, I have the following suggestions to improve the manuscript:

      • At the end of the introduction, a few lines could be added explaining why the study of DOC patients and subjects under anesthesia will be informative in the context of this work.

      By comparing functional brain scans from transient anaesthetic-induced unconsciousness and from the persistent unconsciousness of DOC patients, which arises from brain injury, we can search for common brain changes associated with loss of consciousness – thereby disambiguating what is specific to loss of consciousness.

      • On page and in general the first part of Results, it is not evident that you are working with functional connectivity. Many times the word 'connection' is used and sometimes I was wondering whether they were structural or functional. Please clarify. Also, the meaning of 'synergistic connection' or 'redundant connection' could be explained in lay terms.

      Thank you for bringing this up. We have now replaced the word “connection” with “interaction” to disambiguate this issue, further adding “functional” where appropriate. We have also provided, in the Introduction, an intuitive explanation of what synergy and redundancy mean in the context of spontaneous fMRI signals.

      • Figure 2 needs a lot of improvement. The matrix of synergistic interactions looks completely yellow-ish with some vague areas of white. So everything is above 2. What does it mean? Pretty uninformative. The matrix of redundant connections looks mostly black, with some red here and there. So everything is below 0.6. Also, what are the meaning and units of the colorbars?

      We agree: we have increased font sizes, added labels, and changed the color scale in Figure 2. We hope that the new version of Figure 2 will be clearer.

      • The caption of Figure 2 mentions "... brain regions identified as belonging to the synergistic global workspace". It was not clear to me how you define these areas. Are they just the sum of gateways and broadcasters, or is there another criterion?

      Regions belonging to the synergistic workspace are indeed the set comprising gateways and broadcasters; they are the regions that are synergy-dominated, as defined in Luppi et al., 2022 Nature Neuroscience. We have now clarified this in the figure caption.

      • In the first lines of page 7, it is said that data from DOC and anesthesia was parcellated in 400 + 54 regions. However, it was said in a manner that made me think it was a different parcellation than the other data. Please make it clear that the parcellation is the same (if it is).

      We have now clarified that the 400 cortical regions are from the Schaefer atlas, and 54 subcortical regions from the Tian atlas, as for the other analysis. The only other parcellation that we use is the Schaefer-232, for the robustness analysis. This is also reported in the Methods.

      • Figure 3: the labels in the colorbars cannot be read, please make them bigger. Also, the colorbars and colorscales should be centered in white, to make it clear that red is positive and blue is negative. Or at least maintain consistency across the panels (I can't tell because of the small numbers).

      Thank you: we have increased font sizes, added labels, indicated that white refers to zero (so that red is always an increase, and blue is always a decrease), and changed the color scale in Figure 2.

      • The legend of Figure 4 is written in a different style, interpreting the figure rather than describing it. Please describe the figure in the caption, in order to let the reader know what they are looking at.

      We have endeavoured to rewrite the legend of Figure 4 in a style that is more consistent with the other figures.

      • In several parts the 'whole-minus-sum' phi measure is mentioned and it is said that it did not decrease during loss of consciousness. However, I did not see any figure about that nor any conspicuous reference to that in Results text. Where is it?

      We apologise for the confusion: this is Figure S3A, in the Supplementary. We have now clarified this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In the same direction, regarding Fig. 2, in my opinion, it does not effectively aid in understanding the selection of regions as more synergistic or redundant. In panels A) and B), the color scales could be improved to better distinguish regions in the matrices (panel A) is saturated at the upper limit, while panel B) is saturated at the lower limit). Additionally, I suggest indicating in the panels what is being measured with the color scales.

      Thank you: we have increased font sizes, added labels, and changed the color scale in Figure 2.

      (2) When investigating the synergistic core of human consciousness and interpreting the results of changes in information integration measures in terms of the proposed framework, did the authors consider the synergistic workspace computed in HCP data? If the answer is positive, it would be helpful for the authors to be more explicit about it and elaborate on any differences that may be found, as well as the potential impact on interpretation.

      This is correct: the synergistic workspace, including gateways and broadcasters, is identified from the Human Connectome Project dataset. We now clarify this in the manuscript.

      Minors:

      (1) I would suggest improving the readability of figures 2 and 3, considering font size (letters and numbers) and color bars (numbers and indicate what is measured with this scale). In Figure 1, the caption defines steps instead of the stages that are indicated in the figure.

      Thank you: we have increased font sizes, added labels, and replaced steps with “stages” in Figure 1.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Qin et al. set out to investigate the role of mechanosensory feedback during swallowing and identify neural circuits that generate ingestion rhythms. They use Drosophila melanogaster swallowing as a model system, focusing their study on the neural mechanisms that control cibarium filling and emptying in vivo. They find that pump frequency is decreased in mutants of three mechanotransduction genes (nompC, piezo, and Tmc), and conclude that mechanosensation mainly contributes to the emptying phase of swallowing. Furthermore, they find that double mutants of nompC and Tmc have more pronounced cibarium pumping defects than either single mutants or Tmc/piezo double mutants. They discover that the expression patterns of nompC and Tmc overlap in two classes of neurons, md-C and md-L neurons. The dendrites of md-C neurons wrap the cibarium and project their axons to the subesophageal zone of the brain. Silencing neurons that express both nompC and Tmc leads to severe ingestion defects, with decreased cibarium emptying. Optogenetic activation of the same population of neurons inhibited filling of the cibarium and accelerated cibarium emptying. In the brain, the axons of nompC∩Tmc cell types respond during ingestion of sugar but do not respond when the entire fly head is passively exposed to sucrose. Finally, the authors show that nompC∩Tmc cell types arborize close to the dendrites of motor neurons that are required for swallowing, and that swallowing motor neurons respond to the activation of the entire Tmc-GAL4 pattern.

      Strengths:

      • The authors rigorously quantify ingestion behavior to convincingly demonstrate the importance of mechanosensory genes in the control of swallowing rhythms and cibarium filling and emptying

      • The authors demonstrate that a small population of neurons that express both nompC and Tmc oppositely regulate cibarium emptying and filling when inhibited or activated, respectively

      • They provide evidence that the action of multiple mechanotransduction genes may converge in common cell types

      Thank you for your insightful and detailed assessment of our work. Your constructive feedback will help to improve our manuscript.

      Weaknesses:

      • A major weakness of the paper is that the authors use reagents that are expressed in both md-C and md-L but describe the results as though only md-C is manipulated. Severing the labellum will not prevent optogenetic activation of md-L from triggering neural responses downstream of md-L. Optogenetic activation is strong enough to trigger action potentials in the remaining axons. Therefore, Qin et al. do not present convincing evidence that the defects they see in pumping can be specifically attributed to md-C.

      Thank you for your comments. This is an important point that we did not adequately address in the original preprint. We have obtained imaging and behavioral results that strongly suggest that md-C, rather than md-L, is essential for swallowing behavior.

      36 hours after the ablation of the labellum, the signals of md-L were hardly observable when GFP expression was driven by the intersection between Tmc-GAL4 & nompC-QF (see Figure 3—figure supplement 1A). This observation indicates that the axons of md-L had likely degenerated by 36 hours, and were unlikely to influence swallowing. Moreover, the projection pattern of Tmc-GAL4 & nompC-QF>>GFP exhibited no significant changes in the brain after labellum ablation.

      Furthermore, even 36 hours after labellum ablation, flies exhibited responses to light stimulation (see Figure 3—figure supplement 1B-C, Video 5) when ReaChR was expressed in md-C. We thus reasoned that md-C, but not md-L, plays a crucial role in the swallowing process.

      • GRASP is known to be non-specific and prone to false positives when neurons are in close proximity but not synaptically connected. A positive GRASP signal supports but does not confirm direct synaptic connectivity between md-C/md-L axons and MN11/MN12.

      In this study, we employed the nSyb-GRASP, wherein the GRASP is expressed at the presynaptic terminals by fusion with the synaptic marker nSyb. This method demonstrates an enhanced specificity compared to the original GRASP approach.

      Additionally, we utilized +/ UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; + / MN-LexA fruit flies as a negative control to mitigate potential false signals originating from the tool itself (Author response image 1, scale bar = 50μm). Besides the genotype Tmc-Gal4, Tub(FRT. Gal80) / UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA fruit flies discussed in this manuscript, we also incorporated genotype Tmc-Gal4, Tub(FRT. Gal80) / lexAop-nSyb-spGFP1-10, UAS-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA fruit flies as a reverse control (Author response image 2). Unexpectedly, similar positive signals were observed, indicating that positive signals may emerge due to close proximity between neurons even with nSyb-GRASP.

      Author response image 1

      It should be noted that the existence of synaptic projections from motor neurons (MN) to md-C cannot be definitively confirmed at this juncture. At present, we can only posit the potential for synaptic connections between md-C and motor neurons. A more definitive conclusion may be attainable with the use of comprehensive whole-brain connectome data in future studies.

      Author response image 2

      • As seen in Figure 2—figure supplement 1, the expression pattern of Tmc-GAL4 is broader than md-C alone. Therefore, the functional connectivity the authors observe between Tmc expressing neurons and MN11 and 12 cannot be traced to md-C alone.

      It is true that the expression pattern of Tmc-GAL4 is broader than that of md-C alone. Our experiments, including those flies expressing TNT in Tmc+ neurons, demonstrated difficulties in emptying (Figure 2A, 2D). Notably, we encountered challenges in finding fly stocks bearing UAS>FRT-STOP-P2X2. Consequently, we opted to utilize Tmc-GAL4 to drive UAS-P2X2 instead. We believe that the results further support our hypothesis on the role of md-C in the observed behavioral change in emptying.

      Overall, this work convincingly shows that swallowing and swallowing rhythms are dependent on several mechanosensory genes. Qin et al. also characterize a candidate neuron, md-C, that is likely to provide mechanosensory feedback to pumping motor neurons, but the results they present here are not sufficient to assign this function to md-C alone. This work will have a positive impact on the field by demonstrating the importance of mechanosensory feedback to swallowing rhythms and providing a potential entry point for future investigation of the identity and mechanisms of swallowing central pattern generators.

      Reviewer #2 (Public Review):

      In this manuscript, the authors describe the role of cibarial mechanosensory neurons in fly ingestion. They demonstrate that pumping of the cibarium is subtly disrupted in mutants for piezo, TMC, and nomp-C. Evidence is presented that these three genes are co-expressed in a set of cibarial mechanosensory neurons named md-C. Silencing of md-C neurons results in disrupted cibarial emptying, while activation promotes faster pumping and/or difficulty filling. GRASP and chemogenetic activation of the md-C neurons is used to argue that they may be directly connected to motor neurons that control cibarial emptying.

      The manuscript makes several convincing and useful contributions. First, identifying the md-C neurons and demonstrating their essential role for cibarium emptying provides reagents for further studying this circuit and also demonstrates the importance of mechanosensation in driving pumping rhythms in the pharynx. Second, the suggestion that these mechanosensory neurons are directly connected to motor neurons controlling pumping stands in contrast to other sensory circuits identified in fly feeding and is an interesting idea that can be more rigorously tested in the future.

      At the same time, there are several shortcomings that limit the scope of the paper and the confidence in some claims. These include:

      a) the MN-LexA lines used for GRASP experiments are not characterized in any other way to demonstrate specificity. These were generated for this study using Phack methods, and their expression should be shown to be specific for MN11 and MN12 in order to interpret the GRASP experiments.

      Thanks for the suggestion. We have checked the expression pattern of MN-LexA, which is similar to MN-GAL4 used in previous work (Manzo et al., PNAS, 2012, PMID:22474379). Here is the expression pattern:

      Author response image 3

      b) There is also insufficient detail for the P2X2 experiment to evaluate its results. Is this an in vivo or ex vivo prep? Is ATP added to the brain, or ingested? If it is ingested, how is ATP coming into contact with md-C neuron if it is not a chemosensory neuron and therefore not exposed to the contents of the cibarium?

      The P2X2 experimental preparation was done ex vivo. We immersed the fly in the imaging buffer, as described in the Methods section under Functional Imaging. Following dissection and identification of the subesophageal zone (SEZ) area under fluorescent microscopy, we introduced ATP slowly into the buffer, at a distance from the brain.

      c) In Figure 3C, the authors claim that ablating the labellum will remove the optogenetic stimulation of the md-L neuron (mechanosensory neuron of the labellum), but this manipulation would presumably leave an intact md-L axon that would still be capable of being optogenetically activated by Chrimson.

      Please refer to the corresponding answers for reviewer 1 and Figure 3—figure supplement 1.

      d) Average GCaMP traces are not shown for md-C during ingestion, and therefore it is impossible to gauge the dynamics of md-C neuron activation during swallowing. Seeing activation with a similar frequency to pumping would support the suggested role for these neurons, although GCaMP6s may be too slow for these purposes.

      Profiling the dynamics of md-C neuron activation during swallowing is crucial for unraveling the operational model of md-C and validating our proposed hypothesis. Unfortunately, our assay faces challenges in detecting the probable 6 Hz fluorescence changes with GCaMP6s.

      In general, we observed an increase in fluorescent signals during swallowing, but movement of the live flies during swallowing disturbed the imaging recording, so we could not obtain a reliable calcium-imaging trace for md-C neurons. To enhance the robustness of our findings, patching the md-C neurons would be a more convincing approach. However, as illustrated in Figure 2, the somata of md-C neurons are situated in the cibarium rather than the brain, and patching md-C neuron somata in flies during ingestion is difficult.

      e) The negative result in Figure 4K that is meant to rule out taste stimulation of md-C is not useful without a positive control for pharyngeal taste neuron activation in this same preparation.

      We followed methods used in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which we believe could confirm that md-C do not respond to sugars.

      In addition to the experimental limitations described above, the manuscript could be organized in a way that is easier to read (for example, not jumping back and forth in figure order).

      Thank you for the suggestion; the manuscript has been reorganized.

      Reviewer #3 (Public Review):

      Swallowing is an essential daily activity for survival, and pharyngo-laryngeal sensory function is critical for safe swallowing. In Drosophila, it has been reported that the mechanical properties of food (e.g. viscosity) can modulate swallowing. However, how mechanical expansion of the pharynx or fluid content is sensed and controls swallowing has remained elusive. Qin et al. showed that a group of pharyngeal mechanosensory neurons, as well as mechanosensory channels (nompC, Tmc, and Piezo), respond to these mechanical forces to regulate swallowing in Drosophila melanogaster.

      Strengths:

      There are many reports on the effect of chemical properties of foods on feeding in fruit flies, but only a limited number of studies have reported how physical properties of food affect feeding, especially via pharyngeal mechanosensory neurons. First, the authors found that mechanosensory mutants, including nompC, Tmc, and Piezo, showed impaired swallowing, mainly in the emptying process. Next, they identified cibarium multidendritic mechanosensory neurons (md-C) as responsible for controlling swallowing by regulating motor neurons (MN) 12 and 11, which control filling and emptying, respectively.

      Weaknesses:

      While the involvement of md-C and mechanosensory channels in controlling swallowing is convincing, it is not yet clear which stimuli activate md-C. Can it be an expansion of cibarium or food viscosity, or both? In addition, if rhythmic and coordinated contraction of muscles 11 and 12 is essential for swallowing, how can simultaneous activation of MN 11 and 12 by md-C achieve this? Finally, previous reports showed that food viscosity mainly affects the filling rather than the emptying process, which seems different from their finding.

      We have confirmed that swallowing sucrose water solution activated md-C neurons, while sucrose water solution alone could not (Figure 4J-K). We hypothesized that the viscosity of the food might influence this expansion process.

      While we were unable to delineate the activation dynamics of md-C neurons, our proposal posits that these neurons could be activated in a single pump cycle, sequentially stimulating MN12 and MN11. Another possibility is that the activation of md-C neurons acts as a switch, altering the oscillation pattern of the swallowing central pattern generator (CPG) from a resting state to a working state.

      In the experiments with w1118 flies fed with MC (methylcellulose) water, we observed that viscosity predominantly affects the filling process rather than the emptying process, consistent with previous findings. This raises an intriguing question. Our investigation into the mutation of mechanosensitive ion channels revealed a significant impact on the emptying process. We believe this is due to the loss of mechanosensation affecting the vibration of swallowing circuits, thereby influencing both the emptying and filling processes. In contrast, viscosity appears to make it more challenging for the fly to fill the cibarium with food, primarily attributable to the inherent properties of the food itself.

      Reviewer #4 (Public Review):

      A combination of optogenetic behavioral experiments and functional imaging is employed to identify the role of mechanosensory neurons in food swallowing in adult Drosophila. While some of the findings are intriguing and the overall goal of mapping a sensory to motor circuit for this rhythmic movement is admirable, the data presented could be improved.

      The circuit proposed (and supported by GRASP contact data) shows these multi-dendritic neurons connecting to pharyngeal motor neurons. This is pretty direct - there is no evidence that they affect the hypothetical central pattern generator - just the execution of its rhythm. The optogenetic activation and inhibition experiments are constitutive, not patterned light, and they seem to disrupt the timing of pumping, not impose a new one. A slight slowing of the rhythm is not consistent with the proposed function.

      Motor neurons implicated in patterned motions can be considered effectors of Central Pattern Generators (CPGs) (Marder et al., Curr Biol., 2001, PMID: 11728329; Hurkey et al., Nature., 2023, PMID:37225999). Given our observation of the connection between md-C neurons and motor neurons, it is reasonable to speculate that md-C neurons influence CPGs. Relative to the patterned light (0.1 s on, 0.1 s off) used in our optogenetic experiments, continuous light stimulation produced no significant change in the responses. We think that optogenetic methods may lead to overstimulation of md-C neurons, failing to accurately mimic the expansion of the cibarium during feeding.

      Dysfunction in mechanosensitive ion channels or mechanosensory neurons not only disrupts the timing of pumping but also results in decreased intake efficiency (Figure 1E). The water-swallowing rhythm is generally stable in flies, and swallowing is a vital process that may involve redundant ion channels to ensure its stability.

      The mechanosensory channel mutants nompC, piezo, and TMC have a range of defects. The role of these channels in swallowing may not be sufficiently specific to support the interpretation presented. Their other defects are not described here and their overall locomotor function is not measured. If the flies have trouble consuming sufficient food throughout their development, how healthy are they at the time of assay? The level of starvation or water deprivation can affect different properties of feeding - meal size and frequency. There is no description of how starvation state was standardized or measured in these experiments.

      Defects in mechanosensory channel mutants nompC, piezo, and TMC, have been extensively investigated (Hehlert et al., Trends Neurosci., 2021, PMID:332570000). Mutations in these channels exhibit multifaceted effects, as illustrated in our RNAi experiments (see Figure 2E). Deprivation of water and food was performed in empty fly vials. It's important to note that the duration of starvation determines the fly's willingness to feed but not the pump frequency (Manzo et al., PNAS., 2012, PMID:22474379).

      In most cases, female flies were deprived of water and food in empty vials for 24 hours, because after that time most flies were willing to drink water. The deprivation time was 12 hours for nompC and Tmc mutants and for flies with Kir2.1 expressed in md-C neurons, as some of these flies cannot survive 24 h of deprivation.

      The brain is likely to move considerably during swallow, so the GCaMP signal change may be a motion artifact. Sometimes this can be calculated by comparing GCaMP signal to that of a co-expressed fluorescent protein, but there is no mention that this is done here. Therefore, the GCaMP data cannot be interpreted.

      We did not co-express a fluorescent protein with GCaMP for md-C. The head of the fly was mounted onto a glass slide, and we did not observe significant signal changes before feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract: I disagree that swallow is the first step of ingestion. The first paragraph also mentions the final checkpoint before food ingestion. Perhaps sufficient to say that swallow is a critical step of ingestion.

      Indeed, it is not rigorous enough to say “first step”. This has been replaced by “early step”.

      Introduction:

      Line 59: "Silence" should be "Silencing"

      This has been replaced.

      Results:

      Lines 91-92: I am not clear about what this means. 20% of nompC and 20% of wild-type flies exhibit incomplete filling? So nompC is not different from wild-type?

      Sorry for the mistake. Viscous foods led to incomplete emptying (not incomplete filling), as displayed in Video 4. The swallowing behavior differs between nompC mutants and wild-type flies, as illustrated in Figure 1C, Figure 1—figure supplement 1A-C and Videos 1 and 5.

      When fed with 1% MC water solution (Figure 1—figure supplement 1E-H), Tmc or piezo mutants displayed incomplete emptying, which could account for a large proportion of the time spent swallowing, whereas only 20% of nompC flies and 20% of wild-type flies sporadically exhibited incomplete emptying, a significant difference. Although the percentage of flies displaying incomplete pumps is similar between nompC mutant and wild-type flies, their swallowing behavior differs markedly, as can be seen in Videos 1 and 5.

      Line 94: Should read: “while for foods with certain viscosity, the pump of Tmc or piezo mutants might"

      What evidence is there for weakened muscle motion? The phenotypes of all three mutants is quite similar, so concluding that they have roles in initiation versus swallowing strength is not well supported -this would be better moved to the discussion since it is speculative.

      Muscles are responsible for pumping the bolus from the mouth to the crop. In the case of Tmc or piezo mutants, as evidenced by incomplete emptying for viscous foods (see Video 4), we speculate that the loss of sensory stimuli leads to inadequate muscle contraction. The phenotypes observed in Tmc and piezo mutants are similar yet distinct from those of the wild-type or nompC mutant, as shown in Video 1 and 4. The phrase "due to weakened muscle motion" has been removed for clarity.

      Line 146: If md-L neurons are also labeled by this intersection, then you are not able to know whether the axons seen in the brain are from md-L or md-C neurons. Line 148: cutting the labellum is not sufficient to ablate md-L neurons. The projections will still enter the brain and can be activated with optogenetics, even after severing the processes that reside in the labellum.

      Please refer to the responses for reviewer #1 (Public Review): “A major weakness of the paper…” and Figure 4.

      Line 162: If the fly head alone is in saline, do you know that the sucrose enters the esophagus? The more relevant question here is whether the md-C neurons respond to mechanical force. If you could artificially inflate the cibarium with air and see the md-C neurons respond that would be a more convincing result. So far you only know that these are activated during ingestion, but have not shown that they are activated specifically by filling or emptying. In addition, you are not only imaging md-C (md-L is also labeled). This caveat should be mentioned.

      We followed the methods outlined in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which suggested that md-C neurons do not respond to sugars. While we aimed to mechanically stimulate md-C neurons, detecting signal changes during different steps of swallowing is challenging. This aspect could be further investigated in subsequent research with the application of adequate patch recording or two-photon microscopy (TPM).

      Figure 3: It is not clear what the pie charts in Figure 3 A refer to. What are the three different rows, and what does blue versus red indicate?

      Figure 3A illustrates three distinct states driven by CsChrimson light stimulation of md-C neurons, with the proportions of flies exhibiting each state. During light activation, flies may display difficulty in filling, incomplete filling, or a normal range of pumping. The blue and red bars represent the proportions of flies showing the corresponding state, as indicated by the black line.

      Figure 4: Where are the example traces for J? The comparison in K should be average dF/F before ingestion compared with average dF/F during ingestion. Comparing the in vitro response to sucrose to the in vivo response during ingestion is not a useful comparison.

      Please refer to the answers for reviewer #2 question d).
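
      The before-versus-during comparison that the reviewer describes for Figure 4K can be sketched as below. This is a minimal illustration only, assuming a single fluorescence trace per fly and a known frame at which ingestion begins. The function names and the choice of the pre-ingestion frames as the baseline F0 are hypothetical and are not taken from the authors' analysis pipeline.

```python
def delta_f_over_f(trace, baseline_frames):
    """Return dF/F for each frame, using the mean of the first
    `baseline_frames` samples as the baseline F0 (an assumption)."""
    if not 0 < baseline_frames <= len(trace):
        raise ValueError("baseline_frames must be within the trace length")
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    if f0 == 0:
        raise ValueError("baseline fluorescence F0 must be non-zero")
    return [(f - f0) / f0 for f in trace]


def mean_dff_before_and_during(trace, onset_frame):
    """Average dF/F before ingestion (frames < onset_frame) and
    during ingestion (frames >= onset_frame)."""
    if not 0 < onset_frame < len(trace):
        raise ValueError("onset_frame must split the trace into two parts")
    dff = delta_f_over_f(trace, onset_frame)
    before = dff[:onset_frame]
    during = dff[onset_frame:]
    return sum(before) / len(before), sum(during) / len(during)


# Hypothetical raw fluorescence values; ingestion starts at frame 4.
example_trace = [100, 100, 100, 100, 150, 150]
before_mean, during_mean = mean_dff_before_and_during(example_trace, 4)
```

      Applied per fly, the two averages give one paired data point each, so the comparison stays within the same preparation rather than across in vitro and in vivo conditions.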

      Reviewer #2 (Recommendations For The Authors):

      Suggested experiments that would address some of my concerns listed in the public review include:

      a) high resolution SEZ images of MN-LexA lines crossed to LexAop-GFP to demonstrate their specificity

      b) more detail on the P2X2 experiment. It is hard to make suggestions beyond that without first seeing the details.

      c) presenting average GCaMP traces for all calcium imaging results

      d) to rule out taste stimulation of md-C (Figure 4K) I would suggest performing more extensive calcium imaging experiments with different stimuli. For example, sugar, water, and increasing concentrations of a neutral osmolyte (e.g. PEG) to suppress the water response. I think that this is more feasible than trying to get an in vitro taste prep to be convincing.

      Please refer to the responses for public review of reviewer #2.

      Reviewer #3 (Recommendations For The Authors):

      Below I list my suggestions as well as criticisms.

      (1) It would be excellent if the authors could demonstrate whether varying levels of food viscosity affect md-C activation.

      That is a good point, and could be studied in future work.

      (2) It is not clear whether an intersectional approach using TMC-GAL4 and nompC-QF abolishes labelling of the labellar multidendritic neurons. If this is the case, please show labellar multidendritic neurons in TMC-GAL4 only flies and flies using the intersectional approach. Along with this question, I am concerned that labellum-removed flies could be used for feeding assay.

      Intersectional labelling using TMC-GAL4 and nompC-QF could not abolish labelling of the labellar multidendritic neurons (Author response image 4). Labellum-removed flies could be used for the feeding assay (Figure 3—figure supplement 1B-C, video 5), but swallowing behavior was affected whenever the LSO or cibarium of the fly was damaged, so the labellum must be removed very carefully.

      Author response image 4

      (3) Please provide the detailed methods for GRASP and include proper control.

      Please refer to the responses for public review of reviewer #1.

      (4) The authors hypothesized that md-C sequentially activates MN11 and 12. Is the time gap between applying ATP on md-C and activation of MN11 or MN12 different?

      Please refer to the responses for public review of reviewer #3. The time gap between applying ATP on md-C and activation of MN11 or MN12 did not differ significantly, and we think the reason is that the ex vivo conditions could not completely mimic the in vivo process.

      I found the manuscript includes many errors, which need to be corrected.

      (1) The reference formatting needs to be rechecked, for example, lines 37, 42, and 43.

      (2) Line 44-46: There is some misunderstanding. The role of pharyngeal mechanosensory neurons is not known compared with chemosensory neurons.

      (3) Line 49: Please specify which type of quality of food. Chemical or physical?

      (4) Line 80 and Figure 1B-D Authors need to put filling and emptying time data in the main figure rather than in the supplementary figure. Otherwise, please cite the relevant figures in the text(S1A-C).

      (5) Line 84-85; Is "the mutant animals" indicating only nompC? Please specify it.

      (6) Figure 1a: It is hard to determine the difference between the series of images. And also label filling and emptying under the time.

      (7) S1E-H: It is unclear what "Time proportion of incomplete pump" means. Please define it.

      (8) Please reorganize the figures to follow the order of the text, for example, figures 2 and 4

      (9) Figure 4A. There is mislabelling in Figure 4A. It is supposed to be phalloidin not nc82.

      (10) Figure 4K: It does not match the figure legend and main text.

      (11) Figure 4D and G: Please indicate ATP application time point.

      Thank you for the corrections; all the points mentioned have been revised.

      Reviewer #4 (Recommendations For The Authors):

      The figures need improvement. 1A has tiny circles showing pharynx and any differences are unclear.

      The expression pattern of some of these drivers (Supplement) seems quite broad. The tmc nompC intersection image in Figure 1F is nice but the cibarium images are hard to interpret: does this one show muscle expression? What are "brain" motor neurons? Where are the labellar multi-dendritic neurons?

      The Tmc nompC intersection image shows no expression in muscles. The somata of motor neurons 12 and 11 are situated in the SEZ area of the brain, while the somata of md-C neurons are in the cibarium. An image of md-L neurons is provided in the response to Reviewer #3 (Recommendations For The Authors).

      Why do the assays alternate between swallowing food and swallowing water?

      Thank you for your suggestion; Figure 1A has been enlarged. The Tmc nompC intersection image in Figure 2F displays the position of md-C neurons from a ventral perspective, and muscles were not labelled. We stained the muscles in the cibarium with phalloidin (Figure 4A) and found no overlap between md-C neurons and muscles. An image of md-L neurons is provided as Author response image 4.

      In the majority of our experiments, we employed water to test swallowing behavior, while we used methylcellulose water solution to test swallowing behavior of mechanoreceptor mutants, and sucrose solution for flies with md-C neurons expressing GCaMP since they hardly drank water when their head capsules were open.

      How starved or water-deprived were the flies?

      One day prior to the behavioral assays, flies were transferred to empty vials (without water or food) for 24 hours of water deprivation. Flies that could not survive 24 h of deprivation were deprived for 12 h.

      How exactly was the pumping frequency (shown in Fig 1B) measured? There is no description in the methods at all. If the pump frequency is scored by changes in blue food intensity (arbitrary units?), this seems very subjective and maybe image angle dependent. What was camera frame rate? Can it capture this pumping speed adequately? Given the wealth of more quantitative methods for measuring food intake (eg. CAFE, flyPAD), it seems that better data could be obtained.

      How was the total volume of the cibarium measured? What do the pie charts in Figure 3A represent?

      The pump frequency was computed as the number of pumps divided by the time scale, following the methodology outlined in Manzo et al., 2012. Swallowing curves were plotted using the inverse of the blue food intensity in the cibarium. In this representation, ascending lines signify filling, while descending lines indicate emptying (see Figure 2D, 3B). We maintain objectivity in our approach since, during the recording of swallowing behavior, the fly was fixed, and we exclusively used data for analysis when the Region of Interest (ROI) was in the cibarium. This ensures that the intensity values accurately reflect the filling and emptying processes. Furthermore, we conducted manual frame-by-frame checks of pump frequency, and the results align with those generated by the time series analyzer V3 of ImageJ.

      To assess the total volume of ingestion, we followed the CAFE method, using a graduated glass capillary. We then calculated the ingestion rate (nL/s) by dividing the total volume of ingestion by the feeding time.
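
      The two quantities described above can be written out as a short sketch. It assumes that a pump is counted as one upward crossing of a mid-level threshold in the inverted-intensity trace; that counting rule, the function names, and the example values are illustrative assumptions and do not reproduce the ImageJ-based procedure used in the study.

```python
def count_pumps(trace, threshold=None):
    """Count pumps as upward crossings of `threshold` in an
    inverted blue-intensity trace (rising = filling, falling = emptying).
    By default the threshold is midway between the trace minimum and maximum."""
    if not trace:
        raise ValueError("trace must not be empty")
    if threshold is None:
        threshold = (min(trace) + max(trace)) / 2
    above = [value > threshold for value in trace]
    # One pump per transition from below-threshold to above-threshold.
    return sum(1 for prev, cur in zip(above, above[1:]) if cur and not prev)


def pump_frequency(num_pumps, duration_s):
    """Pump frequency in Hz: number of pumps divided by the recording time."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return num_pumps / duration_s


def ingestion_rate(total_volume_nl, feeding_time_s):
    """Ingestion rate in nL/s: total ingested volume divided by feeding time."""
    if feeding_time_s <= 0:
        raise ValueError("feeding_time_s must be positive")
    return total_volume_nl / feeding_time_s


# Hypothetical values for illustration only.
example_trace = [0, 1, 0, 1, 0, 1, 0]
pumps = count_pumps(example_trace)
frequency_hz = pump_frequency(pumps, 0.5)
rate_nl_per_s = ingestion_rate(30, 10)
```

      A threshold-crossing count of this kind is one way to cross-check manual frame-by-frame counts, provided the camera frame rate is high enough to sample each pump cycle several times.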

      The changes seem small, in spite of the claim of statistical significance.

      Pump frequency is highly stable within a given genotype, so even seemingly small changes are statistically significant. We speculate that this stability reflects a redundant mechanism that ensures the robustness of swallowing: disruption of one channel might be partially compensated for by others, which underlines the vital nature of the swallowing mechanism.

      How is this change in pump frequency consistent with defects in one aspect of the cycle - either ingestion (activation) or expulsion (inhibition)?

      Please refer to Figures 2 and 3. Both the filling and emptying processes were affected, while inhibition mainly influences emptying time (Figure 1—figure supplement 1).

      for the authors:

      Line 48: extensively

      Line 62 - undiscovered.

      Line 107, 463: multi

      Line 124: What is "dysphagia?" This is an unusual word and should be defined.

      Line 446: severe

      Line 466: in the cibarium or not?

      Thank you for the corrections; all the places mentioned have been revised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Chen et al. entitled, "The retina uncouples glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle", the authors look to provide insight on retinal metabolism and substrate utilization by using a murine explant model with various pharmacological treatments in conjunction with metabolomics. The authors conclude that photoreceptors, a specific cell within the explant, which also includes retinal pigment epithelium (RPE) and many other types of cells, are able to uncouple glycolytic and Krebs-cycle metabolism via three different pathways: 1) the mini-Krebs-cycle, fueled by glutamine and branched-chain amino acids; 2) the alanine-generating Cahill-cycle; and 3) the lactate-releasing Cori-cycle. While intriguing if determined to be true, these cell-specific conclusions are called into question due to the ex vivo experimental setup with the inclusion of RPE, the fact that the treatments were not cell-specific nor targeted at an enzyme specific to a certain cell within the retina, and no stable isotope tracing nor mitochondrial function assays were performed. Hence, without significant cell-specific methods and future experimentation, the primary claims are not supported.

      Strengths:

      This study attempts to improve on the issues that have limited the results obtained from previous ex vivo retinal explant studies by culturing in the presence of the RPE, which is a major player in the outer retinal metabolic microenvironment. Additionally, the study utilizes multiple pharmacologic methods to define retinal metabolism and substrate utilization.

      Weaknesses:

      A major weakness of this study is the lack of in vivo supporting data. Explant cultures remove the retina from its dual blood supply. Typically, retinal explant cultures are done without RPE. However, the authors included RPE in the majority of experimental conditions herein. However, it is unclear if the metabolomics samples included the RPE or not. The inclusion of the RPE, which is metabolically active and can be altered by the treatments investigated herein, further confounds the claims made regarding the neuroretina. Considering the pharmacologic treatments utilized with the explant cultures are not cell-specific and/or have significant off-target effects, it is difficult to ascertain that the metabolic changes are secondary to the effects on photoreceptors alone, which the authors claim. Additionally, the explants are taken at a very early age when photoreceptors are known to still be maturing. No mention or data is presented on how these metabolic changes are altered in retinal explants after photoreceptors have fully matured. Likewise, significant assumptions are made based on a single metabolomics experiment with no stable isotope tracing to support the pathways suggested. While the authors use immunofluorescence to support their claims at multiple points, demonstrating the presence of certain enzymes in the photoreceptors, many of these enzymes are present throughout the retina and likely the RPE. Finally, the claims presented here are in direct contradiction to recent in vivo studies that used cell-specific methods when examining retinal metabolism. No discussion of this difference in results is attempted.

      Response: We agree with the reviewer that in vivo studies could be very interesting indeed. However, technologically it will be extremely difficult to (repeatedly/continuously) sample the retina of an experimental animal and to combine this with an interventional study, with a subsequent metabolomic analysis. We do not currently have access to such technology nor are we aware of any other lab in the world capable of doing such studies. Moreover, virtually all prior studies on retinal metabolism have been done on explanted retina without RPE. This includes the seminal studies by Otto Warburg in the 1920s. In contrast, our retinal samples for all the metabolomic analyses included the RPE, except for the no RPE condition that was used as a comparator for the earlier investigations.

      We note that our metabolomic analysis was done for all five experimental conditions where each condition included at least five independent samples (each derived from different animals).

      The reviewer is correct to say that our organotypic explant cultures are early post-natal, with explantation performed at post-natal day 9 and culturing until day 15. Since our retinal explant system has been validated extremely well over more than three decades of pertinent research (see for instance: Caffe et al., Curr Eye Res. 8:1083-92, 1989), we are confident that photoreceptors mature in vitro in ways that are very similar to the in vivo situation. As far as studies in adult retina (i.e. three months or older) are concerned, this is indeed an important question that will be addressed in future studies. Studies employing stable isotope labelling may also be very informative and are planned for the future, also in order to properly determine fluxes. This will likely require an extension to our NMR hardware with a 15N channel probe, something that we plan on implementing in the future.

      We are aware that a number of questions relating to retinal metabolism are controversial and that the use of other methodology or experimental systems may lead to alternative interpretations. We have now included citations of other studies that use, for example, conditional and/or inducible knock-outs or in vivo blood sampling (e.g. Wang et al., IOVS 38:48-55, 1997; Yu et al., Invest Ophthalmol Vis Sci. 46:4728-33, 2005; Swarup et al., Am J Physiol Cell Physiol. 316:C121-C133, 2019; Daniele et al., FASEB Journal 36:e22428, 2022) and discuss the pros and cons of such approaches (e.g. in Lines 376-384; 454-472).

      Reviewer #2 (Public Review):

      Summary:

      The authors aim to learn about retinal cell-specific metabolic pathways, which could substantially improve the way retinal diseases are understood and treated. They culture ex vivo mouse retinas for 6 days with 2 - 4 days of various drug treatments targeting different metabolic pathways or by removing the RPE/choroid tissue from the neural retina. They then look at photoreceptor survival, stain for various metabolic enzymes, and quantify a broad panel of metabolites. While this is an important question to address, the results are not sufficient to support the conclusions.

      Strengths:

      The questions the authors are exploring are extremely valuable and I commend the authors on working to learn more about retina metabolism. The different sensitivity of the cones to various drugs is interesting and may suggest key differences between rods and cones. The authors also provide a thoughtful discussion of various metabolic pathways in the context of previous publications.

      Weaknesses:

      As the authors point out, ex vivo culture models allow for control over multiple aspects of the environment (such as drug delivery) not available in vivo. Ex vivo cultures can provide good hints as to what pathways are available between interacting tissues. However, there are many limitations to ex vivo cultures, including shifting to a very artificial culture media condition that is extremely different than the native environment of the retina. It is well appreciated that cells have flexible metabolism and will adapt to the conditions provided. Therefore, observations of metabolic responses obtained under culture conditions need to be interpreted with caution, they indicate what the tissue is doing under those specific conditions (which include cells adapting and dying).

      Chen et al. use pharmacological interventions to examine the impact of various metabolic pathways on photoreceptor survival and "long term" metabolic changes. The dose and timing of these drug treatments are not examined though. It is also hard to know how these drugs penetrate the tissue and it needs to be validated that the intended targets are being accurately hit. These relatively long-term treatments should be causing numerous downstream changes to metabolism, cell function, and survival, which makes looking at a snapshot of metabolite levels hard to interpret. It would be more valuable to look at multiple time points after drug treatment, especially early time points (closer to 1 hr). The authors use metabolite ratios to make conclusions about pathway activity. It would be more valuable to directly measure pathway activity by looking at metabolite production rates in the media and/or with metabolic tracers, again in time scales closer to minutes and hours instead of days.

      It is not clear from the text if the ex vivo samples with RPE/choroid intact are analyzed for metabolomics with the RPE/choroid still intact or if this is removed. If it is not removed, the comparison to the retina without RPE/choroid needs to be re-interpreted for the contribution of metabolites from the added tissue. The composition of the tissue is different and cannot be disentangled from the changes to the neural retina specifically.

      While the data is interesting and may give insights into some rod and cone-specific metabolic susceptibility, more work is needed to validate these conclusions. Given the limitations of the model the authors have over-interpreted their findings and the conclusions are not supported by the results. They need to either dramatically limit the scope of their conclusions or validate these hypotheses with additional models and tools.

      Response: We thank the reviewer for the insightful comments and agree that some of our interpretations may have been phrased too determinedly. We have therefore rephrased and toned down our conclusions in many instances in the text, and changed the manuscript title to now read “Retinal metabolism: Evidence for uncoupling of glycolysis and oxidative phosphorylation via Cori-, Cahill-, and mini-Krebs-cycle”.

      Nevertheless, when considering the major known metabolic pathways and their possible impact on metabolite patterns after the experimental manipulations used here, we believe our interpretations to be consistent with the data obtained. Conversely, the previously suggested retinal aerobic glycolysis cannot explain most of the data we have obtained. Furthermore, a predominant use of the classical “full” Krebs-cycle/OXPHOS would also not explain the metabolite patterns found (e.g. alanine, N-acetylaspartate (NAA)). While this does not in itself mean that our interpretations are all correct, they seem plausible in view of the data at hand and will hopefully stimulate further research on retinal energy metabolism using complementary technologies that were not available to us for the purpose of this study.

      We note that our organotypic retinal explant cultures, while they do contain their very own, native RPE, do not comprise the choroidal vasculature (in our explantation procedure the RPE readily detaches from the choroid).

      As far as the drugs used on retinal explants are concerned, we note that:

      (1) all three compounds used are extremely well validated, with literally thousands of studies and decades of research to their credit (i.e., 1,9-dideoxyforskolin: >270 publications since 1984; Shikonin: >1000 publications since 1977; FCCP: >2800 publications since 1967),

      (2) all experimental conditions show clear and differential drug effects, as shown, for instance, by the principal component analysis in Figure 1I and the cluster analysis in Figure 2A,

      (3) the response patterns observed for key metabolites match the anticipated drug effects (e.g. decreased glucose consumption with 1,9-dideoxyforskolin; decreased lactate levels with Shikonin; lactate accumulation with FCCP).

      One can therefore be reasonably certain that these drugs did penetrate the explanted retina and that their respective drug targets were hit. Assessing dose-responses would certainly be interesting; however, the aim of this initial study was not pharmacodynamics but a general manipulation of energy metabolism. Moreover, given the extensive validation of these drugs, off-target effects seem unlikely at the concentrations used.

      We agree with the reviewer that using a longitudinal, time-series type of analysis could give additional insights. We note that each additional time-point will require retinae from 25 animals and a very resource-intensive and time-consuming metabolomic analysis, together with a significantly more complex multivariate analysis (metabolite, experimental condition, time). This is a completely new undertaking that is simply not feasible as an extension of the present study.

      Looking at pathway activity in more direct ways is a very good idea. To this end, we aim to implement in the future an idea put forward by the reviewers, namely 13C-labeling, and additionally 15N-labeling, and tracing for specific metabolic fuels (e.g. glucose, lactate, and anaplerotic amino acids such as glutamate and branched chain amino acids).

      The reviewer is of course correct to say that the culture condition is somewhat artificial and that this may have introduced changes in the metabolism. However, as noted above in the first response to reviewer #1, the organotypic retinal culture system, using a defined medium, free of serum and antibiotics, has been extremely well studied and validated for decades (cf. Caffé et al., Curr Eye Res. 8:1083-92, 1989). Importantly, this system allows retinal viability, histotypic organization, and function to be maintained over many weeks in culture. Moreover, most previous studies on retinal metabolism have also used explanted retina – acute or cultured – i.e. experimental approaches that are similar to what we have used and that may be liable to their own artefactual changes in metabolism. This includes the seminal 1920s studies by Otto Warburg and the 1980s studies by Barry Winkler, the results of which the reviewers do not seem to doubt.

      We further agree that studying retinal metabolism in a situation closer to in vivo conditions would be thrilling. However, to our knowledge, there is to date no retina model that fully mimics the complex interplay of the blood metabolome with metabolic tissue activity. This likely means that for each metabolic condition to be studied (e.g. hyperglycemia, cachexia, etc.), a fairly large number of animals will need to be sacrificed for the molecular investigation of ex vivo retinal biopsies, which would mean a tremendous animal burden.

      We hope the reviewer will appreciate that the revised manuscript now includes numerous improvements, along with new, additional datasets and figures, references to further relevant literature, and – as mentioned above – a more cautious phrasing of our interpretations and conclusions, including a more careful wording for the manuscript title.

      Reviewer #3 (Public Review):

      Summary:

      The neural retina is one of the most energetically active tissues in the body and research into retinal metabolism has a rich history. Prevailing dogma in the field is that the photoreceptors of the neural retina (rods and cones) are heavily reliant on glycolysis, and as oxygen tension at the level of photoreceptors is very low, these specialized sensory neurons carry out aerobic glycolysis, akin to the Warburg effect in cancer cells. It has been found that this unique metabolism changes in many retinal diseases, and targeting retinal metabolism may be a viable treatment strategy. The neural retina is composed of 11 different cell types, and many research groups over the past century have contributed to our current understanding of cell-specific metabolism of retinal cells. More recently, it has been shown in mouse models and co-culture of the mouse neural retina with human RPE cultures that photoreceptors are reliant on the underlying retinal pigment epithelium for supplying nutrients. Chen and colleagues add to this body of work by studying an ex vivo culture of the developing mouse retina that maintained contact with the retinal pigment epithelium. They exposed such ex vivo cultures to small molecule inhibitors of specific metabolic pathways, performing targeted metabolomics on the tissue and staining the tissue for key metabolic enzymes to lay the groundwork for what metabolic pathways may be active in particular cell types of the retina. The authors conclude that rod and cone photoreceptors are reliant on different metabolic pathways to maintain their cell viability - in particular, that rods rely on oxidative phosphorylation and cones rely on glycolysis. Further, their data support multiple mechanisms whereby glycolysis may occur simultaneously with anaplerosis to provide abundant energy to photoreceptors.

      The data from metabolomics revealed several novel findings in retinal metabolism, including the use of glutamine to fuel the mini-Krebs cycle, the utilization of the Cahill cycle in photoreceptors, and a taurine/hypotaurine shuttle between the underlying retinal pigment epithelium and photoreceptors to transfer reducing equivalents from the RPE to photoreceptors. In addition, this study provides robust quantitative metabolomics datasets that can be compared across experiments and groups. The use of this platform will allow for rapid testing of novel hypotheses regarding the metabolic ecosystem in the neural retina.

      Strengths:

      The data on differences in the susceptibility of rods and cones to mitochondrial dysfunction versus glycolysis provides novel hypothesis-generating conjectures that can be tested in animal models. The multiple mechanisms that allow anaplerosis and glycolysis to run side-by-side add significant novelty to the field of retinal metabolism, setting the stage for further testing of these hypotheses as well.

      Weaknesses:

      Almost all of the conclusions from the paper are preliminary, based on data showing enzymes necessary for a metabolic process are present and the metabolites for that process are also present. However, to truly prove whether these processes are happening, C13 labeling or knock-out or over-expression experiments are necessary. Further, while there is good data that RPE cultures in vitro strongly recapitulate RPE phenotypes in vivo, ex vivo neural retina cultures undergo rapid death. Thus, conclusions about metabolism from explants should either be well correlated with existing literature or lead to targeted in vivo studies. This paper currently lacks both.

      Response: As mentioned above in the first answers to reviewers #1 and #2, we think of our study as a starting point that may provide novel directions for a whole series of investigations into retinal energy metabolism. In particular, novel technologies may in the future make it possible to decipher the different metabolic phenotypes of the 100+ distinct retinal cell types by in situ spatial metabolomics and lipidomics. Currently, we still have to limit the scope of our studies to only certain aspects of this topic. We thus agree that some of our interpretations need to be formulated more carefully and we have done so in the revised version of our manuscript. We also agree with the reviewer that carbon (13C) labelling and tracing studies will be very informative, and we will engage in such studies in the future. Besides 13C, we aim to further employ 15N labelled substrates, which are especially suitable for studying the fate of amino acids.

      As far as our organotypic retinal explant system is concerned, it is arguably one of the best validated such systems available (see responses to reviewers #1 and #2). While the reviewer is correct to say that the neuroretina without RPE degenerates relatively quickly in vitro, in our system, with the neuroretina and its native RPE cultured together, we can routinely culture the retina for four weeks or more without major cell loss (Söderpalm et al., IOVS 35:3910-21, 1994; Belhadj et al., JoVE 165, 2020). Thus, our retinal cultures with RPE do not undergo rapid death. Within the time-frame of the present study (6 days in vitro) culturing-induced cell death is minimal and unlikely to influence our analyses. For more detailed answers to the reviewers’ questions, please see our point-by-point response below.

      We agree with the reviewer that eventually in vivo studies will be important to confirm our interpretations. As mentioned in our initial response to reviewer #1, such studies will be very challenging and new technologies may need to be developed before in vivo investigations can deliver the answers to the questions at hand (see answer to question Rev#3.17 below), especially if the interplay between substrate availability from the blood metabolome and the retinal metabolic pathway activity is to be studied.

      Recommendations For The Authors

      Reviewer #1 (Recommendations For The Authors):

      Rev#1.1. The animals should be screened for and lack rd8.

      Response: This is a pertinent question from the reviewer. Ever since we first became aware, around 2010, of the presence of rd8 mutations in certain mouse lines from major vendors (e.g. Charles River, Jackson Labs), we have set up regular screening of all our mouse lines for this Crb1 mutation. Accordingly, the mouse lines used in this study were confirmed to be free of the rd8 / Crb1 mutation. A corresponding remark has now been inserted into the SI materials and methods section (Lines 37-38).

      Rev#1.2. GLUT1 looks significantly different from in vivo to in vitro. Recommend co-staining with RHO and cone markers (PNA or CAR) to further delineate where it is being expressed. The in vitro cultures appear to have much shorter outer segments (OS). Considering OS biosynthesis is thought to drive a good deal of metabolic adaptations, how relevant is the in vitro model system to what is truly occurring in vivo?

      Response: The GLUT1 staining shown in Figure 1 displays the in vivo situation. Since this may not have been entirely clear from the previous figure legend, we have now labelled this as “in vivo retina” and distinguish it from “in vitro” samples in the legend to Figure 1 (Lines 774-778). As far as the comparison of GLUT1 staining in vivo (Figure 1A3) vs. in vitro (Figure S1C3) is concerned, in both situations a strong RPE labelling is clearly visible, with essentially no GLUT1 label within the neuroretina.

      Nevertheless, to better delineate the expression of GLUT1 in the outer retina, we have now performed an additional co-staining with rhodopsin (RHO) as rod marker and peanut agglutinin (PNA) as cone marker, as suggested by the reviewer (new supplemental Figure S1). In brief, this co-staining confirms the strong expression of GLUT1 in the RPE, while there is essentially no GLUT1 detectable in rod or cone photoreceptors.

      Retinal explants in long-term cultures do indeed have somewhat shorter outer segments compared to their age-matched in vivo counterparts (Caffé et al., Curr Eye Res. 8:1083-1092, 1989). However, in the short-term cultures (6 DIV) and at the age studied here (P15) outer segments have only just started to grow out and are around 10 - 12 µm long, both in vitro and in vivo (cf. LaVail, JCB 58:650-661, 1978). Thus, the metabolism required for outer segment synthesis should be equivalent when in vitro and in vivo situations are compared. For considerations on outer segments in retinal explant cultures see also Rev#3.2 and Rev#3.29.

      Rev#1.3. Also, recent publications have shown that GLUT1 is expressed in the neuroretina including rods, cones, and Müller glia. Was GLUT1 not appreciated in these cells in your ex vivo samples and if so, why? Likewise, these same studies previously demonstrated GLUT1 resulted in rod degeneration but not cone. The results presented here differ significantly. Why the difference in results and is it secondary to the in vitro vs. in vivo setting? Furthermore, the authors state that they thought the no RPE situation would be similar to the GLUT1 inhibitor experimental condition but instead, they were vastly different. Is this secondary to the fact that GLUT1 is expressed outside the RPE?

      Response: We are aware that there is a controversy regarding GLUT1 expression in the neuroretina; please see also our response to question Rev#3.1 below. As far as our immunostaining for GLUT1 on in vivo retina is concerned, we find an unambiguous and very marked expression of GLUT1 in RPE cells, at both basal and apical sides. Compared to the RPE, the neuroretina appears devoid of GLUT1 staining. However, at very high gamma values a faint staining in the neuroretina becomes visible, a staining which from its appearance – processes spanning the entire width of the retina – is most compatible with Müller glia cells. Under normal circumstances we would have dismissed such a faint staining as background and false positive. Given the sometimes very contradictory reports in the literature, we cannot fully exclude a weak expression of GLUT1 also in cells other than the RPE, with Müller glial cells perhaps being the most likely candidate. At any rate, GLUT1 expression in the neuroretina can only be much weaker than in the RPE, making its relevance for overall retinal metabolism unclear.

      As far as recent publications studying GLUT1 in the retina are concerned, we know of the study by Daniele et al. (FASEB Journal 36:e22428, 2022), which used a rod-specific, conditional knock-out of GLUT1 and found a relatively slow rod degeneration. We are not aware of a selective GLUT1 knock-out in cones, nor are we aware of conditional GLUT3 knock-outs in the retina. For further discussion of the Daniele et al. study please see Rev#3.13.

      The reviewer is right: initially we thought that, since GLUT1 was expressed only (or predominantly) in RPE, the metabolic response to GLUT1 inhibition should look similar to the no RPE situation. However, this initial hypothesis did not consider a key fact: the RPE forms the blood-retinal barrier, and the tight-junction coupled RPE cells are a barrier to any larger molecule, including glucose. Removing the barrier by removing the RPE dramatically increases the availability of glucose to the retina, a phenomenon that is likely exacerbated by the expression of the high affinity/high capacity GLUT3 on photoreceptors (cf. Figure S1A). In other words, when the RPE is removed the outer retina is “flooded” with glucose and we believe that this is probably the main factor that explains why the metabolic response to GLUT1 inhibition (1,9-DDF group) is so different from the no RPE condition.

      We have now included an additional corresponding explanation in the discussion (Lines 422-429). Furthermore, we have added an entire new subchapter to the discussion to debate the expression of glucose transporters in the outer retina (Lines 454-472).

      Rev#1.4. Shikonin's mechanism of action via protein aggregation and lack of specificity for PKM2 vs PKM1 at 4 µM is an experimental limitation that needs to be taken into account. All treatments utilized are not cell-specific.

      Response: While the reviewer is correct to say that Shikonin may have multiple cellular targets and a diverse range of possible applications as an anti-inflammatory, antimicrobial, or anticancer agent (cf. Guo et al., Pharmacol. Res. 149:104463, 2019), numerous studies support its specificity for PKM2 over PKM1, at concentrations ranging from 1 – 10 µM (Chen et al., Oncogene 30:4297-306, 2011; Zhao et al., Sci. Rep. 8:14517, 2018; Traxler et al., Cell Metab. 34:1248-1263, 2022). We settled on 4 µM as an intermediate concentration, considering its effectiveness and specificity in previous studies. We have now inserted references detailing the specificity and concentration range of Shikonin into the SI Materials and Methods section (Line 62).

      The concern that “all treatments” are not cell-specific is debatable. Certainly, any given compound may have off-target effects, yet, since the compounds we used in our study have all been studied for decades (see above, initial response to Reviewer #2), their off-target profile is well established and unlikely to play an important role here. Moreover, in our study the cell specificity does not come from the compounds used but from where their targets are expressed. As shown in Figure 1A and in Figure S1C, Shikonin's target PKM2 is almost exclusively expressed in photoreceptor inner segments. Hence, it seems very reasonable to expect that the vast majority of the metabolomic changes observed after Shikonin treatment are related to photoreceptors. We note that this assertion would still be true even if there was a low-level expression of PKM2 in other retinal cell types and/or if Shikonin had moderate off-target effects on other enzymes, since the bulk of the effect on the quantitative metabolomic dataset would still originate from PKM2 inhibition in photoreceptors.

      Rev#1.5. What was the method of cone counting in Figure 1?

      Response: Cones were counted per 100 µm of retinal circumference based on an arrestin-3 staining (cone arrestin, CAR).

      This information is now included in the SI Materials and Methods section under “Microscopy, cell counting, and statistical analysis” (Lines 99-100).
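
      The normalization described above (cones per 100 µm of retinal circumference) can be sketched as follows. This is a minimal illustration only: the function name, its parameters, and the numbers in the usage note are hypothetical and are not taken from the manuscript or its SI.

```python
def cones_per_100_um(cone_count: int, measured_length_um: float) -> float:
    """Normalize a raw cone count to a density per 100 µm of retina.

    cone_count: number of arrestin-3 (CAR) positive cells counted in a section
                (hypothetical input, for illustration).
    measured_length_um: length of retinal circumference over which the
                cells were counted, in micrometres (hypothetical input).
    """
    if cone_count < 0:
        raise ValueError("cone_count must be non-negative")
    if measured_length_um <= 0:
        raise ValueError("measured_length_um must be positive")
    # Scale the count to a common 100 µm reference length so that
    # sections of different sizes can be compared directly.
    return cone_count * 100.0 / measured_length_um
```

      For example, under these assumptions a section with 45 counted cones over 300 µm of retinal circumference would give `cones_per_100_um(45, 300.0)`, i.e. 15 cones per 100 µm.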

      Rev#1.6. How do you know that FCCP is not altering RPE ox phos, disrupting the outer retinal microenvironment and leading to cell death, and therefore, the effects seen are not photoreceptor-specific but rather downstream from the initial insult in RPE?

      Response: We propose that FCCP will be acting on both photoreceptors and RPE cells (and all other retinal cell types) at essentially the same time, over the experimental time-frame. Thus, OXPHOS should be inhibited in all cells simultaneously. However, FCCP will primarily affect cells that actually use OXPHOS to a large extent, while cells relying on other metabolic pathways (e.g. glycolysis) will hardly be affected.

      We believe the very strong effect of FCCP, seen exclusively in rod photoreceptors, to be a direct drug effect. While we cannot fully exclude an indirect effect via the RPE – as proposed by the reviewer – we think this to be unlikely because:

      (1) RPE viability was not compromised by FCCP treatment.

      (2) If the reviewer's hypothesis was correct, then cone photoreceptors should also have been affected (e.g. because now the RPE consumes all glucose, leaving nothing for cones). However, cones were essentially unaffected by the FCCP treatment, making a dependence on RPE OXPHOS unlikely. Especially so, because blocking GLUT1 and glucose import on the RPE with 1,9-DDF had only relatively minor effects on rod photoreceptor viability but strongly affected cones. This indicates that the RPE is mainly shuttling glucose through to photoreceptors, especially to cones, and this function does not seem to be impaired by FCCP treatment.

      (3) We found that enzymes required for Krebs-cycle and OXPHOS activity (i.e. citrate synthase, fumarase, ATP synthase γ) are predominantly expressed in photoreceptors but virtually absent from RPE (Figure 3D, see also answer to following question).

      (4) The density of mitochondria (i.e. the target for FCCP) is far lower in RPE than in photoreceptors, as evidenced also by the COX staining shown in Figure 1A. Hence, photoreceptors are far more likely to be hit by FCCP treatment than RPE cells.

      To accommodate the reviewer's concern, we have now added a further comment into the discussion (Lines 440-442).

      Rev#1.7. While Figure 3D is interesting, it offers no significant insight into mechanisms as the enzyme levels are not being compared to control nor is mitochondrial fitness in these conditions being assessed, which would provide greater insight than just showing that these enzymes are present in the inner segments, which are known to be rich in mitochondria. Additionally, stating that the low ATP is secondary to decreased Krebs cycle activity and ox phos based on merely ATP levels is not supported by metabolite levels minus citrate nor ox phos enzyme levels or oxygen consumption. Also, citrate is purported to be decreased in the table in Figure 2 in the no RPE condition; however, Supplemental Figure 2 demonstrates this change is not significant, and then the same data is presented in Supplemental Figure 3, where it is statistically significant again. Why the difference in data and why is the same data being shown multiple times?

      Response: The immunostaining shown in Figure 3D shows the in vivo retina, or in other words the localization of enzymes in the native situation. Since this may not have been obvious in the previous manuscript version, we have added a corresponding comment to the legend of Figure 3 (Line 806). The localization of the Krebs-cycle/OXPHOS enzymes citrate synthase, fumarase, and ATP synthase mainly to photoreceptors, but not (or much less) to RPE, is another piece of evidence supporting the idea that OXPHOS is predominantly performed by photoreceptors (see also answer to previous question Rev#1.6).

      The decreased ATP levels (together with citrate, aspartate, NAA) shown in Figure 3 in the no RPE group are an indication that photoreceptor Krebs-cycle activity may be decreased but not abolished in the absence of RPE. Importantly, GTP levels are not reduced in the no RPE group (Figure 2). Since large amounts of GTP can only be synthesized by either SUCLG-1 in the Krebs-cycle or by NDK-mediated exchange with ATP, the most plausible interpretation is that Krebs-cycle dependent ATP-synthesis was decreased in the no RPE situation, but that the (mini) Krebs-cycle or Cahill-cycle, notably the step from succinyl-CoA to succinate, was running. Since there is no RPE in this group, this strongly suggests important Krebs-cycle/OXPHOS activity in photoreceptors where the majority of the corresponding enzymes are located (see above).

      We thank the reviewer for pointing out that the information on group comparisons may not have been presented with sufficient clarity. In the figures mentioned by the reviewer the data is shown and compared in different contexts: the table in Figure 2B and the data in Figure S3 (now renumbered to Figure S5) refer to two-way comparisons of treatment condition to control, to elucidate individual treatment effects. Meanwhile Figure S2 (now supplementary Figure S3) refers to a 5-way comparison for a general overview that puts all five groups in context with each other. These differences in comparisons and normalization to the respective common standards entail the use of different statistical tools, resulting in different p-values. The statistical testing approaches and thresholds are now disclosed in the figure legends, and additionally in the SI Materials and Methods section (Lines 145-155).

      Rev#1.8. When were the ex vivo samples taken for metabolomics, and if taken when significant TUNEL staining and cell death have occurred, are the changes in metabolism due to cell death or a true indication of differential metabolism? Furthermore, it is unclear if the metabolomics samples included the RPE or not. Considering these treatments will affect most cells in the retina and the RPE, which is included in the ex vivo samples, it is difficult to ascertain that these changes are secondary to the effects on photoreceptors alone.

      Response: The samples for metabolomics included the RPE (except for the no RPE condition) and were taken at the same time as the tissues for histological preparations and TUNEL assays, i.e. they were all taken at post-natal day 15. This has now been clarified in the SI Materials and Methods section (Lines 108-110).

      We cannot entirely exclude an effect of ongoing cell death caused by the different drug treatments on the retinal metabolome. However, since in the experimental treatments cell death was still comparatively low (even in the FCCP condition, overall cell death was only around 10% of the total retina), and the metabolomic analysis considered the entire tissue, the impact of cell death per se on the total metabolome will be comparatively minor (≤ 10%, i.e. within the typical error margin of the metabolomic analysis).

      As mentioned above, the drug treatments should in principle affect all retinal cells at the same time. However, only cells that express the drug targets (i.e. 1,9-DDF targets GLUT1 in RPE cells, Shikonin targets PKM2 in photoreceptors; cf. Figure 1A) should react to the treatment. Even FCCP, in the paradigm employed, will only affect those cells that rely heavily on OXPHOS. Our data indicate that, while this is almost certainly the case for rods, the cones, RPE cells, and essentially all of the inner retina are not affected by FCCP treatment, strongly suggesting that OXPHOS is of minor importance for these cell populations.

      Rev#1.9. Why were the FCCP and no RPE groups compared? If they have similar metabolite patterns as noted in Figure 2, would that suggest that FCCP's greatest effect is on the ox phos of RPE and the metabolite patterns are secondary to alterations in RPE metabolism? Also, the increase in citrate and decrease in NAD may be related to effects on RPE mitochondrial metabolism when comparing these groups, and the disruption of RPE metabolism may then result in PARP staining of photoreceptors.

      Response: The reason for the pair-wise comparison of the no RPE and FCCP groups initially was indeed the similarity in metabolite patterns. This has now been rephrased accordingly in the results section “Photoreceptors use the Krebs-cycle to produce GTP” (Lines 218-219). The interpretation that the reviewer proposed here is interesting, but is not consistent with the data analysis of this and other group comparisons.

      Instead, the similarity between the metabolic patterns found in the no RPE and FCCP groups further supports the idea that a lack of RPE decreases retinal OXPHOS and increases glycolysis. This interpretation is based on the following observations:

      (1) Mitochondrial density in the RPE is far lower than in photoreceptors (see COX staining in Figure 1A), thus quantitatively the metabolite pattern caused by a disruption of OXPHOS (via FCCP treatment) will be dominated by metabolites generated by photoreceptors. For the same reason the depletion of retinal NAD+, and the concomitant increase in photoreceptor PAR accumulation after FCCP treatment, is unlikely to be due to changes in RPE.

      (2) Similarly, citrate synthase (CS) was found to be almost exclusively expressed in photoreceptor inner segments, with little expression in RPE (Figure 3D). Hence, the quantitative increase of citrate levels after FCCP treatment can only originate in photoreceptors.

      (3) The comparison of the control (with RPE) against the no RPE group suggested an increase in (aerobic) glycolysis in the absence of RPE, evidenced notably by a retinal accumulation of lactate, BCAAs, and glutamate (Figure 3A). The very same metabolite pattern is seen for the FCCP treatment (Figure 1B), indicating a marked upregulation of glycolysis (Figure 6C). The latter observation suggests that photoreceptors, after disruption of OXPHOS, switch to an exclusively glycolytic metabolism, which, however, rods cannot sustain (Figure 1C, D).

      (4) Glucose consumption and lactate release are increased in the no RPE group vs. control (new Supplementary Figure 4). A similar increase in glucose consumption and lactate production is seen in the FCCP group, suggesting that the no RPE situation also disrupts OXPHOS in photoreceptors.

      Rev#1.10. The conclusions being reached are difficult to interpret secondary to the experimental procedures and the fact that the treatments are not cell-specific and RPE is included with the neuroretina as well. Likewise, stating FCCP is altering the Krebs cycle in the neuroretina is difficult to believe as there are no changes in the Krebs cycle when compared to the control, which also has RPE.

      Response: We agree with the reviewer that some of the conclusions may have been somewhat speculative. Accordingly, we have toned down our conclusions in several instances in the text, notably in abstract, introduction, and discussion.

      When it comes to Krebs cycle intermediates, a key limitation of our study is indeed the lack of carbon-tracing and metabolic flux analysis, as noted by the reviewers, a limitation that we now highlight more strongly in the discussion of the revised manuscript (Lines 545-549). While it is highly probable that the flux of Krebs cycle intermediates is altered by FCCP, our steady-state data do not show significant changes in the metabolites citrate, fumarate, and succinate. However, our study does show a highly significant decrease in GTP levels, which, as explained above, is a key indicator of Krebs cycle activity/inactivity. Moreover, while GTP levels were reduced also in the no RPE group, GTP was still significantly higher in the no RPE group compared to the FCCP treatment. Our interpretation of this finding is that there is Krebs-cycle/OXPHOS activity in the neuroretina, which is abolished by FCCP.

      Rev#1.11. Supplemental Figure 4C and D states that GAC inhibition affected only photoreceptors, but GAC is expressed throughout the retina and so the inhibition is altering glutamine-glutamate homeostasis throughout the retina. Clearly, based on histology, one can see that the architecture of the retina, especially at the highest dose, is lost likely because all cells are being affected. So it is not photoreceptor-specific and even at low doses one can see that the inner retina is edematous. Moreover, with such a high amount of TUNEL staining in the ONL, are rods more affected than cones?

      Response: In our hands the immunostaining for Glutaminase C (GAC) labelled predominantly cone inner segments, the OPL, and perhaps bipolar cells (Figure S1A). The deleterious effects mentioned by the reviewer are only seen at the highest concentration of the GAC inhibitor compound 968. This concentration (10 µM) is 100-fold higher than the dose that produces a significant loss of cones in the outer retina (0.1 µM). We therefore think that this data points to the extraordinary reliance of cones on glutamine and glutamate. As can be seen from the images (Figure S4C) illustrating the effects of 0.1 and 1 µM Compound 968 treatment, the ONL thickness is not significantly reduced by the GAC inhibitor. This strongly indicates that at these doses the rods are not affected by GAC inhibition.

      Rev#1.12. The no RPE vs 1,9 DDF data may be interpreted as preventing glucose transport in the RPE increases BCAA catabolism by the RPE, which has been shown to utilize BCAA in culture systems. To this end, when the RPE is not present, the BCAA is increased as compared to the control with RPE.

      Response: Our original interpretation of this data was that after GLUT1 inhibition and a correspondingly reduced retinal glucose uptake, the retina switched to an increasing use of anaplerotic substrates, including BCAAs. This is supported by the concomitant upregulation of the Cahill-cycle product alanine and the mini-Krebs-cycle product N-acetylaspartate (NAA). Yet, we agree with the reviewer that BCAAs could also be consumed by the RPE. We have now changed our conclusion at the end of the results chapter “Reduced retinal glucose uptake promotes anaplerotic metabolism” to also highlight this possibility (Lines 261-262).

      Rev#1.13. It is unclear why so much effort is comparing the no RPE group to the treatment groups and not comparing the control group to the different treatment groups.

      Response: Previous studies – including the seminal studies of Otto Warburg from the early 1920s – had always used retina without RPE. This “no RPE” situation is therefore something of a reference for our entire study, which is why we dedicated more effort to its analysis. We have now inserted a corresponding remark into the manuscript (Lines 182-184).

      Rev#1.14. The conclusions are significantly overstated especially with regards to rods versus cones as these are not cell-specific treatments. For example, the control vs 1,9 DDF vs FCCP clearly shows that there is mitochondrial dysfunction due to decreased NAD, increased AMP/ATP ratio, decreased Asp but increased Gln, and a compensatory increase in lactate production.

      Response: We agree with the reviewer and have tried to phrase our statements in a more measured fashion. Notably, we have toned down our statements in the title, abstract, results, discussion, and several of the subchapter headings.

      Rev#1.15. While metabolic conclusions are drawn on serine/lactate ratio, this ratio is driven by the drastic changes in lactate and not so much serine in the treatment conditions as it was rather stable. Likewise, substrates beyond glucose have the potential to fuel the TCA cycle and make GTP via SUCLG1, such as fatty acids, other AAs, etc. Therefore, this ratio may not tell the entire story about anaplerotic metabolism. Furthermore, knowing that RPE utilize BCAAs to fuel their TCA cycle, the no RPE condition may simply have increased BCAAs due to lack of metabolism by the RPE, which drives the GTP/BCAA ratio. To state that the neuroretina was utilizing BCAAs for anaplerosis is not well supported based on the current data. Similarly, what is to say that the GTP/lactate ratio in the no RPE situation is not driven by the fact that the RPE is no longer present to act as acceptor of retinal lactate production or that more glucose is reaching the retina since the RPE is not present to accept and utilize that produced. Glucose uptake was not assessed to further address these issues.

      Response: We agree with the reviewer that metabolite ratios may not tell the full story underlying retinal metabolism. However, given the robustness of quantitative and highly reproducible NMR data, they are an important part of the metabolomics toolbox. The reviewer correctly observed that the changes in lactate levels are more dramatic than those in serine. Still, serine was also significantly increased in the no RPE, 1,9-DDF, and Shikonin groups. Together with the lactate changes (same or opposite direction), the resulting serine/lactate ratios display marked alterations.

      When it comes to the supply of other potential energy substrates mentioned by the reviewer, i.e. fatty acids or amino acids other than BCAAs, these are only supplied in minimal amounts in the defined, serum free R16 medium (Romijn, Biology of the Cell, 63, 263-268, 1988) and – if used to any important extent – would be rapidly depleted by the retina. Thus, for a culture period of 2 days in vitro between medium changes these energy sources are not available and thus cannot be used by the retina.

      Our conclusion that the retina is using anaplerosis is based not only on the observations made in the no RPE group but also on, for instance, the metabolite ratios seen in the 1,9-DDF treatment group. In this group decreased glycolytic activity may correspond to increased serine synthesis and anaplerosis.

      As far as glucose uptake is concerned, we have analysed the medium samples at P15 (equivalent to the retina tissue collection time point) and now present data that addresses this question more directly via the consumption of glucose from and release of lactate to the culture medium (New Supplementary Figure 4C, D). This new dataset provides another independent observation showing that:

      (1) Glucose consumption/lactate release (i.e. aerobic glycolysis) is high in the no RPE situation but low in the control situation. In other words, retinal aerobic glycolysis is most likely stimulated by the absence of RPE.

      (2) 1,9-DDF treatment decreases glucose consumption/lactate release as would be expected from a GLUT1 blocker. Since ATP and GTP production are high nonetheless, this indicates that other substrates (i.e. anaplerosis) were used for retinal energy production, in agreement with the analysis shown in Figure 6C.

      (3) The FCCP treatment, which disrupts oxidative ATP-production, increases glucose consumption/lactate release in a way similar to the no RPE situation. Yet, the no RPE retina can still generate sizeable amounts of GTP but not ATP. Together, this provides further evidence that neuroretinal OXPHOS is decreased in the absence of RPE.

      Rev#1.16. The evidence for the mini-Krebs cycle is intriguing but weak considering it is based on certain enzymes being expressed in the photoreceptors, which had already been shown to be present in other publications, and a single ratio of metabolites that is increased in FCCP. One would expect this ratio to be increased under FCCP regardless. There is no stable isotope tracing with certain fuels to confirm the existence of the mini-Krebs cycle.

      Response: We thank the reviewer for this suggestion. We agree that our evidence for the mini-Krebs-cycle (and the Cahill-cycle) may be to some extent circumstantial and additional technologies would help to obtain further supportive data. Still, here we would like to invite the reviewer to a thought experiment where he/she could try and interpret our data without considering the Cahill- or the mini-Krebs-cycle. At least we ourselves, when we engaged in such thought experiments, were unable to explain the data observed without these alternative energy-producing cycles. Most notably, we were unable to explain the strong accumulation of either alanine or N-acetyl-aspartate (NAA) when only considering glycolysis and (full) Krebs-cycle metabolism. Of course, this may still be considered “weak” evidence, and we expect that future studies including complementary technologies will either confirm or expand our interpretation of the existing data set.

      The suggestion to perform stable isotope-labelled tracing with potential alternative fuels (e.g. glutamate, glutamine, pyruvate, etc.) is very attractive indeed. While such studies are likely to shed further light on the metabolic pathways proposed, they will entail very extensive experimental work, with multiple different conditions and concentrations and a variety of analysis methods (e.g. a 1.7 mm NMR probe equipped with a 15N channel), which is currently not feasible as an extension of the present manuscript. Nevertheless, we will certainly consider this approach for future follow-up studies once such techniques are available and will screen for suitable collaboration partners. A corresponding comment on such future possibilities has now been inserted into the discussion (Lines 545-549).

      Rev#1.17. The discussion does not mention how this data contradicts a recent in vivo study looking at Glut1 knockout in the retina (Daniele et al. FASEB. 2022) or previous in vivo studies that suggest cones may be less sensitive to changes in glucose levels (Swarup et al. 2019). This is a key oversight.

      Response: We thank the reviewer for pointing this out. We have now included these studies in the revised discussion in a new subchapter on the expression of glucose transporters in the outer retina (Lines 454-472). For a critical review of the Daniele et al., 2022 study please also see our more detailed response to question Rev#3.13 below.

      Rev#1.18. GAC is expressed in more than just cones so making cell-specific statements regarding fuel utilization is not well supported.

      Response: Our immunostaining for GAC revealed a strong expression in cone inner segments (Figure S1A3). While this does not exclude (relatively minor) expression in other retinal cell types, cones are likely to be more reliant on GAC activity than other cell types. See also answer above.

      Rev#1.19. Suggesting that rods utilize the mini-Krebs cycle based on AAT2 being seen in the inner segments without at least co-staining for RHO or PNA is weak evidence for such a cycle. AAT looks to be expressed in the inner segments of all photoreceptors.

      Response: We have taken up this suggestion from the reviewer and now provide an additional co-staining for AAT1 and AAT2 with rhodopsin. Note that in response to a pertinent comment from Reviewer #3 we have changed the abbreviation for aspartate aminotransferase from “AAT” to the more commonly used “AST” throughout the manuscript.

      New images showing a co-staining for AST1 and AST2 with rhodopsin now replace the former image set in Figure 7D. In brief, the new images show the expression of both AST1 and AST2 across the retina, with, notably an expression in the inner segments of photoreceptors but not in the outer segments, where rhodopsin is expressed.

      Reviewer #3 (Recommendations For The Authors):

      Rev#3.1. The staining for the glucose transporters GLUT1 and GLUT3 does not reflect what has previously been published by two different groups that were validated by cell-specific knockout mice. As mentioned by the author GLUT1 and GLUT3 have differences in transport kinetics, which would affect their metabolism. Therefore, the lack of GLUT1 in photoreceptors would suggest that photoreceptor metabolism is not faithfully replicated in this system. This difference from the previous literature should be discussed in the discussion.

      Response: As the reviewer pointed out, the expression of GLUT1 in the retina is somewhat controversial, with much older literature showing expression on the RPE, while some more recent studies claim GLUT1 expression in photoreceptors. For a brief discussion of our GLUT1 immunostaining please see also our answer to question Rev#1.3 above.

      Although the retinal expression of GLUT1 was beside the focus of our study, we feel we must address this point in more detail: In the brain the generally accepted setup for GLUT1 and GLUT3 expression is that low-affinity GLUT1 (Km = 6.9 mM) is expressed on glial cells, which contact blood vessels, while high-affinity GLUT3 (Km = 1.8 mM) is expressed on neurons (Burant & Bell, Biochemistry 31:10414-20, 1992; Koepsell, Pflügers Archiv 472, 1299–1343, 2020). This setup matches decreasing glucose concentration with increasing transporter affinity, for an efficient transport of glucose from blood vessels, to glial cells, to neurons. In the retina, the cells that contact the choroidal blood vessels are the tight-junction-coupled RPE cells. As shown by us and many others, RPE cells strongly express GLUT1 (cf. Figure 1A-3.). To warrant an efficient glucose transport from the RPE to photoreceptors, photoreceptors must express a glucose transporter with higher glucose affinity than GLUT1. We show that this is indeed the case with photoreceptors expressing GLUT3 (cf. Supplemental Figure 1-5.). While a partly teleological explanation does not per se prove that our data is correct, at least our data is plausible. In contrast, the glucose transporter setup sometimes claimed in the literature is biochemically implausible, i.e. it would require the flow of metabolites (glucose) to go against a gradient of transporter affinities, and we are not aware of an example of such a setup occurring anywhere in nature.
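
      The affinity argument can be illustrated with a minimal sketch, assuming simple Michaelis-Menten kinetics for facilitated glucose transport. Only the two Km values are taken from the text above; the function name, the example glucose concentrations, and the neglect of transporter abundance and Vmax are illustrative assumptions, not measurements from this study.

```python
# Minimal sketch: fractional saturation v/Vmax = [S] / (Km + [S]).
# Km values are those quoted in the response; everything else is hypothetical.

GLUT1_KM_MM = 6.9  # low-affinity transporter (Km quoted above)
GLUT3_KM_MM = 1.8  # high-affinity transporter (Km quoted above)


def fractional_saturation(glucose_mm: float, km_mm: float) -> float:
    """Return v/Vmax for a Michaelis-Menten transporter at a given glucose level."""
    if glucose_mm < 0 or km_mm <= 0:
        raise ValueError("glucose must be >= 0 and Km must be > 0")
    return glucose_mm / (km_mm + glucose_mm)


if __name__ == "__main__":
    # Hypothetical glucose concentrations (mM), chosen only for illustration.
    for glucose in (0.5, 1.8, 5.0):
        s1 = fractional_saturation(glucose, GLUT1_KM_MM)
        s3 = fractional_saturation(glucose, GLUT3_KM_MM)
        print(f"{glucose:4.1f} mM  GLUT1: {s1:.2f}  GLUT3: {s3:.2f}")
```

      Under these assumptions the lower-Km transporter operates closer to saturation at every glucose concentration, which is the sense in which a GLUT1-to-GLUT3 sequence matches falling glucose levels with rising affinity.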

      However, at this point we cannot exclude low levels of GLUT1 expression on Müller glia cells or even photoreceptors. This expression could, for instance, be relevant in cases where cells were shuttling excess glucose – perhaps produced through gluconeogenesis – onwards to other retinal cells. Still, GLUT1 expression can only be minor when compared to RPE since a major expression would destroy the glucose affinity gradient (see above) required for efficient glucose shuttling into the energy hungry photoreceptors.

      To address this request by the reviewer (and also reviewer #1) we now discuss the question of glucose transporter expression in the outer retina in a new subchapter of the discussion (Lines 454-472).

      Rev#3.2. Photoreceptor metabolism and aerobic glycolysis are tied to photoreceptor function, as demonstrated by Dr. Barry Winkler. The authors should provide data or mention (if previously published) about photoreceptor OS growth and function in this system.

      Response: The studies of Barry Winkler (e.g. Winkler, J Gen Physiol. 77, 667-692, 1981) confirmed the original work of Otto Warburg and expanded on the idea that the neuroretina was using aerobic glycolysis. Importantly, Winkler used an experimental setup very similar to the one Warburg had used, namely explanted rat retina without RPE. In light of our data where we compare metabolism of mouse retina with and without RPE – where retina cultured without RPE confirms the data of Warburg and Winkler – it appears most likely that the purported aerobic glycolysis occurs mostly in the absence of RPE but only to a lower extent in the native retina.

      Photoreceptor outer segment outgrowth is somewhat slower in the organotypic retinal explant cultures compared to the in vivo situation (cf. Caffe et al., Curr Eye Res. 8:1083-1092, 1989 with LaVail, JCB 58:650-661, 1978; see also answer to reviewer #1). Importantly, organotypic retinal explant cultures and their photoreceptors are fully functional and remain so for extended periods in culture (Haq et al., Bioengineering 10:725, 2023; Tolone et al., IJMS 24:15277, 2023). This information has now been added to the manuscript discussion section, into the new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395).

      Rev#3.3. It is unclear from the description of the experiment in both the results and methods if 1,9DDF, Shikonin, and FCCP were added to both apical and basal media compartments or one or the other and should be specified. The details of what was on the apical compartment would be helpful, as the model is supposed to allow for only nutrients from the basal compartment (as indicated by the authors themselves). Is the apical compartment just exposed to air? How does this affect survival?

      Response: In organotypic retinal explant cultures the RPE rests on the permeable culturing membrane such that the basal side is in contact with the membrane and the medium below (for a schematic drawing see Figure S1B), while the apical side is covered by a thin film of medium created by the surface tension of water (Caffe et al., Curr Eye Res. 1989; Belhadj et al., JoVE, 2020). This thin liquid film ensures sufficient oxygenation and is an important factor that allows the retinal explant to remain viable for several weeks in culture. If the retinal cultures were submerged by the medium, their viability – especially that of the photoreceptors – would drop dramatically and would typically last less than 3-5 days. Therefore, in the retinal organotypic explant cultures used here, the nutrients and the drugs applied do indeed reach the outer retina from the basal side, i.e. much as they would in vivo.

      To address this question from the reviewer, corresponding clarifications have been inserted into the SI Materials and Methods section (Lines 64-66).

      Rev#3.4. As the metabolomic data obtained was quantitative, several metabolites discussed should be analyzed in terms of ratios, for example, Glutathione and glutathione disulfide should be reported as a ratio. In addition, as ATP, ADP, and AMP were measured, they can be used to calculate the energy charge of the tissue.

      Response: We thank the reviewer for these suggestions and have created corresponding graphs for GSH / GSSG ratio and energy charge. These new graphs have now been added to the SI datasets, to the new Supplementary Figure 4. To accommodate other requests from the Reviewers, this new Figure also contains additional new datasets on glucose and lactate concentrations (see further comments above and below). Please note that all later SI Figures have been renumbered accordingly.

      In brief, the ratios for GSH/GSSG show no significant changes between control and the different experimental groups. Meanwhile, the adenylate energy charge of the retinal tissues shows a significant decrease for the Shikonin group and the FCCP group. Note that in the new Supplementary Figure 4A, the dotted lines indicate the energy charge window typical for most healthy cells (0.7 – 0.95).
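
      For readers unfamiliar with the measure, the adenylate energy charge is conventionally defined as ([ATP] + 0.5 [ADP]) / ([ATP] + [ADP] + [AMP]). The following is a minimal sketch of that calculation; the function names and the example concentrations are hypothetical and are not values from our dataset, and only the 0.7 – 0.95 window is taken from the text above.

```python
# Minimal sketch of the standard adenylate energy charge formula.
# Inputs may be in any concentration unit, as long as all three share it.

HEALTHY_WINDOW = (0.7, 0.95)  # window quoted for Supplementary Figure 4A


def adenylate_energy_charge(atp: float, adp: float, amp: float) -> float:
    """Return ([ATP] + 0.5*[ADP]) / ([ATP] + [ADP] + [AMP])."""
    if min(atp, adp, amp) < 0:
        raise ValueError("concentrations must be non-negative")
    total = atp + adp + amp
    if total == 0:
        raise ValueError("at least one adenylate concentration must be > 0")
    return (atp + 0.5 * adp) / total


def within_healthy_window(charge: float) -> bool:
    """Check whether an energy charge falls inside the quoted window."""
    low, high = HEALTHY_WINDOW
    return low <= charge <= high


if __name__ == "__main__":
    # Hypothetical concentrations, for illustration only.
    charge = adenylate_energy_charge(atp=3.0, adp=0.8, amp=0.2)
    print(f"energy charge = {charge:.2f}, in window: {within_healthy_window(charge)}")
```

      The value ranges from 0 (all adenylate present as AMP) to 1 (all present as ATP), so a drop below the window indicates a relative shift from ATP towards ADP and AMP.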

      Rev#3.5. I think a missed opportunity when discussing the possible taurine/hypotaurine shuttle would be the impact on the osmosis of the subretinal space as taurine has been hypothesized as a major osmolyte.

      Response: This is another interesting recommendation from the reviewer. To address this point, we have now introduced a corresponding paragraph and references in the discussion of the manuscript (Lines 503-504; 512-514).

      Rev#3.6. In Figure 3, the distribution of these enzymes should also be studied under the no RPE condition as the culture treatment took several days for these metabolic changes to occur.

      Response: The images shown in Figure 3D are from the in vivo retina. Since this may not have been very clear in the previous manuscript version, we have now added a corresponding explanation to the legend of Figure 3. As far as we can tell, the expression and localization of neuroretinal enzymes does not change in cultured retina, during the culture period (compare Figure 1A with Supplementary Figure S1C). However, when it comes to the metabolite taurine its production (localization) changes dramatically in the no RPE situation where taurine is essentially undetectable by immunostaining (not shown but see metabolite data in Figure 2A, Figure 3A).

      Rev#3.7. In Figures 4 and 5, it is unclear why the experimental groups were not compared to the control and requires further explanation. Furthermore, the authors should justify the concentrations of drugs used as the cell death could have risen from toxicity to the drugs and not due to disruption of metabolism.

      Response: The reviewer is right; the rationale for these comparisons may not have been laid out with sufficient clarity. In Figure 4 the no RPE and FCCP groups are compared because both groups showed similar metabolite changes relative to the control situation. The no RPE to FCCP comparison thus focussed on the details of the – at first seemingly minor – differences between these two groups. This has now been clarified in the corresponding part of the results (Lines 218-219).

      In Figure 5A, B we compare the no RPE and 1,9-DDF groups with each other, notably because the data obtained seemingly contradicted our initial expectation that these two groups should show similar metabolite patterns. Also here, we have now inserted an additional explanation for this choice of comparisons (Lines 252-253).

      In Figure 5C, D we compare the Shikonin and FCCP groups with each other. The idea behind this comparison was that in the 1st group glycolysis was blocked while in the 2nd group OXPHOS was inhibited, or in other words, here we compared what happened when the two opposing ends of energy metabolism were manipulated in opposite directions. This reasoning is now given in the results section (Lines 265-268).

      As far as the choice of drugs and concentrations is concerned, we used only compounds that have been extremely well validated through up to five decades of scientific research (see initial response to Reviewer #2 above). We therefore are confident that at the concentrations employed the results obtained stem from drug effects on metabolism and not from generic, off-target toxicity. Then again, as we show, prolonged (i.e. 4 days) block of energy metabolism pathways does cause cell death.

      Rev#3.8. In line 203, the authors discuss GTP as being primarily a mitochondrial metabolite, however, photoreceptors would require a localized source of GTP synthesis in the outer segments as part of phototransduction, and therefore GTP in photoreceptors cannot be a mitochondrial-specific reaction in photoreceptors. Furthermore, the authors mentioned NDK as being a possible source of GTP, but they do not show NDK localization despite it being reported in the literature to be localized in the OS.

      Response: The question as to the source of GTP in photoreceptor outer segments is indeed highly relevant. For GTP production in mitochondria see the answer to the next question below (Rev#3.9). An early study showed nucleoside-diphosphate kinases (NDK) to be expressed on the rod outer segments of bovine retina (Abdulaev et al., Biochemistry 37:13958-13967, 1998). More recently NDK-A was shown to be strongly expressed in photoreceptor inner segments (Rueda et al., Molecular Vision 22:847-885, 2016). We now refer to both studies in the results section of the manuscript (Lines 227-228).

      Rev#3.9. In the "Impact on glycolytic activity, serine synthesis pathway, and anaplerotic metabolism" section, the authors claim in the no RPE group glycolytic activity was higher due to a depressed GTP-to-lactate ratio. However, this reviewer is under the impression that GTP production in photoreceptors is not mitochondrial specific, so this ratio doesn't make sense (I could be mistaken, however). A better ratio would have been pyruvate/lactate or glucose/lactate when discussing increased glucose consumption.

      Response: We appreciate the reviewer’s comment, yet we do indeed believe we can show that GTP-production in our experimental context is mainly mitochondrial. As explained in the manuscript results section (“Photoreceptors use the Krebs-cycle to produce GTP”), there are essentially only two possibilities for a photoreceptor to produce sizeable amounts of GTP: in the mitochondria via SUCLG1 – i.e. an enzyme highly expressed in photoreceptor inner segments (Figure 5D) – and in the cytoplasm via NDK from excess ATP. The claim about the depressed GTP-to-lactate ratio in the no RPE situation takes this into account. Importantly, since in the no RPE situation ATP-levels are significantly lower than GTP, here GTP can only be produced via SUCLG1 and OXPHOS. Moreover, this contrasts with the FCCP group where mitochondrial OXPHOS is disrupted and both ATP and GTP are depleted.

      As far as ratios with pyruvate and glucose are concerned, we agree that these could potentially be very interesting to analyse. Unfortunately, in our retinal tissue 1H-NMR spectroscopy-based metabolomics analysis the levels of both pyruvate and glucose were below the detection limits, which likely reflects their rapid metabolic turnover (cf. table S1). While this might be attributable to the marked consumption of these metabolites within the tissue, it does not allow us to calculate the suggested ratios to lactate. Then again, in the supernatant medium, which was collected at the same time point as the retina tissue, we can readily detect glucose and lactate levels; for this data please see the new Supplementary Figure 4.

      Rev#3.10. Aspartate aminotransferase should be abbreviated as AST, as it is more commonly noted.

      Response: In response to this comment from the reviewer, we have changed the abbreviation for aspartate aminotransferase from AAT to AST throughout the manuscript.

      Rev#3.11. In the discussion the assumptions of the ex vivo culture systems should be clearly stated. One that was not mentioned, but affects the implications of the data, is that the retinas used in this study are from the developing mouse eye. Another important assumption that was made in this paper was that the changes in retinal metabolism were due to photoreceptors even though the whole neural retina was included.

      Response: The reviewer is correct; we have added these two points to the discussion section of the manuscript. Notably, we have now included a new subchapter “The retina as an experimental system for studies into neuronal energy metabolism” (Lines 367-395) to present different in vitro and in vivo test systems.

      Rev#3.12. Starting at line 347: As the authors know, the RPE has been shown to be highly reliant on mitochondrial function, and disruption of RPE mitochondrial metabolism leads to photoreceptor degeneration (numerous papers have shown this). Furthermore, the lower levels of lactate detected in their explants when RPE was present suggests that lactate is actively transported out of the neural retina by the RPE.

      Response: The reviewer is right about lactate being exported from the retina to the blood stream in vivo, or, in our in vitro study, to the culture medium. In the new dataset showing glucose and lactate concentrations in the culture medium (new Supplementary Figure 4C, D), we show that without RPE (no RPE group) the retina releases significantly more lactate into the medium than control retina with RPE. At the same time the no RPE retina consumes more glucose than control retina.

      Rev#3.13. Line 360: Again, in mouse photoreceptors (by bulk RNAseq and scRNAseq), there is no GLUT3 expression (encoded by slc2a3). It was also recently shown by Dr. Nancy Philp's lab that rod photoreceptors express GLUT1, encoded by slc2a1 (PMCID: PMC9438481). The differences reported in this study and previous studies should be discussed.

      Response: Although this comment may not make us very popular, we are somewhat sceptical of RNAseq data (especially single cell RNAseq) since the underlying methodology – at the current level of technological development – is notoriously unreliable when it comes to the assessment of low abundance transcripts and suffers from poor batch reproducibility, compared to NMR based metabolomics. Due to methodological constraints, RNAseq data have a propensity to display erroneously high or low expression. Moreover, and perhaps even more important, dissociated cells in scRNAseq studies undergo rapid gene expression changes that can significantly falsify the image obtained (Rajala et al., PNAS Nexus 2:1-12, 2023). Finally, it cannot be emphasized enough that mRNA expression profiles DO NOT equate to protein expression and there are numerous examples for divergent expression profiles when mRNA and protein are compared.

      The Daniele et al. study (FASEB Journal 36:e22428, 2022; PMCID: PMC9438481) used in situ hybridization to study the mRNA expression of GLUT1 (slc2a1) and GLUT3 (slc2a3). In line with our comment just above, the Daniele et al. study may provide for an example of divergence between mRNA and protein expression, since it seemingly showed only minor expression of GLUT1/slc2a1 in the RPE, i.e. precisely in the one cell type that is well-known for its very strong GLUT1 protein expression.

      Furthermore, Daniele et al. used a conditional GLUT1 knock-out in photoreceptors induced by repeated Tamoxifen injections. The photoreceptor GLUT1 knock-out led to a relatively mild phenotype with only about 45% of the outer nuclear layer lost over a 4-month time course. This is in stark contrast with the FCCP or the 1,9-DDF treatment, which would ablate nearly all rod photoreceptors in under one or two weeks, respectively.

      As a side note, Tamoxifen is an oestrogen receptor antagonist (with partial agonistic behaviour) with a long history of causing retinal and photoreceptor damage. Notably, oestrogen receptor signalling is important for maintaining photoreceptor viability (Nixon & Simpkins, IOVS 53:4739-47, 2012; Xiong et al., Neuroscience 452:280-294, 2021). Therefore, the relatively minor effects of the conditional GLUT1 KO in photoreceptors found in Daniele et al. may have been confounded by direct tamoxifen photoreceptor toxicity. On a wider level, this possible confounding factor related to the use of Tamoxifen points to general problems associated with certain forms of genetic manipulations.

      We now mention the controversy around the expression of glucose transporters in the retina, including the Daniele et al. study, in a new subchapter of the discussion on “Expression of glucose transporters in the outer retina” (Lines 454-472).

      Rev#3.14. Lines 370-372: FCCP caused a strong cell death phenotype in rods, however under stress rods upregulate the secretion of RdCVF, which leads to cone photoreceptor survival by the upregulation of aerobic glycolysis in cones. The data should be re-interpreted in the context of this previous literature.

      Response: We thank the reviewer for this comment; however, we could not find a reference that would state that “…under stress rods upregulate the secretion of RdCVF”. What we did find was a reference stating that similar factors such as thioredoxins (TRX80) are secreted from blood monocytes under stress (Sahaf & Rosén, Antioxid Redox Signal 2:717-26, 2000). However, we consider these cells to be too dissimilar to rod photoreceptors to warrant a corresponding comment. Moreover, the research group who discovered RdCVF originally showed that rod-secreted RdCVF cannot prevent cone degeneration if the corresponding Nxnl1 gene is knocked-out in cones, arguing for a cell-autonomous mechanism of RdCVF-dependent cone protection (Mei et al., Antioxid Redox Signal. 24:909-23, 2016).

      Since it is very possible that we may have missed the correct reference(s), we would welcome further guidance by the reviewer.

      Rev#3.15. Line 374: 1,9-DDF caused a 90% loss of cones, however, previous studies by Dr. Nancy Philp have shown glucose deprivation in the outer retina affects primarily rod photoreceptors. The differences should be discussed.

      Response: We thank the reviewer for directing us to these studies. As mentioned above (Rev#3.13.) the Daniele et al. 2022 study yielded only relatively mild effects for a rod-specific conditional GLUT1 KO on photoreceptor viability. Similarly, in an earlier study (Swarup et al., Am J Physiol Cell Physiol. 316: C121–C133, 2019) the Philp group found that a GLUT1 KO in the RPE also caused only a minor phenotype in the photoreceptor layer. We would argue that if glucose, and by extension aerobic glycolysis, were indeed of major importance for (rod) photoreceptor survival, the degenerative effect of these genetic GLUT1 ablations should have been devastating and should have destroyed most of the outer retina in a matter of days. The fact that this was not seen in either study is another piece of independent evidence that rod photoreceptors do not rely to any major extent on glycolytic metabolism.

      The two studies from the Philp lab (Swarup et al., 2019; Daniele et al., 2022) are now cited in the discussion (Lines 417-419 and 458-460).

      Rev#3.16. Line 375: Yes Dr. Claudio Punzo and Dr. Leveillard Thierry along with other groups have shown glycolysis is required to maintain cone survival when under stress, however, the authors should emphasize that it is under stress that this is observed.

      Response: In response to this comment we have now specifically extended our corresponding remark in the discussion of the manuscript (Lines 446-447).

      Rev#3.17. The section "Cone photoreceptors use the Cahill-cycle". The presence of ALT in photoreceptors was surprising and suggests alternatives to the Cori reaction. However, previous measurements of glucose and lactate from localized in vivo cannulation of animal eyes suggest the majority of glucose taken up by the retina is released back to the blood as lactate. Again, this section should discuss this idea in terms of the previous literature.

      Response: Here, we believe the reviewer is referring to studies performed in the late 1990s where, in anaesthetized cats, the lactate concentration in blood samples obtained from choroidal vein cannulation was compared against that in blood samples obtained from femoral arteries (Wang et al., IOVS 38:48-55, 1997). We note that a more relevant in vivo measurement of retinal glucose consumption and lactate production would likely require the simultaneous cannulation of the central retinal artery (CRA) and the central retinal vein (CRV). This would need to be combined with repeated (online) blood sampling, drug applications, and subsequent metabolomic analysis. We are not aware of any in vivo studies where such procedures have been successfully performed and further miniaturization and increased sensitivity of metabolomic analytic equipment will likely be required before such an undertaking may become feasible. Even so, such studies may not be feasible in small rodents (mice, rats) and may instead require larger animal species (e.g. dog, monkey) to overcome limitations in eye and blood sample size.

      We have now extended the discussion of our manuscript with a new subchapter on “The retina as an experimental system for studies into neuronal energy metabolism”. Within this new subchapter we now present two different in vivo experimental approaches that addressed retinal energy metabolism (Lines 376-384). Moreover, we now present new data on retinal lactate release to the culture medium, showing, for instance, a strong increase in lactate release in the no RPE condition compared to control (new Supplementary Figure 4).

      Rev#3.18. Lines 431-433: The study cited suggested that the mitochondrial AST was detected in other cells, in agreement with the data shown. However, the authors' statements in this section are misleading as they do not take into consideration the contribution of AST from other cell types.

      Response: The reviewer is right, we found both AST1 and AST2 to be expressed not only in photoreceptor inner segments but also in the inner retina, especially in the inner plexiform layer (new Figure 6D). Since this might indicate mini-Krebs-cycle activity also in retinal synapses, we have added a corresponding comment to the discussion (Lines 540-543).

      Grammatical and wording fixes:

      Rev#3.19. Line 98 - "the recycling of the photopigment, retinal."

      Response: We have inserted a comma after “photopigment”.

      Rev#3.20. Results section and Figure 1 start without providing context for the model system where staining is being done.

      Response: We have added this information to the beginning of the results section (Lines 105-106).

      Rev#3.21. Supplementary Figure 2 is not mentioned in the main text - there is no context for this figure.

      Response: Supplementary Figure 2 was originally referenced in the legend to Figure 2. We now mention Supplementary Figure 2 (now renumbered to Supplementary Figure S3) also in the main text, in the results section under “Experimental retinal interventions produce characteristic metabolomic patterns” (Line 148).

      Rev#3.22. Volcano plot in Supplementary Figures 3, 5, 6, 7, and 8 don't indicate what Log2(FC) is in reference to.

      Response: The log2 fold change (FC) is calculated as follows: log2 (fold change) = log2 (mean metabolite concentration in condition A) - log2 (mean metabolite concentration in condition B) where condition A and condition B are two different experimental groups being compared. This is now explained in the SI Materials and Methods (Lines 145-147) and indicated in abbreviated form in the figure legends. Please note that supplemental figures have now been renumbered due to the insertion of an additional, new Figure.
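      The log2 fold change defined above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `log2_fold_change` and the example concentrations are hypothetical, and the only part taken from the response is the formula itself (log2 of the mean in condition A minus log2 of the mean in condition B).

```python
import math
from statistics import mean


def log2_fold_change(condition_a, condition_b):
    """Return log2(mean of A) - log2(mean of B) for two groups of concentrations."""
    mean_a = mean(condition_a)
    mean_b = mean(condition_b)
    # log2 is undefined for zero or negative means, so fail loudly instead.
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("mean concentrations must be positive to take log2")
    return math.log2(mean_a) - math.log2(mean_b)


# Hypothetical metabolite concentrations (arbitrary units), not data from the study.
condition_a = [8.0, 8.0, 8.0]
condition_b = [2.0, 2.0, 2.0]
fc = log2_fold_change(condition_a, condition_b)
```

      With this definition a positive value means the metabolite is more abundant in condition A, a negative value means it is more abundant in condition B, and swapping the two groups flips the sign.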

      Rev#3.23. Line 331 - "and allowed to analyze the..." is incorrect phrasing.

      Response: This phrasing was changed.

      Rev#3.24. Line 343 "cycled"

      Response: This phrasing was changed.

      Rev#3.25. Line 446 is misworded.

      Response: This phrasing was changed.

      Technical questions:

      Rev#3.26. At what point after explant was the IHC done in Supplemental Figure 1? If early, but experiments are done later, there's a chance things are more disorganized at the end of the experiment.

      Response: Staining and metabolomics analysis were both done at the end of each experiment, at the same time, at P15. This is now mentioned in the SI materials and methods section (Lines 67, 108-110).

      Rev#3.27. FCCP affects plasma membrane permeability, which is particularly critical in neurons that undergo repolarization and depolarization - how do we know FCCP on cell death via metabolism? See: https://www.sciencedirect.com/science/article/pii/S2212877813001233

      Response: The reviewer is correct, a significant permeabilization of cell membranes in general would likely cause extensive neuronal cell death, unrelated to a disruption of OXPHOS. However, the FCCP concentration used here (5 µM) is at the lower end of what was used in the mentioned Kenwood et al. study (Mol Metab. 3:114-123, 2014) and the effect on cell membrane permeability in tissue culture is likely to be rather small, as opposed to what was seen by Kenwood et al. in cultures of individual cells. This view is supported by the fact that in our FCCP treatments, we did not observe any significant increases of cell death in any retinal cell type (including RPE) other than in rod photoreceptors. Together with the fact that only photoreceptors strongly express Krebs-cycle/OXPHOS related enzymes, this strongly suggests that the FCCP effects seen by us were due to disruption of OXPHOS.

      Rev#3.28. Numerous metabolite comparisons are being made throughout the manuscript – what type of multiple hypothesis testing corrections are utilized? Only certain figures mention multiple hypothesis testing (e.g. Figure 6).

      Response: In general, in this manuscript we used two different statistical methods: 1) For two-group comparisons, we used an unpaired, two-tailed t-test, which reports a p-value with 95% confidence interval without additional multiple hypothesis testing (e.g. in Figure 2, Suppl. Figures 4, 6, 7, 8). 2) For multiple group comparisons we used a one-way ANOVA analysis with Tukey’s multiple comparisons post-hoc test (except Suppl. Figure 9, where Fisher’s LSD post-hoc test was used). The information on which statistical test was used for which dataset is now given in the figure legends and in the SI Materials and Methods section.

      Rev#3.29. For Figure 3, how do we know that the removal of RPE is causing the metabolite changes due to RPE-PR coupling? How do you rule out the fact that it isn’t just: I – a thicker physical barrier between media and the neural retina that is causing the changes, or II – removal of RPE from PR causes OS shearing and a stress response that alters metabolism?

      Response: We believe these concerns can be ruled out: The RPE cells are linked by tight junctions and are not “just a thicker barrier” but a barrier that is almost impermeable for most metabolites unless they are carried by specific transporters. Outer segment shearing via RPE removal would indeed be a concern if we had used adult retina. However, we explanted the retina at P9, when it does not yet possess any sizeable outer segments. As a matter of fact, photoreceptors grow outer segments only after P9.

      Rev#3.30. While 1,9-dideoxyforskolin blocks GLUT1, it is known to have other effects, including on potassium channels. How do we know the effects of 1,9-dideoxyforskolin are specific to GLUT1? Utilizing a GLUT1 KO and showing no additional effects when adding 1,9-dideoxyforskolin would be helpful as a control.

      Response: This is a good suggestion from the reviewer. We note that this is technically not easy to achieve as it would require an RPE-specific knock-out that should be inducible at a given experimental time-point, in a quantitative manner. The study by Swarup et al. (see above Rev#3.13.) used an RPE specific knock-out that was, however, not inducible. Moreover, if the corresponding inducible knock-out animals could be generated, then the stochastic nature of the inducing treatment would probably affect only a limited number of cells within a given cell population. In our experimental context, a less than quantitative knock-out would significantly complicate interpretation of results, even to the point that no additional insight might be gained.

      Rev#3.31. The analysis in Figure 6, even with attempts to control drug treatments, is highly speculative. One really needs animals with predominately cones vs. predominately rods to do this analysis (e.g. with NRL mice).

      Response: The reviewer is right, the analysis shown in Figure 6 was an explorative approach to try and deduce features of rod and cone metabolism. This is now mentioned in the results section (Lines 282-284). Since the experiments were not initially intended to address such questions, by necessity the interpretations remain speculative. The comparison of mouse mutants in which there are either no cones (e.g. cpfl1 mouse) or no rods (e.g. NRL knock-out mouse) may allow us to disentangle the metabolic contributions of rods and cones. We appreciate the suggestion from the reviewer and have now inserted a related suggestion for future studies into the discussion section (Lines 450-452).

      Rev#3.32. Overall, much of the paper suggests intriguing pathways, but without C13 tracing or relevant genetic knock-outs, the pathways would have to be speculative rather than definitive.

      Response: We agree with the reviewer that further research, including 13C and 15N-tracing studies, will be necessary to evaluate which pathway(s) are used by which retinal cell type under which condition. Still, the high robustness and quantitative nature of the NMR metabolomics data allow us to draw pathway conclusions based on metabolites that are unique to specific pathways/cell types or using ratios. We now refer to the advantages of such carbon-tracing studies in the discussion of the manuscript (Lines 545-549).

      Stylistic suggestions:

      Rev#3.33. This is a very dense paper to read. It would be helpful for each figure to have a summary diagram of the relevant metabolite changes and how they fit together. Further, for those not metabolism-inclined, defining the mini-Kreb’s, Cahill, and Cori cycles and their brief implications at some point early in the manuscript would be helpful.

      Response: We have been thinking a lot about how we could add in the suggested summary diagrams into each figure. Unfortunately, whatever idea we contemplated would have significantly increased the complexity of the figures, while the actual benefit in terms of improved understandability was unclear.

      However, we did follow the suggestion from the reviewer to introduce the terms Cori-, Cahill-, and mini-Krebs-cycle already in the introduction and we hope that this has made the manuscript easier to understand overall (Lines 79-92).

      Rev#3.34. More discussion about the step-by-step ways that the mini-Kreb’s reaction “uncouples” glycolysis from the Kreb’s cycle would be helpful. What do you mean by “uncouple” in this context?

      Response: We thank the reviewer for this suggestion. Uncoupling in this context means that glycolysis and Krebs cycle are not metabolically coupled to each other via pyruvate. Instead, both pathways can run independently of each other and in parallel, as long as the Krebs-cycle uses glutamate, BCAAs or other amino acids as fuels. We now also address this point already in the introduction of the manuscript (Lines 87-90).

      Conceptual questions:

      Rev#3.35. As the proposal that PR undergo heavy amounts of OXPHOS is controversial, it would be helpful for the authors to review the literature on lactate production by the retina and what studies have shown previously about retina use of lactate, specifically lactate making its way into TCA cycle intermediates, suggesting OXPHOS, in PRs.

      Response: In response to this question we have added several new references to the introduction and discussion of the manuscript. The question of lactate production (aerobic glycolysis) vs. the use of OXPHOS is now discussed in Lines 77-81, Lines 367-384.

      Rev#3.36. Why would cones die more in the no RPE condition? The authors suggest this has something to do with GLUT1 expression on RPE and the transport of glucose to cones. Even if we accept that cones are highly glycolytic, loss of RPE should expose the neural retina to even more glucose in your experimental set-up.

      Response: This is a very interesting question from the reviewer. Indeed, loss of the RPE and blood-retinal barrier function should increase photoreceptor access to glucose, even more so if they are expressing high affinity GLUT3. In the discussion (Lines 420-424), we speculate that this may trigger the Crabtree effect, shutting down OXPHOS and causing the cells to exclusively rely on glycolysis. This, however, will likely not yield sufficient ATP to maintain their viability, so that they “starve” to death even in the presence of ample glucose. Since cones require at least twice as much ATP as rods, they may be more sensitive to a Crabtree-dependent shut-down of OXPHOS. However, if this speculation were correct, then the question remains why the FCCP treatment, which abolishes OXPHOS more directly, does not cause cone death. Here, we again can only speculate that high glucose may have additional toxic effects on cones that are independent of OXPHOS. We now try to present this reasoning in the discussion (Lines 426-429).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their comments, as well as for the time dedicated to making useful suggestions that have helped improve the manuscript. We have responded to the concerns raised by the reviewers, and after that, we have also responded to the different points highlighted in the Recommendations for the authors:

      Reviewer #1

      While in vivo injury was used to assess regeneration from subsets of PNS neurons, different in vitro neurite growth or explant assays were used for further assessments. However, the authors did not assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro. Such results will be important in interpreting the results.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Intriguingly, even in individual groups of PNS neurons, not all neurons regenerate to the same extent. It is known that the distance between the cell body and the lesion site affects neuronal injury responses. It would be interesting to test this in the observed regeneration.

      Although it is true that the distance can affect the outcome, here we used a physiological model where all neurons are lesioned at the same point in the nerve. For motoneurons, not only the distance differs but also the microenvironment surrounding their somas; therefore, the direct comparison of these neurons with sensory neurons is limited. We extended the discussion on this matter in the new manuscript.

      Fig 1: The authors quantified the number of regenerating axons at two different time points. However, the total numbers of neurons/axons in each subset are different. The authors should use these numbers to normalize their regenerative axons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.
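      As a rough illustration of this kind of normalization, the sketch below divides the regenerating axon count of each neuron type by the control axon count of the same type. It is a minimal sketch under stated assumptions: the function name `normalize_to_control`, the neuron-type labels, the counts, and the choice to express the result as a percentage are all hypothetical and are not taken from the manuscript.

```python
def normalize_to_control(regenerating_counts, control_counts):
    """Express regenerating axon counts as a percentage of control counts per neuron type."""
    normalized = {}
    for neuron_type, count in regenerating_counts.items():
        control = control_counts[neuron_type]
        # A control count of zero would make the ratio meaningless.
        if control <= 0:
            raise ValueError(f"control count for {neuron_type!r} must be positive")
        normalized[neuron_type] = 100.0 * count / control
    return normalized


# Hypothetical axon counts, not data from the study.
regenerating = {"type_a": 50, "type_b": 30}
control = {"type_a": 200, "type_b": 60}
percent_of_control = normalize_to_control(regenerating, control)
```

      Normalizing within each neuron type makes populations with very different total axon numbers comparable, which is the concern the reviewer raised.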

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. It would be informative for the authors to analyze/discuss this further.

      From our point of view, these two options can both be considered a differential response to injury and could potentially be used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      Fig 6: Is it possible to assess the regenerative effects of knockdown Med12 after in vivo injury?

      It is possible, but it is out of the scope of this work. Here, we aimed to describe the regenerative response and validate our data by testing a potential target for specific regeneration. Future studies will focus on the modulation of this specific regeneration both in vitro and in vivo.

      Reviewer #2

      It seems that the most intriguing outcome of this paper revolves around the role of Med12 in nerve regeneration. The authors should prioritize this finding. Drawing a conclusion regarding Med12's role in proprioceptor regeneration based solely on this in vitro model may be insufficient. This noteworthy result requires further investigation using more animal models of nerve regeneration.

      The main goal of this work was to compare the regenerative responses of different neuron subpopulations. We modulated Med12 to validate our data and the potential of our findings. Unfortunately, investigating in depth the role of Med12 in regeneration is out of the scope of this paper. For this reason, we did not prioritise this finding here. As this finding was striking, we strongly agree that the next step should be studying how it modulates regeneration.

      One critique revolves around the authors' examination of only a single time point within the dynamic and continuously evolving process of regeneration/reinnervation. Given that this process is characterized by dynamic changes, some of which may not be directly associated with active axon growth during regeneration, and encompasses a wide range of molecular alterations throughout reinnervation, concentrating solely on a single time point could result in the omission of critical molecular events.

      We agree that this is probably the main limitation of this study, as we discussed in the text. We chose 7 days postinjury as a standard time point widely described in literature and to have a correlate with our histological data. Although the main aim was to compare populations, analyzing an additional time point after injury could add valuable information.

      Reviewer #3

      No concerns were expressed by that reviewer.

      Recommendations for the authors:

      The authors should assess whether the differential "regenerative" responses in vivo could be recapitulated in vitro.

      We included a supplementary figure evaluating the neurite extension in vitro and updated the text accordingly.

      Optional:

      It will be interesting to test if the distances between the cell body and the lesion site contribute to the observed differences in individual subsets of PNS neurons.

      Figure 1D shows the normalization of data from figure 1C (normalized against the number of control axons in each neuron type). This has been clarified in the text.

      Fig 2-5: In explaining differential regeneration of individual groups of neurons, there are at least two possibilities: (1). Each group of neurons has different injury/regenerative responses; (2). The same set of injury/regenerative responses are differentially activated. Some data in this manuscript suggested the latter possibility. But some other data point in the opposite direction. At least the authors should discuss these.

      From our point of view, these two options can both be considered a differential response to injury and could potentially be used for the modulation of regeneration. However, if the second possibility is correct, the regenerative program could be more influenced by the time chosen to study the response. Given the importance of this, we added some discussion about this topic.

      While the paper is technically well-executed, the conclusions and some of the findings appear to be incomplete and challenging to draw meaningful conclusions from. This manuscript presents some interesting findings, but the title is quite broad and may suggest that the authors have unveiled fundamental mechanisms explaining the varying regenerative abilities of peripheral axons. However, the results do not substantiate such a conclusion. Further comments and suggestions follow.

      We eliminated the word “regenerative (response)” from the title, as it could suggest that all changes seen in these neurons are related only to regeneration. We think that “Neuron-specific RNA-sequencing reveals different responses in peripheral neurons after nerve injury” highlights the differences between neurons that we found without implying that we described regenerative mechanisms in all neurons.

      What's notably absent here is the validation of certain genes found with the ribosomes, especially those highlighted in the subsequent figures. The question arises as to whether the changes depicted in the figures align with changes in the DRGs in vivo. Is there concordance between the presence of these genes and their transcriptional changes? It would greatly enhance the study's value if the authors could show evidence of upregulation or downregulation of certain genes over time in tissue sections, utilizing techniques such as in situ hybridization or immunocytochemistry.

      We selected some factors that were specifically upregulated in subsets of neurons to corroborate these findings by immunohistochemistry. Changes in the immunofluorescence of P75 in motoneurons and ATF2 in cutaneous mechanoreceptors were evaluated in controls and in animals that had received a nerve crush one week before. Supplementary figures with the images have been added.

      The authors discovered intriguing distinctions, such as the presence of specific signaling pathways unique to neurons projecting to muscle as opposed to those projecting to the skin. Among these pathways were those associated with receptor tyrosine kinases like VEGF, erbB, and neurotrophin signaling among others. The question now arises: do these pathways play a role in natural peripheral regeneration processes? To answer this, it is imperative to conduct in vivo studies. However, the authors employed an in vitro DRG neurite outgrowth assay to demonstrate that various types of neurons exhibit different responses to the presence of different neurotrophins. This does not reflect what actually happens in vivo. While neurotrophins indeed play a role in neuron survival and axon extension during development, their role in postnatal periods changes over time, and it remains unclear whether they play any role in the natural regenerative processes of the peripheral nerve. Therefore, this experiment may not be directly relevant in this case, especially during the early axon extension period of the regenerating axons. If the authors aim to establish a causal link with neurotrophin signaling, it becomes crucial to conduct in vivo experiments by manipulating the expression of key molecules like the receptors.

      It has been widely described that different types of peripheral neurons have a differential expression of Trk receptors, even in the adult, and that these respond differentially to neurotrophins. In our study, we do not establish a causal relationship between the expression of Trk and neurite extension, but instead we show (as many others have) that distinct neurons respond differentially to these neurotrophins. The fact that in vivo studies fail to show a clear effect does not necessarily mean that neurotrophins are not specific. It might mean that their effect is not strong enough to be a useful guide in the complex microenvironment found after an injury. For instance, NGF acts on TrkA (present in some neurons), but in vivo it has been shown to accelerate the clearance of myelin debris in Schwann cells (Li et al., 2020), which could facilitate regeneration of all types of axons, masking any potential specific effect on the subtypes of neurons expressing TrkA. In contrast, in an in vitro setting on neuronal cultures, the specific neuronal effect can be more evident.

      Additionally, it's worth noting that another paper utilizing the same methodology and experimental setup (PMID: 29756027, "Translatome Regulation in Neuronal Injury and Axon Regrowth" by Rozenbaum et al.) exists. Are there any significant differences or shared findings with that study?

      This study shows the transcriptomic response 4, 12 and 24 hours after an injury in a very similar experimental setup. The authors focus on comparing the neuronal vs the glial response to the injury, using a Ribotag line that tags ribosomes from all neurons in the DRG rather than specific neuron subtypes. As the time postinjury (24h vs 7 days) and the cell types studied are different, we could not directly compare our results. We did see an upregulation in both datasets of previously described growth-associated genes (Jun, Atf3, Sox11, Sprr1a, Gal…). We included the article in the references for its relevance to the topic.

      It would be helpful for readers to illustrate the finding of the fastest axon regeneration of nociceptors by showing fluorescence micrographs of the nerve samples in addition to the graphs shown in Fig. 1 C/D.

      In figure 1B, we show fluorescence micrographs of the nerves 7 days postinjury. As explained in the results, we counted the number of axons at 2 distances from the injury; we did not analyse the fastest axon. This is due to technical reasons: 7 days after the injury the fastest axon has surpassed our evaluation point, which was the furthest distance that we could assess in our experimental setting in a consistent manner. If the reviewer thinks that we need to include more images from our evaluations (from 9 dpi for example), we could prepare a new figure.

      The labeling in Fig. 2B is confusing. Is the CHAT immunoreactivity shown in the last panel illustrated by green or red signals? Is the red signal counterstaining with beta-tubulin?

      The labelling was changed in the figure to increase clarity.

      The references to the supplementary data throughout the manuscript are confusing. For example, where can the "Supp data 2" be found? (mention on p. 14 in the merged pdf file). Are they referring to the Excel spreadsheets?

      We divided the supplementary material into supplementary figures/table (found in the pdf) and supplementary data. Supplementary data refers to Excel spreadsheets found outside the pdf file. We hope this will be clearer after the final formatting of the article.

      What does the following statement on p. 14 mean?: "The caveat in these analyses was that molecular classification by these approaches may be arbitrary, and not reflective of protein repurposing." This reviewer notes that these databases consider the fact that components participate in different pathways.

      Indeed, we aimed to explain that many proteins participate in different pathways, and this is a limitation of the enrichment analysis. We modified the sentence in the text.

      First paragraph on p. 15: The PPAR and AMPK pathways have much broader roles, and are not only "related to fatty acid metabolism". This factual inaccuracy should be corrected in the manuscript.

      The sentence has been corrected.

      The authors should consider showing increased TGF-beta signaling in their neurons after downregulation of Med12 given the previous implication of TGF-beta signaling in axon regeneration.

      We tried to demonstrate the effect of our knockdown on the TGF-beta pathway by analyzing the expression of typical targets of this pathway by qPCR in our cultures. However, we could not detect any difference. We think that this can have two explanations: (1) as only a few cells upregulate Med12 whereas many cells downregulate it, the effect is masked (presumably only proprioceptors will have a significant difference in this pathway and, thus, it would be very difficult to see the effect), or (2) Med12 is not exerting its effect through this pathway. We added a supplementary figure with these data and discussed it in the manuscript.

      It would be helpful to eliminate typos and improve syntax/grammar/style.

      We revised the text to improve style.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      In the present manuscript, the authors analyzed diel oscillations in the brain and olfactory organs' transcriptome of Aedes aegypti and Anopheles culicifacies. The analysis of their RNAseq results showed an effect of time of day on the expression of detoxification genes involved in oxidoreductase and monooxygenase activity. Next, they investigated the effect of time of day on the olfactory sensitivity of Ae. aegypti and An. gambiae and identified the role of CYP450 in odor detection in these species using RNAi. In the last part of the study, they used RNAi to knock down the expression of one of the serine protease genes and observed a reduction in olfactory sensitivity. Overall, the experiments are well-designed and mostly robust (see comment regarding the sample size and data analysis of the EAG experiments) but do not always support the claims of the authors. For example, since no experiments were conducted under constant conditions, the circadian (i.e., driven by the internal clocks) effects are not being quantified here. In addition, knocking down the expression of a gene showing daily variations in its expression and observing an effect on olfactory sensitivity is not sufficient to show its role in the daily olfactory rhythms. Knowledge gaps are not well supported by the literature, and overstatements are made throughout the manuscript. Our detailed comments are listed below.

      We sincerely thank the reviewer for their time and consideration, and appreciate the thorough review of our manuscript. Their insightful comments have greatly enriched our work. We also apologize for instances of overinterpreting the data. Your feedback has helped us recognize areas where clarity and caution are needed, and we are committed to addressing these concerns in our revisions. Thank you for your valuable input and guidance.

      Major comments

      Introduction

      1. Several statements made in the introduction are misleading and suggest that authors are trying to exaggerate the impact of their work. For example, "Furthermore, different species of mosquitoes exhibit plasticity and distinct rhythms in their daily activity pattern, including locomotion, feeding, mating, blood-feeding, and oviposition, facilitating their adaptation into separate time-niches (7, 8), but the underlying molecular mechanism for the heterogenous temporal activity remains to be explored." is not accurate since daily rhythms in mosquitoes' transcriptomes, behavior, and olfactory sensitivity have been the object of several publications. Even though some of them are listed later in the introduction, they contradict the claim made about the knowledge gap. See:

      Rund, S. S., Gentile, J. E., & Duffield, G. E. (2013). Extensive circadian and light regulation of the transcriptome in the malaria mosquito Anopheles gambiae. BMC genomics, 14(1), 1-19

      Rund, S. S., Hou, T. Y., Ward, S. M., Collins, F. H., & Duffield, G. E. (2011). Genome-wide profiling of diel and circadian gene expression in the malaria vector Anopheles gambiae. Proceedings of the National Academy of Sciences, 108(32), E421-E430

      Rund, S. S., Bonar, N. A., Champion, M. M., Ghazi, J. P., Houk, C. M., Leming, M. T., ... & Duffield, G. E. (2013). Daily rhythms in antennal protein and olfactory sensitivity in the malaria mosquito Anopheles gambiae. Scientific reports, 3(1), 2494

      Rund, S. S., Lee, S. J., Bush, B. R., & Duffield, G. E. (2012). Strain-and sex-specific differences in daily flight activity and the circadian clock of Anopheles gambiae mosquitoes. Journal of insect physiology, 58(12), 1609-1619

      Leming, M. T., Rund, S. S., Behura, S. K., Duffield, G. E., & O'Tousa, J. E. (2014). A database of circadian and diel rhythmic gene expression in the yellow fever mosquito Aedes aegypti. BMC genomics, 15(1), 1-9

      Eilerts, D. F., VanderGiessen, M., Bose, E. A., Broxton, K., & Vinauger, C. (2018). Odor-specific daily rhythms in the olfactory sensitivity and behavior of Aedes aegypti mosquitoes. Insects, 9(4), 147

      Rivas, G. B., Teles-de-Freitas, R., Pavan, M. G., Lima, J. B., Peixoto, A. A., & Bruno, R. V. (2018). Effects of light and temperature on daily activity and clock gene expression in two mosquito disease vectors. Journal of Biological Rhythms, 33(3), 272-288

      Response: We apologize for this oversight. In the revised manuscript, we have added these references and made changes to the text as suggested by the reviewer.

      The knowledge gap brought up in the next paragraph of the introduction doesn't reflect the questions asked by the experiments: "But, how the pacemaker differentially influences peripheral clock activity present in the olfactory system and modulates olfactory sensitivity has not been studied in detail." Specifically, the control of peripheral clocks by the central pacemaker has not been evaluated here.

      Response: This statement has been modified in the revised manuscript.

      "In vertebrates and invertebrates, it is well documented that circadian phase-dependent training can influence olfactory memory acquisition and consolidation of brain functions" should also cite work on cockroaches and kissing bugs:

      Lubinski, A. J., & Page, T. L. (2016). The optic lobes regulate circadian rhythms of olfactory learning and memory in the cockroach. Journal of Biological Rhythms, 31(2), 161-169

      Page, T. L. (2009). Circadian regulation of olfaction and olfactory learning in the cockroach Leucophaea maderae. Sleep and Biological Rhythms, 7, 152-161

      Vinauger, C., & Lazzari, C. R. (2015). Circadian modulation of learning ability in a disease vector insect, Rhodnius prolixus. Journal of Experimental Biology, 218(19), 3110-3117

      Response: These references have been added in the revised manuscript as suggested by the reviewer.

      The sentence: "Previous studies showed that synaptic plasticity and memory are significantly influenced by the strength and number of synaptic connections (43, 44)." should be nuanced as the role of neuromodulators such as dopamine has also been shown to influence learning and memory in mosquitoes:

      Vinauger, C., Lahondère, C., Wolff, G. H., Locke, L. T., Liaw, J. E., Parrish, J. Z., ... & Riffell, J. A. (2018). Modulation of host learning in Aedes aegypti mosquitoes. Current Biology, 28(3), 333-344

      Wolff, G. H., Lahondère, C., Vinauger, C., Rylance, E., & Riffell, J. A. (2023). Neuromodulation and differential learning across mosquito species. Proceedings of the Royal Society B, 290(1990), 20222118

      Response: We agree with the reviewer. We have modified this statement and added the references in the revised manuscript.

      Overall, the paragraph dealing with the idea that "circadian phase-dependent training can influence olfactory memory acquisition and consolidation of brain functions" is very confusing. This paragraph discusses mechanisms of learning-induced plasticity but seems to ignore the simplest (most parsimonious) explanations for the circadian regulation of learning (e.g., time-dependent expression of genes involved in memory consolidation). In addition, the sentence quoted above is a convoluted way of saying that training at different times of the day affects memory acquisition and consolidation. Although the authors did look at one gene involved in neural function, learning, memory, or circadian effects were not analysed in this study. Please reconsider the relevance of the paragraph.

      Response: We have modified this paragraph as per the suggestions of the reviewer in the revised manuscript.

      The sentence: "But, how the brain of mosquitoes entrains circadian inputs and modulates transcriptional responses that consequently contribute to remodel plastic memory, is unknown." should be rephrased. First, it should be "entrains TO circadian inputs", and second, it suggests that the study will be investigating circadian modulation of learning and memory, which is not the case. Furthermore, the term "remodel plastic memory" is unclear and doesn't seem to relate to any specific cellular or neural processes.

      Response: This statement has been removed from the revised manuscript.

      Given the differences in mosquito chronobiology observed even between strains, why perform the RNAi and EAGs on a different species of Anopheles than the one used for the RNAseq (or vice versa)?

      Response: We agree with the reviewer that mosquito chronobiology differs between strains, and that species variation therefore complicates data interpretation. Given the strictly nocturnal behavior of An. culicifacies and the diurnal behavior of Ae. aegypti, we performed the RNA-Seq study on these two species. We opted for the current collaborative study for three reasons: 1) no EAG facility is available at ICMR-National Institute of Malaria Research, India, the only place where an An. culicifacies colony is maintained; 2) An. culicifacies is difficult to rear and adapt to a new environment or laboratory; and 3) we wanted to validate, as a proof of concept, the function of CYP450 in odorant detection and olfactory sensitivity. We are aware that EAG data from a different Anopheles species are difficult to correlate with the molecular data on An. culicifacies. We therefore chose An. gambiae (rather than other Anopheles mosquitoes such as An. stephensi or An. coluzzii) because diel rhythm-associated molecular data are available for An. gambiae (68). To aid interpretation, we also compared the expression profiles of CYP450 and OBP genes between An. culicifacies and An. gambiae (Supplemental file 3). Importantly, we found similar expression patterns for several CYP450 and OBP/CSP genes in the two species. Furthermore, please note that the primary focus of the current manuscript is to highlight the role of peri-receptor proteins in olfactory sensitivity and odor detection, and, as a proof of concept, we validated this hypothesis in both An. gambiae and Ae. aegypti. We believe that the basic mechanisms of odor detection and peri-receptor events are conserved from insects to higher vertebrates, and therefore that the concern about species differences does not undermine our conclusions.

      S. S. C. Rund, J. E. Gentile, G. E. Duffield, Extensive circadian and light regulation of the transcriptome in the malaria mosquito Anopheles gambiae. BMC Genomics. 14 (2013), doi:10.1186/1471-2164-14-218.

      S. S. C. Rund, T. Y. Hou, S. M. Ward, F. H. Collins, G. E. Duffield, Genome-wide profiling of diel and circadian gene expression in the malaria vector Anopheles gambiae. Proc. Natl. Acad. Sci. U. S. A. 108 (2011), doi:10.1073/pnas.1100584108.

      S. S. C. Rund, N. A. Bonar, M. M. Champion, J. P. Ghazi, C. M. Houk, M. T. Leming, Z. Syed, G. E. Duffield, Daily rhythms in antennal protein and olfactory sensitivity in the malaria mosquito Anopheles gambiae. Sci. Rep. 3, 2494 (2013).

      Results

      1. "As reported earlier, a significant upregulation of period and timeless during ZT12-ZT18 was observed in both species (Figure 1C)." Please provide effect size and summary statistics.

      Response: The statistics are provided in Figure S2 of the revised manuscript.

      "Next, the distribution of peak transcriptional changes in both An. culicifacies and Ae. aegypti was assessed through differential gene-expression analysis. Noticeably, An. culicifacies showed a higher abundance of differentially expressed olfactory genes (Figure 1D)" Please provide effect size and summary statistics.

      Response: The statistics are provided in Table 1 of the revised manuscript.

      "Taken together, the data suggests that the nocturnal An. culicifacies may possess a more stringent circadian molecular rhythm in peripheral olfactory and brain tissues." What do the authors mean by "stringent"? At this point, this should be stated as a working hypothesis, as the statement is not backed up by the data. It is possible that the fewer differentially expressed genes of Aedes aegypti are more central to regulatory networks and cascade into more "stringent" rhythmic control of activities and rhythms.

      Response: We thank the reviewer for this suggestion. We have modified this statement as suggested by the reviewer.

      The section title: "Circadian cycle differentially and predominantly expresses olfaction-associated detoxification genes in Anopheles and Aedes" doesn't make sense. The expression of genes can be modulated by circadian rhythms, but cycles don't express genes. Please rephrase. In addition, this whole section deals with "circadian rhythms" while no experiment has been conducted under constant conditions. The observed daily variations are therefore diel rhythms until their persistence under constant conditions is established.

      Response: We agree with the reviewer and changed the statement accordingly.

      "The downregulated genes of Ae. aegypti did not show any functional categories probably due to the limited transcriptional change." Could the authors explain if this is actually the phenomenon or due to a lack of temporal resolution in the study design (i.e., 4 time points)?

      Response: We do not agree with the reviewer's comment about a lack of temporal resolution in the current study. The functional categories of differentially expressed genes are deduced by gene set enrichment analysis, which identifies the classes of genes that are overrepresented in a large set of genes. The statistical significance depends on the number of query and background genes. In our experiments, because the number of query genes (i.e., downregulated genes) was limited, the enrichment tool (ShinyGO) was unable to show significant enrichment of the downregulated genes when an FDR cut-off of 0.05 was applied and the top 10 pathways were selected. Although we sampled only 4 time points, a previous study by Rund et al. (BMC Genomics 2013) also showed that An. gambiae possesses a higher number of rhythmic genes (2.6-fold higher) than Ae. aegypti. We therefore consider that our result reflects a physiological difference between Anopheles and Aedes mosquitoes rather than a pitfall of the study design.
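
The dependence of enrichment significance on query size can be illustrated with a one-sided hypergeometric over-representation test. This is a minimal sketch under stated assumptions: the gene counts are invented, the function name is hypothetical, and the response does not state which statistic the enrichment tool applies.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k category genes when n query
    genes are drawn from N background genes, K of which are in the category."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical numbers: 10,000 background genes, 100 of them in one category.
BACKGROUND, CATEGORY = 10_000, 100

# Both queries contain 20% category genes, but the query sizes differ.
p_small_query = hypergeom_enrichment_p(k=1, n=5, K=CATEGORY, N=BACKGROUND)
p_large_query = hypergeom_enrichment_p(k=20, n=100, K=CATEGORY, N=BACKGROUND)

print(f"small query (1 of 5):    p = {p_small_query:.3g}")
print(f"large query (20 of 100): p = {p_large_query:.3g}")
```

With the same proportion of category genes, the small query yields a far weaker p-value than the large one, which is the sense in which a short list of downregulated genes may fail to produce enriched categories.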

      "a GO-enrichment analysis was unable to track any change in the response-to-stimulus or odorant binding category of genes (including OBPs, CSPs, and olfactory receptors)." This finding doesn't corroborate the statements made previously and doesn't align with previously published studies. Is it due to pitfalls in the study design?

      Response: The functional categories of differentially expressed genes are deduced by gene set enrichment analysis, which identifies the classes of genes that are overrepresented in a large set of genes. The statistical significance depends on the number of query and background genes. Differential expression analysis revealed a significant upregulation of a subset of CSPs (~5-fold) and OBP6 (~3.3-fold) transcripts in An. culicifacies mosquitoes at ZT12. However, because the number of query genes in this case (i.e., chemosensory genes) was very small (3), the enrichment tool (ShinyGO) was unable to show significant enrichment of these gene categories when an FDR cut-off of 0.05 was applied and the top 10 pathways were selected.

      Moreover, we do not agree with the reviewer's comment on pitfalls of the study design, because our previous diel-rhythm experiments with An. culicifacies, which covered more time points, revealed a similar expression pattern of chemosensory genes (Das De et al., 2018).
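
The FDR cut-off mentioned in this response can be made concrete with the Benjamini-Hochberg adjustment. The sketch below is illustrative only: the p-values are invented, the function name is hypothetical, and the response does not say which FDR procedure the enrichment tool uses.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical case: one category with a nominal p of 0.049 tested
# alongside 49 categories that show no signal (p = 0.5 each).
raw = [0.049] + [0.5] * 49
adj = benjamini_hochberg(raw)
print(f"raw p = {raw[0]}, BH-adjusted p = {adj[0]}")
```

A category that looks nominally significant on its own no longer passes a 0.05 FDR threshold once many categories are tested together, which compounds the low power of a small query list.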

      "In contrast, three different clusters of OBP genes in Ae. aegypti showed a time-of-day dependent distinct peak in expression starting from ZT0-ZT12 (Figure 2F)." Please provide summary statistics.

      Response: Please find the table for summary statistics in the supplemental file 1.

      "In the case of An. gambiae, the amplitudes of odor-evoked responses were significantly influenced by the doses of all the odorants tested (repeated measure ANOVA, p ≤ 2e-16) (Figure S4B)." Did the authors use a positive control for the EAGs? How did the authors normalize the responses across the two species? Given the way the data is presented, how were the data normalized to allow inter-species comparisons? In addition, it is highly unlikely that all the mosquito preps used in the EAG assay responded to all the odors tested. If that was the case, then the dataset includes missing data for certain odors and time points. We believe the authors have ensured there are at least a certain number of responses per odor and time point combinations. If this is true, repeated measures ANOVA is not suited for analyzing this data because this statistical technique requires all repeated measures within and across preps without missing values. Also, the authors need to correct the summary statistics for multiple comparisons within this framework to avoid inflating type-I errors. Has this been done?

      Response: In our study involving An. gambiae, we observed significant influences of odorant doses on the amplitudes of odor-evoked responses (repeated measure ANOVA, p ≤ 2e-16) (Figure S4B). It's important to note that we did not employ a separate positive control for the electroantennogram (EAG) assays, as the compounds utilized in our research are already known to be EAG active in at least one of the mosquito species under investigation (mentioned in supplementary file 3).

      Our primary objective for performing EAG studies is to correlate the diel-rhythmic molecular data with the diel-rhythmic electroantennographic response in nocturnal and diurnal mosquitoes. To address the normalization of responses across the two species, we opted to control for dose and time rather than normalizing using one of the EAG active compounds. Further, the EAG responses were measured in relation to solvent control. In our experimental design, we utilized different batches of mosquitoes from the same cohort to test each odorant at various time points. EAG responses were acquired using the same mosquito across different dilutions for a single odor or volatile compound, rather than across time points. Hence, we didn’t end up with missing values.
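
One simple way to express responses "in relation to solvent control" is to subtract the solvent-evoked amplitude from each odor-evoked amplitude within a preparation. The response does not give the exact formula, so the subtraction, the function name, and the values below are all assumptions made for illustration.

```python
def solvent_corrected(amplitudes_mv, solvent_mv):
    """Subtract the solvent-control amplitude from each odor-evoked amplitude."""
    return [a - solvent_mv for a in amplitudes_mv]

# Hypothetical amplitudes (mV) from one preparation across increasing dilutions.
doses = [0.001, 0.01, 0.1]
responses = solvent_corrected([0.75, 1.5, 2.5], solvent_mv=0.25)

# Pair each dose with its corrected response for downstream analysis.
dose_response = dict(zip(doses, responses))
print(dose_response)
```

Because each preparation is recorded across all dilutions of a single odorant, a correction of this kind is applied within a preparation and does not by itself make amplitudes comparable between species.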

      For the analysis of individual species, we performed repeated measures ANOVA on each compound's EAG response, with dose and time as variables. This enabled us to select compounds for which 'Time' or its interaction terms were significant. For those compounds, we then conducted a one-way ANOVA with time as the only variable, separately for each dose, followed by post-hoc Tukey tests to compare time points. For the comparison between species, we combined the data from both species into one dataset and added species as a variable. Repeated measures ANOVA was applied to each compound's EAG response, with species, dose, and time as variables, which again enabled us to select compounds for which 'Time' or its interaction terms were significant. For significant compounds, a two-way ANOVA was performed with time and species as variables, separately for each dose, and post-hoc Tukey tests were used to compare time points. Our analysis aims to account for repeated measures within and across preparations. Additionally, we used post-hoc Tukey tests to correct for multiple comparisons within this framework, so as to avoid inflating type-I errors in our statistical interpretations.
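
The per-dose step described above (a one-way ANOVA with time as the only factor) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the amplitudes are invented, the function name is hypothetical, and the p-value lookup and the post-hoc Tukey test are omitted.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical relative EAG amplitudes at a single dose, grouped by time window.
by_time = {
    "ZT1-3": [1.0, 2.0, 3.0],
    "ZT6-8": [2.0, 3.0, 4.0],
    "ZT18-20": [5.0, 6.0, 7.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(by_time.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Repeating this test for every dose and compound multiplies the number of tests, so the Tukey correction within each ANOVA does not by itself control the error rate across the whole family of tests.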

      "Ae. aegypti was found to be most sensitive to all the odorants (4-methylphenol, β-ocimene, E2-nonenal, benzaldehyde, nonanal, and 3-octanol) during ZT18-20 except sulcatone (Figure 3C - 3H)." Although some of these chemicals are associated with plants and Ae. aegypti is suspected to sugar feed at night, how do the authors explain that the peak olfactory sensitivity occurs at night for compounds such as nonanal? It would be interesting to discuss how these results compare to previous studies such as:

      Eilerts, D. F., VanderGiessen, M., Bose, E. A., Broxton, K., & Vinauger, C. (2018). Odor-specific daily rhythms in the olfactory sensitivity and behavior of Aedes aegypti mosquitoes. Insects, 9(4), 147

      Response: The possible explanations have been added in the revised MS.

      "Additionally, our principal components analysis also illustrates that most loadings of relative EAG responses are higher towards the Anopheles observations (Figure S4C)." The meaning of this sentence is unclear. Please clarify.

      Response: Considering the limited clarity of the statement we have removed it from the revised manuscript.

      "Taken together these data indicate that An. gambiae may exhibit higher antennal sensitivity to at least five different odorants tested, as compared to Ae. aegypti." As mentioned above, how did the authors normalize across species to allow comparisons? If not normalized, how do you ensure that higher response magnitudes correlate with higher olfactory sensitivity, given potential differences in morphology or size between the two species? Furthermore, An. gambiae has been exclusively used in the EAG assay. Besides the lack of a justification for using a species other than An. culicifacies, the authors have interpreted the EAG results under the assumption that the olfactory sensitivities of An. gambiae and An. culicifacies are comparable. This, however, is a major caveat in the experimental design, given that previous studies (indicated below) have reported species-specific variations in olfactory sensitivity. In its present form, the EAG data from An. gambiae is not a piece of appropriate evidence that the authors could use to complement or substantiate the findings from other aspects of this study on An. culicifacies.

      Wheelwright, M., Whittle, C. R., & Riabinina, O. (2021). Olfactory systems across mosquito species. Cell and Tissue Research, 383(1), 75-90.

      Wooding, M., Naudé, Y., Rohwer, E., & Bouwer, M. (2020). Controlling mosquitoes with semiochemicals: a review. Parasites & Vectors, 13, 1-20.

      Gupta, A., Singh, S. S., Mittal, A. M., Singh, P., Goyal, S., Kannan, K. R., ... & Gupta, N. (2022). Mosquito Olfactory Response Ensemble enables pattern discovery by curating a behavioral and electrophysiological response database. Iscience, 25(3).

      Response: The data were normalized as described above in point 15. Also, the use of multiple mosquito species in this study was due to a technical limitation (please refer to point 7).

      The reviewer's statement "Besides the lack of a justification for using a species other than An. culicifacies, the authors have interpreted the EAG results under the assumption that the olfactory sensitivities of An. gambiae and An. culicifacies are comparable" is not accurate, as we never assumed similar olfactory sensitivity between An. culicifacies and An. gambiae. We considered only that both species are nocturnally active. Moreover, we are aware that EAG data from a different Anopheles species are difficult to correlate with the molecular data on An. culicifacies. We therefore chose An. gambiae (rather than other Anopheles mosquitoes such as An. stephensi or An. coluzzii) because diel rhythm-associated molecular data are available for An. gambiae (68). To aid interpretation, we also compared the expression profiles of CYP450 and OBP genes between An. culicifacies and An. gambiae (Supplemental file 3). Importantly, we found similar expression patterns for several CYP450 and OBP/CSP genes in the two species. Furthermore, we would like to emphasize that the primary focus of the current manuscript is to highlight the role of peri-receptor proteins in olfactory sensitivity and odor detection, and, as a proof of concept, we validated this hypothesis in both An. gambiae and Ae. aegypti. We believe that the basic mechanisms of odor detection and peri-receptor events are conserved from insects to higher vertebrates.

      "Similar to An. gambiae, a comparatively high amplitude response was also observed in An. stephensi (Figure S4D)." This is interesting but what would be even more relevant to the present study is to discuss how the time-dependent responses compare between the two Anopheles species.

      Response: We agree that it would be interesting to compare the time-dependent responses of the two Anopheles species. However, this is not among our primary objectives and is beyond the scope of the current manuscript. We have therefore removed these data from the revised manuscript.

      The paragraph titled "Daily temporal modulation of neuronal serine protease impacts mosquito's olfactory sensitivity" is confusing because the authors move on to test the effect of knocking down a serine protease gene (found to be differentially expressed throughout the day) on olfactory sensitivity. While this is interesting in and of itself, the link between the role of this gene in learning-induced plasticity, the circadian modulation of "brain functions" and olfactory sensitivity is 1) unclear and 2) not explicitly tested. We agree with the authors that what has been tested is "the effect of neuronal serine protease on circadian-dependent olfactory responses," but the two paragraphs leading to it seem to be extrapolating functional links that have yet to be determined. In this context, their conclusions that "Our finding highlights that daily temporal modulation of neuronal serine-protease may have important functions in the maintenance of brain homeostasis and olfactory odor responses." is misleading because although they used the hypothetical "may", the link between the temporal modulation of one serine protease gene and the maintenance of brain homeostasis is not explicitly tested here.

      Response: Although we strongly believe that neuronal serine proteases are involved in remodelling of the extracellular matrix and the maintenance of brain homeostasis, we could not validate this experimentally by neuroimaging (which is beyond the scope of the current manuscript), and this prevents us from drawing that conclusion. Therefore, we have modified our conclusions based on the available data, as suggested by the reviewer.

      Discussion

      1. The first sentence of the discussion: "In this study, we provide initial evidence that the daily rhythmic change in the olfactory sensitivity of mosquitoes is tuned with the temporal modulation of molecular factors involved in the initial biochemical process of odor detection i.e., peri-receptor events" is not true since studies from Rund and Duffield previously revealed the daily modulation of OBP gene expression. It also contradicts the next sentence: "The findings of circadian-dependent elevation of xenobiotic metabolizing enzymes in the olfactory system of both Ae. aegypti and An. culicifacies are consistent with previous literature (26, 31), and we postulate that these proteins may contribute to the regulation of odorant detection in mosquitoes."

      Response: This statement is modified in the revised manuscript.

      The use of "circadian" in the discussion of the results is also misleading as only diel rhythms were evaluated in the present study.

      Response: This is changed in the revised manuscript.

      "Given the potentially larger odor space in mosquitoes (like other hematophagous insects) (16, 58)." This is not really what these references show.

      Response: The statement and the references have been changed in the revised manuscript.

      "Given the potentially larger odor space in mosquitoes (like other hematophagous insects) (16, 58), it can be hypothesized that detection of any specific signal in such a noisy environment, mosquitoes may have evolved a sophisticated mechanism for rapid (i) odor mobilization and (ii) odorant clearance, to prevent anosmia (24)." One could argue that this is a requirement for all insects, regardless of the size of their olfactory repertoire.

      Response: We agree with the reviewer and modified the text accordingly.

      "Taken together, we hypothesize that circadian-dependent activation of the peri-receptor events may modulate olfactory sensitivity and are key for the onset of peak navigation time in each mosquito species." This is not entirely accurate since spontaneous locomotor activity rhythms are also observed in the absence of olfactory stimulation. While "navigation" does imply olfactory-guided behaviors, "peak navigation time" appears to be driven by other processes. See, for example, all studies testing mosquito activity rhythms in locomotor activity monitors.

      Response: Considering the concern of the reviewer, we have modified the text.

      "Due to technical limitations, and considering the substantial data on the circadian-dependent molecular rhythmicity" please clarify what the technical limitations were. Is this something that prevented the authors specifically, or something tied to mosquito biology and would prevent anybody from doing it? Also, why couldn't the transcriptomic analysis be performed on An. gambiae?

      Response: As previously mentioned, the main challenge in testing our hypothesis was the unavailability of an EAG facility at ICMR-National Institute of Malaria Research, India, the only place where an An. culicifacies colony is available. Secondly, transportation of An. culicifacies was not possible due to government regulations, and establishing a colony of An. culicifacies in a new environment or laboratory takes a long time because the species does not adapt easily (Adak T, Kaur S, Singh OP. Comparative susceptibility of different members of the Anopheles culicifacies complex to Plasmodium vivax. Trans R Soc Trop Med Hyg. 1999;93:573–577). Thirdly, an An. culicifacies colony was not available at our collaborating laboratory. These are the major technical limitations.

      Therefore, to validate the hypothesis of CYP450 function in odorant detection and olfactory sensitivity, we opted for the current collaborative study. We are aware that EAG data from a different Anopheles species are difficult to correlate with the molecular data on An. culicifacies. We therefore chose An. gambiae (rather than other Anopheles mosquitoes such as An. stephensi or An. coluzzii) because diel rhythm-associated molecular data are available for An. gambiae (68). To aid interpretation, we also compared the expression profiles of CYP450 and OBP genes between An. culicifacies and An. gambiae (Supplemental file 3). Importantly, we found similar expression patterns for several CYP450 and OBP/CSP genes in the two species. Performing another RNA-Seq study with An. gambiae would not be possible for the current manuscript. Furthermore, please note that the primary focus of the current manuscript is to highlight the role of peri-receptor proteins in olfactory sensitivity and odor detection, and, as a proof of concept, we validated this hypothesis in both An. gambiae and Ae. aegypti. We believe that the basic mechanisms of odor detection and peri-receptor events are conserved from insects to higher vertebrates.

      "In contrast to An. gambiae, the time-dose interactions had a higher significant impact on the antennal sensitivity of Ae. aegypti. An. gambiae showed a conserved pattern in the daily rhythm of olfactory sensitivity, peaking at ZT1-3 and ZT18-20." These two sentences are very confusing. Doesn't it simply mean that the co-variation is not linear or not the same across odors? In addition, what does it mean for a pattern to be more conserved? How can one conclude about the "conserved" nature of a pattern by looking at time-dependent variations in dose-response curves?

      Response: This section of discussion is re-written in the revised version of the manuscript.

      "Together these data, we interpret that mosquito's olfactory sensitivity possibly does not follow a fixed temporal trait" is unclear and suggests that the authors are discussing global versus odor-specific rhythms. Please rephrase.

      Response: This section of discussion is re-written in the revised version of the manuscript.

      "Moreover, we hypothesize that under standard insectary conditions, mosquitoes may not need to exhibit foraging flight activity either for nectar or blood, and during the time course, it may minimize their olfactory rhythm, which is obligately required for wild mosquitoes." This hypothesis is not supported by the results of the study and contradicts work by others (Rund et al., Eilerts et al., Gentile et al., etc.).

      Response: This section of discussion is re-written in the revised version of the manuscript.

      The same comment applies to "Therefore, it is reasonable to think that the mosquitoes used for EAG studies may have adapted well under insectary settings and, hence carry weak olfactory rhythm." as this statement is not supported by results of the present study or comparisons of the results to previous studies based on field-caught mosquitoes. Although it is an interesting question to ask in the future, it should be stated as a future research avenue rather than a working hypothesis that results from the present study.

      Response: This section of discussion is re-written in the revised version of the manuscript.

      "Aedes aegypti displayed a peak in antennal sensitivity at ZT18-20 to the higher concentrations of plant and vertebrate host-associated odorants tested. Given the time-of-day dependent multiple peaks (at ZT6-8 and ZT18-20 for benzaldehyde and at ZT12-14 and ZT18-20 for nonanal) in antennal sensitivity to different odorants, our data supports the previous observation of bimodal activity pattern of Ae. aegypti (50)." Rephrase by saying that results are "aligned with the previous observations of bimodal activity". Olfactory rhythms don't "support" the activity patterns because olfactory processes and spontaneous locomotor activity are independent processes.

      Response: We have made these changes in the revised manuscript as per the suggestions of the reviewer.

      "our preliminary data indicate that Anopheles spp. may possess comparatively higher olfactory sensitivity to a substantial number of odorants as compared to Aedes spp." Consider removing this sentence unless the way the data has been normalized to allow for comparisons between species is clarified.

      Response: This statement is removed from the revised manuscript.

      In "A significant decrease in odorant sensitivity for all the volatile odors tested in the CYP450-silenced Ae. aegypti," please change "silenced" to "reduced" because RNAi doesn't silence (i.e. knockout) gene expression.

      Response: It has been modified as per the suggestions of the reviewer.

      The title "Neuronal serine protease consolidates brain function and olfactory detection" is extremely misleading. Do the authors refer to memory consolidation, which has not been tested here? What is brain function consolidation??

      Response: We agree with the reviewer. The title has been modified in the revised manuscript.

      The reference used in "Despite their tiny brain size, mosquitoes, like other insects, have an incredible power to process and memorize circadian-guided olfactory information (7)." is not appropriate. Also, "circadian-guided" is unclear. Consider replacing it with "circadian-gated".

      Response: It has been modified as per the suggestions of the reviewer.

      What is the "the homeostatic process of the brain"?

      Response: Homeostasis can be defined as the process of maintaining a stable state. Here, the statement "the homeostatic process of the brain" is used to convey that, after the active host-seeking/olfaction phase of mosquitoes, during which the coordinated and integrated functions of both the olfactory and neuronal systems are required for crucial decision-making events, the brain may undergo a homeostatic process (coming down from an excitatory state to a stable state) during the resting period. However, in view of the reviewer’s concern, we have modified the statement.

      "the temporal oscillation of the sleep-wake cycle of any organism is managed by the encoding of experience during wake, and consolidation of synaptic change during inactive (sleep) phases, respectively (70)." By experience, do the authors refer to learning? This seems out of topic as this process has not been evaluated here.

      Response: It has been modified as per the suggestions of the reviewer.

      "We speculate that after the commencement of the active phase (ZT6-ZT12), the serine peptidase family of proteins in the brain of Ae. aegypti mosquitoes may play an important function in consolidating brain actions (after ZT12) and aid circadian-dependent memory formation." The value of this statement is unclear. Circadian-dependent memory formation is not being evaluated here, and the results from the present study do not directly support this speculation, also because other processes involved in memory formation are not evaluated here. This seems at odds with the literature on learning and memory.

      Response: We have modified these statements in the revised manuscript and now present them as a future research hypothesis.

      "Subsequent work on electrophysiological and neuro-imaging studies are needed to demonstrate the role of neuronal-serine proteases in the reorganization of perisynaptic structure." Sure. But the link between "the role of neuronal-serine proteases in the reorganization of perisynaptic structure" and rhythms in olfactory sensitivity is unclear.

      Response: It has been modified as per the suggestions of the reviewer.

      As a general comment, EAGs seem inappropriate to evaluate the effect of the central-brain processing in the regulation of peripheral olfactory processes. This is a critical comment that needs to be considered by the authors and clarified in the manuscript. If rhythms of central brain processes are important for olfactory-guided behaviors, these should be evaluated at the level of the central brain or via behavioral metrics. The effect of the RNAi knockdowns on peripheral sensitivity is interesting, but its link with central processes is unclear and doesn't support the speculations made by the authors about learning and memory.

      Response: We agree with the reviewer that an EAG study is not sufficient or appropriate to comment on the effect of central-brain processing in the regulation of olfactory processes. Further validation by either neuroimaging or behavioral studies is needed to draw any conclusion. We clearly mention in the manuscript that our data only indirectly indicate this function of serine protease and that further confirmatory studies are needed to prove this hypothesis.

      Methods

      1. No explanations are provided for how the EAG data are normalized to allow comparisons between species.

      Response: Please refer to our response to point no. 15 of Reviewer 1.

      Figures

      Figure 1: The daily rhythms depicted in A are not representative of the actual profiles. See: Benoit, J. B., & Vinauger, C. (2022). Chapter 32: Chronobiology of blood-feeding arthropods: influences on their role as disease vectors. In Sensory ecology of disease vectors (pp. 815-849). Wageningen Academic Publishers. Or any other paper on mosquito activity rhythms.

      Response: Considering the reviewer’s concern we have revised the figure.

      Figure 3 and 4: The EAG results are plotted twice. This is redundant and misleading as it makes the reader think there is more data than actually presented.

      Response: In response to the reviewer’s comment, we have moved Figure 4 to the supplemental file.

      Figure 5: Please clarify the sample size for each panel. In C - F, what would be used as a reference? In other words, what is a Relative EAG Response of 1? And if it is "relative", are the units really mV? In E and F, it would be great to show how the Ethanol control compares to the no solvent condition. This could be placed in supplementary materials.

      Response: The sample size was mentioned in the figure legends. However, to clarify for the reviewer, the odor response was tested with 40 individual mosquitoes in each of the control and dsRNA-treated groups. Therefore, the sample size is N=40 for Fig. 5C.

      The respective solvent control (hexane) was used as the reference to calculate the relative EAG response for both the dsrLacZ and dsrCYP450 groups. As this is a relative EAG amplitude, we have removed the unit in the revised MS.
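      As an illustration of the normalization described in this response, the following minimal sketch expresses each odor-evoked amplitude relative to the solvent control. It assumes the relative response is a simple ratio (so that a value of 1 means "no different from solvent"); the response does not state whether a ratio or a subtraction was used. The function name and the amplitudes are hypothetical.

```python
def relative_eag(odor_mv, solvent_mv):
    """Return an odor-evoked EAG amplitude relative to the solvent control.

    Assumption: 'relative' means odor amplitude divided by solvent amplitude,
    which makes the result unitless and equal to 1.0 for a solvent-like response.
    """
    if solvent_mv == 0:
        raise ValueError("solvent control amplitude must be non-zero")
    return odor_mv / solvent_mv


# Hypothetical amplitudes in mV (not data from the study)
solvent = 0.5
odor_responses = [1.0, 0.75, 0.5]

relative = [relative_eag(r, solvent) for r in odor_responses]
print(relative)  # [2.0, 1.5, 1.0]
```

      Under this assumption the result is dimensionless, which is consistent with removing the mV unit from the axis.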

      Figures 5 and 6, given the dispersion in the EAG data, the treatments where N=40 appear robust, but the interpretation of results from treatments where N=6 may be limited due to the low sample size. This limitation is visible in Figure 5F, for example, where ABT-Aceto is different from Cont-Aceta but not PBO-Aceto because one individual shows a higher response.

      Response: We agree that increasing the sample size for the inhibitor treatment experiment would probably decrease these inter-individual differences and increase the overall significance. However, our robust knock-down data showed significant results and complement the inhibitor study in Ae. aegypti, so we do not see any disparity in the data. Moreover, the EAG responses to human blend, nonanal, and benzaldehyde showed similarly significant results in both the RNAi and inhibitor studies. To account for the variable knock-down efficiency in dsRNA-injected mosquitoes, the phenotypic assays (EAG recordings) were carried out with 40 control and 40 dsRNA-treated mosquitoes. We also observed a significant reduction in EAG response following inhibitor treatment in An. gambiae when we tested 6 ethanol-treated and 6 inhibitor-treated mosquitoes, and therefore followed the same protocol for Ae. aegypti. However, inter-individual differences in response affect the significance value.

      Figure S6: how does this support that synaptic plasticity is influenced by "Time-of-day dependent modulation of serine protease genes in the brain"?

      Response: We agree with the reviewer’s concern that it is not possible to comment on synaptic plasticity with EAG data alone. We apologize for this and have revised the statement in the MS.


      Minor comments

      What do the authors mean by "consolidation of brain functions"? Memory consolidation? Please clarify.

      Response: The consolidation of brain function, or memory consolidation, refers to the process of stabilizing the memory that an organism gains through experience or a training/learning phase. Memory consolidation begins with a rapid change in de novo gene expression regulated by several transcription factors, effector genes, and non-coding RNAs, known as molecular consolidation, followed by cellular consolidation, which involves cellular signal transmission within the neurons of the brain. Molecular and cellular consolidation are the basis for system-level consolidation, which is a slow process and involves communication among neurons located in different regions of the brain. System-level consolidation is very important for the reorganization of brain circuits to maintain long-term memory. The concept of system consolidation is well evidenced in humans. Additionally, several studies in Drosophila have shown that fruit flies develop olfactory memories after classical conditioning or olfactory training through a system consolidation process.

      Moreover, accumulating data from humans suggest that sleep helps in memory consolidation. Sleep is a basic drive for all animals and helps to build memories. There are two hypotheses, each with compelling supporting evidence. The first hypothesis, together with the supporting molecular and electrophysiological data, holds that sleep facilitates the homeostatic processes of the brain, which involve the loosening of synaptic connections between overactive neurons and the structural modification of synapses, and consequently help in memory formation. The second hypothesis states that sleep makes an important contribution to system consolidation and long-term memory potentiation. Studies of the electrical activity of the brain and recent advances in fMRI scanning indicate a reorganization of neural activity between brain regions during sleep-related memory consolidation.

      There is experimental evidence in support of both theories in humans as well as in the fruit fly Drosophila melanogaster. In mosquitoes, studies of brain function are primarily restricted to the mechanism of odor coding, and memory formation has been correlated with dopamine neurotransmitter signalling. In view of the rapid adaptation potential, changes in host preference, and evolution of temporal host-seeking behaviour, it can be hypothesized that the mosquito brain also undergoes memory consolidation (following either one of the two hypothesized paths or both cumulatively) to learn new information in order to effectively shape future actions.

      Furthermore, according to the fundamental principle of modern neuroscience, learning and memory are achieved either by the formation of new synaptic connections or by changes in existing connections between neurons. The ability of synapses to strengthen or weaken their communication is called plasticity, which is influenced by learning and experience and facilitates an organism’s adaptation and survival.

      References:

      1. Cervantes-Sandoval, A. Martin-Peña, J. A. Berry, R. L. Davis, System-like consolidation of olfactory memories in Drosophila. J. Neurosci. 33, 9846–9854 (2013).
      2. Marshall, N. Cross, S. Binder, T. T. Dang-Vu, Brain rhythms during sleep and memory consolidation: Neurobiological insights. Physiology. 35, 4–15 (2020).
      3. Brendon O. Watson and György Buzsáki. Sleep, Memory & Brain Rhythms. Daedalus, 144(1): 67–82 (2015). doi:10.1162/DAED_a_00318

      In "Similar to previous studies (26), the expression of a limited number of rhythmic genes was visualized in Ae. aegypti" please replace "visualized" with "observed".

      Figure 2A, please clarify in the caption what FDR stands for.

      Response: FDR stands for “false discovery rate”; the FDR-adjusted p-value is used to limit the proportion of false-positive results.
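      For readers unfamiliar with FDR adjustment, a minimal sketch of one common procedure is given below. It assumes the Benjamini-Hochberg method; the response does not state which FDR procedure was used for Figure 2A, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotonic
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four transcripts (not data from the study)
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
print([round(x, 4) for x in q])  # [0.04, 0.0533, 0.0533, 0.2]
```

      In this toy example only the first transcript remains significant at an FDR threshold of 0.05, although three of the four raw p-values are below 0.05.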

      In "To further establish this proof-of-concept in An. gambiae, three potent CYP450 inhibitors, aminobenzotriazole(52), piperonyl butoxide(53), and schinandrin A (54), was applied topically on the head capsule of 5-6-day-old female mosquitoes" replace "was applied" with "were applied".

      Response: These changes are made in the revised manuscript.

      "Interestingly, our species-time interaction studies revealed that An. gambiae exhibits time-of-day dependent significantly high antennal sensitivity to at least four chemical odorants compared to Ae. aegypti, except phenol." is unclear. Please reword.

      Response: The statement has been revised in the MS.

      In "Similar observations were also noticed with An. stephensi." replace "noticed" with "made".

      Response: We have modified the statement in the revised version of the manuscript.



      Reviewer #1 (Significance (Required)):

      Such a study has the potential to be valuable for the field, but its value and significance are hindered by an accumulation of overstatements, the fact that prior work in the field has been minimized or omitted, and a lack of support for the stated conclusions.

      In this context, the advances are only slightly incremental compared to the work produced by Rund et al., and the mechanistic hypotheses emitted to link the genes selected for knockdown experiments and olfactory sensitivity are not clearly supported by the evidence presented here. The main strength of the paper is to show the role of CYP450 in olfactory sensitivity.

      The audience is fairly broad and includes insect neuro-ethologists, molecular biologists, and chronobiologists.

      Our field of expertise:

      • Mosquito chemosensation

      • Learning and memory

      • Chronobiology

      • Electrophysiology

      • Medical entomology









      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This report combines an examination of peripheral transcriptomes and general olfactory sensitivity in an effort to underscore the importance of peri-receptor components in circadian-directed modulation of olfaction across both Aedine and Anopheline mosquitoes. While the authors do a nice job of raising the importance of the often-underappreciated spectrum of insect olfactory peri-receptor proteins, the impact of their study is undercut by technical concerns regarding methods and data presentation. That several of these concerns (detailed below) are explicitly acknowledged by the authors as limitations of this study does not mitigate their impact in eroding confidence in these data and this study.

      All in all, as a result of these concerns, I am unconvinced as to the overall merits of this somewhat interesting but generally uneven study.

      Response: We sincerely thank the reviewer for their time and consideration, and appreciate the thorough review of our manuscript. Their insightful comments have greatly enriched our work. We also apologize for instances of overinterpreting the data. Your feedback has helped us recognize areas where clarity and caution are needed, and we are committed to addressing these concerns in our revisions. Thank you for your valuable input and guidance.

      Major concerns:

      1. That the authors use An. culicifacies for their transcriptome studies and An. gambiae (G3) for the olfactory physiology does not work. The 'technical limitations' (read studies done at two different locations) make this report an unwelcome melding of what should perhaps be two distinct studies. In order to maintain this forced marriage as a single report I would suggest the authors utilize An. culicifacies for both components. Alternatively, they can do both parts with An. gambiae but here I would strongly urge them to use any strain other than G3 which as a result of its now decades-long laboratory residence has long since lost its relevance to natural populations of Anopheline vectors.

      Response: We agree with the reviewer that there is significant species-specific variation in the olfactory sensitivity of mosquitoes. Considering the strictly nocturnal behavioral pattern of An. culicifacies and the diurnal behavior of Aedes aegypti, we performed the RNA-Seq study with these respective species. However, we opted for the current collaborative study for the following reasons: 1) an EAG facility is unavailable at the ICMR-National Institute of Malaria Research, India (the only place where an An. culicifacies colony is available); 2) rearing An. culicifacies and adapting it to a new environment/laboratory is challenging (An. culicifacies takes a long time, as it does not adapt easily; Ref: Adak T, Kaur S, Singh OP. Comparative susceptibility of different members of the Anopheles culicifacies complex to Plasmodium vivax. Trans R Soc Trop Med Hyg. 1999;93:573–577); 3) an An. culicifacies colony was not available at our collaborating laboratory; and 4) we needed to validate our hypothesis of CYP450 function in the odorant detection and olfactory sensitivity of mosquitoes.

      We are also aware that, given the species variation within Anopheles, electroantennographic data would be difficult to correlate with the molecular data on An. culicifacies. Thus, we chose An. gambiae (rather than other Anopheles mosquitoes such as An. stephensi or An. coluzzii) because of the availability of diel rhythm-associated molecular data for An. gambiae (68). For better interpretation, we also compared the expression profiles of CYP450 and OBP genes between An. culicifacies and An. gambiae (Supplemental file 3). Importantly, we found similar expression patterns for several CYP450 and OBP/CSP genes between An. culicifacies and An. gambiae. Performing another RNA-Seq study with An. gambiae would not be possible for the current MS. Furthermore, please note that the primary focus of the current MS is to highlight the role of peri-receptor proteins in olfactory sensitivity and odor detection, and, as a proof of concept, we validated this hypothesis in both An. gambiae and Ae. aegypti. We believe that the basic mechanism of odor detection and peri-receptor events is similar/conserved from insects to higher vertebrates.

      The 70-80% alignment rate reported to the An. culicifacies reference genome significantly erodes this reader's confidence in the integrity of their analyses. That a low level of alignment can have dramatic impacts on the estimation of transcript abundance has been repeatedly demonstrated (see Srivastava, A., Malik, L., Sarkar, H. et al. Genome Biol 21, 239, 2020, https://doi.org/10.1186/s13059-020-02151-8). This may (in part) explain why olfactory receptors have been largely absent from this data set.

      Response: We agree with the reviewer that the alignment rate could have been better, but this should not affect the quantitative information we refer to in this manuscript. The alignment rate could have affected the qualitative information, which can vary for multiple reasons, including the quality of the reference genome. As is evident from the analysis, in Ae. aegypti 90% of the reads aligned to the reference genome, yet we still did not observe any difference in the abundance of olfactory receptor genes. A previous microarray analysis in An. gambiae by Rund et al. (2013) also did not show diel rhythmic expression of any OR genes.

      The issue of species choice is further complicated by questions regarding the An. culicifacies species complex, which contains 5 cryptic species. How did the authors confirm they are indeed working with An. culicifacies species A? There is no mention of molecular identification.

      Response: The An. culicifacies species A colony has been maintained at NIMR since 1999, with routine checks performed to verify its species purity by analyzing chromosomal inversion genotypes for the presence of sibling species (see the references). At that time, we had three sibling species (A, B, and C); subsequently, we lost B and C, so giving the old references would not serve the purpose. Later, we verified sibling species A by chromosomal inversion genotyping and molecular tools. However, we do not have any published reference for these verification data.

      The species can be identified by performing 28S rDNA-based PCR (Singh et al, 2004) and cytochrome oxidase II-based PCR (Goswami et al 2006). Sequencing can also serve the purpose.


      Singh OP, Goswami G, Nanda N, Raghavendra K, Chandra D, Subbarao SK. An allele-specific polymerase chain reaction assay for the identification of members of Anopheles culicifacies complex. J Biosci. 2004; 29: 275–280. doi: 10.1007/bf02702609

      Goswami G, Singh OP, Nanda N, Raghavendra K, Gakhar SK, Subbarao SK. Identification of all members of the Anopheles culicifacies complex using allele-specific polymerase chain reaction assays. Am J Trop Med Hyg. 2006; 75: 454-460. doi: 10.4269/ajtmh.2006.75.454

      Adak T, Kaur S, Singh OP. Comparative susceptibility of different members of the Anopheles culicifacies complex to Plasmodium vivax. Trans R Soc Trop Med Hyg. 1999;93:573–577

      The switch from dsRNAi studies in Aedes to protease inhibitor studies in Anopheles adds to the interspecies confusion.

      Response: Our main goal in this study was to evaluate the function of CYP450 in mosquito odor detection and olfactory sensitivity. Our data, as well as previous data (Rund et al. 2011, Rund et al. 2013), suggest that the basic mechanism of odor detection and peri-receptor events is similar in An. gambiae, An. culicifacies, and Ae. aegypti, and the role of detoxification genes is well evidenced by these data. Based on our RNA-Seq data on Ae. aegypti, we shortlisted one CYP450 gene for functional knockdown assays. For Anopheles, however, we used An. gambiae for functional validation, and it was therefore not possible for us to select an appropriate CYP450 gene from An. gambiae. That is why we chose to use CYP450 protein inhibitors, which block the function of all the CYP450s expressed in the olfactory system of mosquitoes. As expected, we observed a much more pronounced reduction in olfactory sensitivity when inhibitors were applied than with dsRNAi-mediated knock-down of only one CYP450 protein. These data indicate that Anopheles also possesses a similar mechanism of peri-receptor events for odor detection and that CYP450 plays an important role in it.

      The olfactory shifts presented in Fig 3 are somewhat underwhelming. In An. gambiae this is mostly seen at very high (to my eyes, non-biologically relevant) 10⁻¹ dilutions. In Aedes, while statistically significant, the EAG values (especially for 4MePhenol) are very low and therefore suspect and unconvincing. It is also unclear how 'Relative EAG Responses' were derived. Does this mean relative to solvent-alone controls?

      Response: Yes, relative EAG response means relative to the respective solvent control. We have also made the necessary changes in the text and in the figures for better understanding and representation.

      The same data set seems to have been presented in Figures 3 and 4, with the latter lacking salient details (e.g., haphazard odor concentrations, which are seen only when the legend is examined). These factors make the rationale for including Figure 4 less obvious.

      Response: In response to the reviewer’s concern, we have moved Figure 4 to the supplemental data, and we apologize for the miscommunication.

      I am concerned that the data in Figure 5B is derived from only those samples with altered EAGs. I believe that all injected mosquitoes should be assayed in order to better understand the actual efficacy of the treatment. The cherry picking of samples is troubling.

      Response: We pooled five heads for each replicate and performed the assay with three replicates. That means we took heads from 15 mosquitoes for each experimental setup (control vs knock-down). It is true that we did not consider all 40 mosquitoes that we used for EAG recordings. However, we believe that 15 mosquitoes are a good representation of the population. Moreover, the error bars among replicates of the knock-down mosquitoes, compared with the dsLacZ group, clearly indicate the disparity in knock-down efficiency among individuals.

      As is true for earlier figures, Figure 5c-f is lacking critical information about concentration (also not presented in figure legend) and should be done within the context of a multi-point dose response study. The data in its current form is not acceptable.

      Response: We apologize for not mentioning the concentration of the inhibitors. We have now added this information in the revised manuscript.

      The same data concerns apply to Figure 6d-g.

      Response: We apologize for not mentioning the concentration of the inhibitors. We have now added this information in the revised manuscript.

      The inclusion of An. stephensi data Figure S4D seems thrown in as an after-thought and without good reason.

      Response: Our RNA-Seq data on An. culicifacies and Aedes aegypti revealed a similar abundance and expression pattern of rhythmic transcripts, specifically peri-receptor transcripts, as reported previously by Rund et al. (2011, 2013) for Aedes aegypti and Anopheles gambiae. Moreover, because we observed a significant difference in EAG response between Aedes aegypti and Anopheles gambiae, we hypothesized that the higher abundance of rhythmic peri-receptor transcripts possibly correlates with the high EAG response in Anopheles. Therefore, to get an idea of the EAG response in another Anopheles sp., we used An. stephensi and observed a similar difference in EAG response. Although it would be interesting to compare the time-dependent response between the two Anopheles species, this is not our primary interest or objective; it is beyond the scope of the current MS and can be explored further in the future.

      I am unsure how shifts in CNS levels of P450 or serine proteases impact peripheral EAG recordings? This is especially so given that any effects on synaptic plasticity/efficacy that might occur are expected to be downstream of the peripheral antennae being recorded in EAGs. The authors do not do a great job explaining away that paradox even though that section in the discussion seems overly speculative.

      Response: We agree with the reviewer that an EAG study is not sufficient or appropriate to comment on the effect of central-brain processing in the regulation of olfactory processes. Further validation by either neuroimaging or behavioral studies is needed to draw any conclusion. We clearly mention in the MS that our data only indirectly indicate this function of serine protease and that further confirmatory studies are needed to prove this hypothesis. However, it is not possible for us to perform all the experiments now because of technical and infrastructural limitations. Thus, we have proposed it as a future research endeavour. Moreover, considering the reviewer’s concern, we have modified the text and removed the overstatements and speculations.

      The authors' discussion of peri-receptor protein oscillation seems premature given that the data presented (regardless of the caveats discussed above) center on transcript abundance. There are no data on protein abundance, which, while related, is an entirely different question/issue.

      Response: Yes, we agree that our hypothesis of peri-receptor protein oscillation is based on our RNA-Seq data. However, we subsequently validated our hypothesis by knock-down studies in mosquitoes and by using CYP450 protein inhibitors, both of which produced a significant decrease in olfactory sensitivity. It is true that we do not have any data on protein abundance, but several previous studies, along with our data, showed similar expression profiles of peri-receptor genes, which clearly indicates that the rhythmic expression pattern of these genes is conserved among mosquitoes. None of the previous studies addressed the hypothesis regarding peri-receptor events and the possible function of XMEs in odorant detection, which is the unique contribution of our study. Therefore, we believe that the functional validation by dsRNAi and inhibitor studies supports our hypothesis. While CYP450 has been reported to have a crucial role in xenobiotic detoxification, its role in odor detection has not yet been explored. We agree that further biochemical validation is required to examine the interaction between CYP450 and odor molecules, and how CYP450 modifies the odorant chemicals either for their detection or for their inactivation. However, such a study is beyond the scope of the MS and will be a future research endeavour. Nevertheless, our current data and the MS will have a large impact on the design of strategies for insecticide application, as overlap between the timing of insecticide application and the rhythmic expression/natural upregulation of XMEs could accelerate the inactivation of insecticides and the rapid generation of resistant mosquitoes. Thus, we believe that the revised MS contains valuable data and merits publication.

      Minor concerns:

      1. The authors routinely confuse transcript abundance derived from their RNAseq data with gene expression. The former reflects the steady-state snapshot levels of transcripts encompassing synthesis, use and decay while the latter is limited to the rate of transcription requiring nuclear run on or single-nucleus RNAseq approaches.

      Response: Thank you for your insightful comment. We appreciate your clarification regarding the distinction between transcript abundance and gene expression. In the revised manuscript, we have included a clarification stating that “transcript abundance is referred to as gene expression, unless explicitly stated otherwise”.

      There are numerous typos, spelling errors and other grammatical mistakes; a copy editor is needed.

      Response: In the revised manuscript, we have carefully corrected the spelling errors and other grammatical mistakes.

      Many of the supplemental figures are error filled, lacking sufficient details and otherwise difficult to parse/understand. I recommend revisiting/removing many of these.

      Response: We have improved the supplementary figures in the revised manuscript as suggested by the reviewer.

      Reviewer #2 (Significance (Required)):

      In light of the serious concerns described above, there is limited significance to this study. Similarly, these concerns erode almost all of any advance to the field this study might have offered. The audience of interest would be highly specialized.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In the present manuscript, the authors analyzed diel oscillations in the brain and olfactory organs' transcriptome of Aedes aegypti and Anopheles culicifacies. The analysis of their RNAseq results showed an effect of time of day on the expression of detoxification genes involved in oxidoreductase and monooxygenase activity. Next, they investigated the effect of time of day on the olfactory sensitivity of Ae. aegypti and An. gambiae and identified the role of CYP450 in odor detection in these species using RNAi. In the last part of the study, they used RNAi to knock down the expression of one of the serine protease genes and observed a reduction in olfactory sensitivity. Overall, the experiments are well-designed and mostly robust (see comment regarding the sample size and data analysis of the EAG experiments) but do not always support the claims of the authors. For example, since no experiments were conducted under constant conditions, the circadian (i.e., driven by the internal clocks) effects are not being quantified here. In addition, knocking down the expression of a gene showing daily variations in its expression and observing an effect on olfactory sensitivity is not sufficient to show its role in the daily olfactory rhythms. Knowledge gaps are not well supported by the literature, and overstatements are made throughout the manuscript. Our detailed comments are listed below.

      Major comments

      Introduction

      Several statements made in the introduction are misleading and suggest that authors are trying to exaggerate the impact of their work. For example, "Furthermore, different species of mosquitoes exhibit plasticity and distinct rhythms in their daily activity pattern, including locomotion, feeding, mating, blood-feeding, and oviposition, facilitating their adaptation into separate time-niches (7, 8), but the underlying molecular mechanism for the heterogenous temporal activity remains to be explored." is not accurate since daily rhythms in mosquitoes' transcriptomes, behavior, and olfactory sensitivity have been the object of several publications. Even though some of them are listed later in the introduction, they contradict the claim made about the knowledge gap. See:

      Rund, S. S., Gentile, J. E., & Duffield, G. E. (2013). Extensive circadian and light regulation of the transcriptome in the malaria mosquito Anopheles gambiae. BMC genomics, 14(1), 1-19

      Rund, S. S., Hou, T. Y., Ward, S. M., Collins, F. H., & Duffield, G. E. (2011). Genome-wide profiling of diel and circadian gene expression in the malaria vector Anopheles gambiae. Proceedings of the National Academy of Sciences, 108(32), E421-E430

      Rund, S. S., Bonar, N. A., Champion, M. M., Ghazi, J. P., Houk, C. M., Leming, M. T., ... & Duffield, G. E. (2013). Daily rhythms in antennal protein and olfactory sensitivity in the malaria mosquito Anopheles gambiae. Scientific reports, 3(1), 2494

      Rund, S. S., Lee, S. J., Bush, B. R., & Duffield, G. E. (2012). Strain-and sex-specific differences in daily flight activity and the circadian clock of Anopheles gambiae mosquitoes. Journal of insect physiology, 58(12), 1609-1619

      Leming, M. T., Rund, S. S., Behura, S. K., Duffield, G. E., & O'Tousa, J. E. (2014). A database of circadian and diel rhythmic gene expression in the yellow fever mosquito Aedes aegypti. BMC genomics, 15(1), 1-9

      Eilerts, D. F., VanderGiessen, M., Bose, E. A., Broxton, K., & Vinauger, C. (2018). Odor-specific daily rhythms in the olfactory sensitivity and behavior of Aedes aegypti mosquitoes. Insects, 9(4), 147

      Rivas, G. B., Teles-de-Freitas, R., Pavan, M. G., Lima, J. B., Peixoto, A. A., & Bruno, R. V. (2018). Effects of light and temperature on daily activity and clock gene expression in two mosquito disease vectors. Journal of Biological Rhythms, 33(3), 272-288

      The knowledge gap brought up in the next paragraph of the introduction doesn't reflect the questions asked by the experiments: "But, how the pacemaker differentially influences peripheral clock activity present in the olfactory system and modulates olfactory sensitivity has not been studied in detail." Specifically, the control of peripheral clocks by the central pacemaker has not been evaluated here.

      "In vertebrates and invertebrates, it is well documented that circadian phase-dependent training can influence olfactory memory acquisition and consolidation of brain functions" should also cite work on cockroaches and kissing bugs:

      Lubinski, A. J., & Page, T. L. (2016). The optic lobes regulate circadian rhythms of olfactory learning and memory in the cockroach. Journal of Biological Rhythms, 31(2), 161-169

      Page, T. L. (2009). Circadian regulation of olfaction and olfactory learning in the cockroach Leucophaea maderae. Sleep and Biological Rhythms, 7, 152-161

      Vinauger, C., & Lazzari, C. R. (2015). Circadian modulation of learning ability in a disease vector insect, Rhodnius prolixus. Journal of Experimental Biology, 218(19), 3110-3117

      The sentence: "Previous studies showed that synaptic plasticity and memory are significantly influenced by the strength and number of synaptic connections (43, 44)." should be nuanced, as neuromodulators such as dopamine have also been shown to influence learning and memory in mosquitoes:

      Vinauger, C., Lahondère, C., Wolff, G. H., Locke, L. T., Liaw, J. E., Parrish, J. Z., ... & Riffell, J. A. (2018). Modulation of host learning in Aedes aegypti mosquitoes. Current Biology, 28(3), 333-344

      Wolff, G. H., Lahondère, C., Vinauger, C., Rylance, E., & Riffell, J. A. (2023). Neuromodulation and differential learning across mosquito species. Proceedings of the Royal Society B, 290(1990), 20222118

      Overall, the paragraph dealing with the idea that "circadian phase-dependent training can influence olfactory memory acquisition and consolidation of brain functions" is very confusing. This paragraph discusses mechanisms of learning-induced plasticity but seems to ignore the simplest (most parsimonious) explanations for the circadian regulation of learning (e.g., time-dependent expression of genes involved in memory consolidation). In addition, the sentence quoted above is a convoluted way of simply saying that training at different times of the day affects memory acquisition and consolidation. Although the authors did look at one gene involved in neural function, neither learning, memory, nor circadian effects were analyzed in this study. Please reconsider the relevance of the paragraph.

      The sentence: "But, how the brain of mosquitoes entrains circadian inputs and modulates transcriptional responses that consequently contribute to remodel plastic memory, is unknown." should be rephrased. First, it should be "entrains TO circadian inputs", and second, it suggests that the study will be investigating circadian modulation of learning and memory, which is not the case. Furthermore, the term "remodel plastic memory" is unclear and doesn't seem to relate to any specific cellular or neural processes.

      Given the differences in mosquito chronobiology observed even between strains, why perform the RNAi and EAGs on a different species of Anopheles than the one used for the RNAseq (or vice versa)?

      Results

      "As reported earlier, a significant upregulation of period and timeless during ZT12-ZT18 was observed in both species (Figure 1C)." Please provide effect size and summary statistics.

      "Next, the distribution of peak transcriptional changes in both An. culicifacies and Ae. aegypti was assessed through differential gene-expression analysis. Noticeably, An. culicifacies showed a higher abundance of differentially expressed olfactory genes (Figure 1D)" Please provide effect size and summary statistics.

      "Taken together, the data suggests that the nocturnal An. culicifacies may possess a more stringent circadian molecular rhythm in peripheral olfactory and brain tissues." What do the authors mean by "stringent"? At this point, this should be stated as a working hypothesis, as the statement is not backed up by the data. It is possible that the fewer differentially expressed genes of Aedes aegypti are more central to regulatory networks and cascade into more "stringent" rhythmic control of activities and rhythms.

      The section title: "Circadian cycle differentially and predominantly expresses olfaction-associated detoxification genes in Anopheles and Aedes" doesn't make sense. The expression of genes can be modulated by circadian rhythms, but cycles don't express genes. Please rephrase. In addition, this whole section deals with "circadian rhythms" while no experiment has been conducted under constant conditions. The observed daily variations are therefore diel rhythms until their persistence under constant conditions is established.

      "The downregulated genes of Ae. aegypti did not show any functional categories probably due to the limited transcriptional change." Could the authors explain if this is actually the phenomenon or due to a lack of temporal resolution in the study design (i.e., 4 time points)?

      "a GO-enrichment analysis was unable to track any change in the response-to-stimulus or odorant binding category of genes (including OBPs, CSPs, and olfactory receptors)." This finding doesn't corroborate the statements made previously and doesn't align with previously published studies. Is it due to pitfalls in the study design?

      "In contrast, three different clusters of OBP genes in Ae. aegypti showed a time-of-day dependent distinct peak in expression starting from ZT0-ZT12 (Figure 2F)." Please provide summary statistics.

      "In the case of An. gambiae, the amplitudes of odor-evoked responses were significantly influenced by the doses of all the odorants tested (repeated measure ANOVA, p ≤ 2e-16) (Figure S4B)." Did the authors use a positive control for the EAGs? Given the way the data are presented, how were the responses normalized across the two species to allow inter-species comparisons? In addition, it is highly unlikely that all the mosquito preps used in the EAG assay responded to all the odors tested. If some did not, then the dataset includes missing data for certain odors and time points. We assume the authors have ensured there are at least a certain number of responses per odor and time point combination. If this is true, repeated measures ANOVA is not suited for analyzing these data, because this statistical technique requires all repeated measures within and across preps, without missing values. Also, the authors need to correct the summary statistics for multiple comparisons within this framework to avoid inflating type-I errors. Has this been done?
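
      The multiple-comparison correction requested above can be illustrated with a minimal sketch. It assumes a Holm-Bonferroni step-down adjustment, which is only one of several standard choices and is not stated anywhere in the manuscript or the review; the function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    Illustrative only: the review does not say which correction should be
    used; Holm's step-down procedure is assumed here as one common option.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank k-1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values, e.g. one per odor-by-time-point comparison.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw={p:.3f} adjusted={q:.3f}")
```

      Comparing each adjusted value against the chosen significance level then controls the family-wise type-I error rate across all comparisons.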

      "Ae. aegypti was found to be most sensitive to all the odorants (4-methylphenol, β-ocimine, E2-nonenal, benzaldehyde, nonanal, and 3-octanol) during ZT18-20 except sulcatone (Figure 3C - 3H)." Although some of these chemicals are associated with plants and Ae. aegypti is suspected to sugar feed at night, how do the authors explain that the peak olfactory sensitivity occurs at night for compounds such as nonanal? It would be interesting to discuss how these results compare to previous studies such as:

      Eilerts, D. F., VanderGiessen, M., Bose, E. A., Broxton, K., & Vinauger, C. (2018). Odor-specific daily rhythms in the olfactory sensitivity and behavior of Aedes aegypti mosquitoes. Insects, 9(4), 147

      "Additionally, our principal components analysis also illustrates that most loadings of relative EAG responses are higher towards the Anopheles observations (Figure S4C)." The meaning of this sentence is unclear. Please clarify.

      "Taken together these data indicate that An. gambiae may exhibit higher antennal sensitivity to at least five different odorants tested, as compared to Ae. aegypti." As mentioned above, how did the authors normalize across species to allow comparisons? If the data were not normalized, how do you ensure that higher response magnitudes correlate with higher olfactory sensitivity, given potential differences in morphology or size between the two species? Furthermore, An. gambiae has been exclusively used in the EAG assay. Besides the lack of a justification for using a species other than An. culicifacies, the authors have interpreted the EAG results under the assumption that the olfactory sensitivities of An. gambiae and An. culicifacies are comparable. This, however, is a major caveat in the experimental design, given that previous studies (indicated below) have reported species-specific variations in olfactory sensitivity. In its present form, the EAG data from An. gambiae are not appropriate evidence that the authors could use to complement or substantiate the findings from other aspects of this study on An. culicifacies.

      i. Wheelwright, M., Whittle, C. R., & Riabinina, O. (2021). Olfactory systems across mosquito species. Cell and Tissue Research, 383(1), 75-90.

      ii. Wooding, M., Naudé, Y., Rohwer, E., & Bouwer, M. (2020). Controlling mosquitoes with semiochemicals: a review. Parasites & Vectors, 13, 1-20.

      iii. Gupta, A., Singh, S. S., Mittal, A. M., Singh, P., Goyal, S., Kannan, K. R., ... & Gupta, N. (2022). Mosquito Olfactory Response Ensemble enables pattern discovery by curating a behavioral and electrophysiological response database. Iscience, 25(3).

      "Similar to An. gambiae, a comparatively high amplitude response was also observed in An. stephensi (Figure S4D)." This is interesting but what would be even more relevant to the present study is to discuss how the time-dependent responses compare between the two Anopheles species.

      The paragraph titled "Daily temporal modulation of neuronal serine protease impacts mosquito's olfactory sensitivity" is confusing because the authors move on to test the effect of knocking down a serine protease gene (found to be differentially expressed throughout the day) on olfactory sensitivity. While this is interesting in and of itself, the link between the role of this gene in learning-induced plasticity, the circadian modulation of "brain functions" and olfactory sensitivity is 1) unclear and 2) not explicitly tested. We agree with the authors that what has been tested is "the effect of neuronal serine protease on circadian-dependent olfactory responses," but the two paragraphs leading to it seem to be extrapolating functional links that have yet to be determined. In this context, the conclusion that "Our finding highlights that daily temporal modulation of neuronal serine-protease may have important functions in the maintenance of brain homeostasis and olfactory odor responses." is misleading because, although the authors used the hypothetical "may", the link between the temporal modulation of one serine protease gene and the maintenance of brain homeostasis is not explicitly tested here.

      Discussion

      The first sentence of the discussion: "In this study, we provide initial evidence that the daily rhythmic change in the olfactory sensitivity of mosquitoes is tuned with the temporal modulation of molecular factors involved in the initial biochemical process of odor detection i.e., peri-receptor events" is not true since studies from Rund and Duffield previously revealed the daily modulation of OBP gene expression. It also contradicts the next sentence: "The findings of circadian-dependent elevation of xenobiotic metabolizing enzymes in the olfactory system of both Ae. aegypti and An. culicifacies are consistent with previous literature (26, 31), and we postulate that these proteins may contribute to the regulation of odorant detection in mosquitoes."

      The use of "circadian" in the discussion of the results is also misleading as only diel rhythms were evaluated in the present study.

      "Given the potentially larger odor space in mosquitoes (like other hematophagous insects) (16, 58)." This is not really what these references show.

      "Given the potentially larger odor space in mosquitoes (like other hematophagous insects) (16, 58), it can be hypothesized that detection of any specific signal in such a noisy environment, mosquitoes may have evolved a sophisticated mechanism for rapid (i) odor mobilization and (ii) odorant clearance, to prevent anosmia (24)." One could argue that this is a requirement for all insects, regardless of the size of their olfactory repertoire.

      "Taken together, we hypothesize that circadian-dependent activation of the peri-receptor events may modulate olfactory sensitivity and are key for the onset of peak navigation time in each mosquito species." This is not entirely accurate since spontaneous locomotor activity rhythms are also observed in the absence of olfactory stimulation. While "navigation" does imply olfactory-guided behaviors, "peak navigation time" appears to be driven by other processes. See, for example, all studies testing mosquito activity rhythms in locomotor activity monitors.

      "Due to technical limitations, and considering the substantial data on the circadian-dependent molecular rhythmicity" please clarify what the technical limitations were. Is this something that prevented the authors specifically, or something tied to mosquito biology and would prevent anybody from doing it? Also, why couldn't the transcriptomic analysis be performed on An. gambiae?

      "In contrast to An. gambiae, the time-dose interactions had a higher significant impact on the antennal sensitivity of Ae. aegypti. An. gambiae showed a conserved pattern in the daily rhythm of olfactory sensitivity, peaking at ZT1-3 and ZT18-20." These two sentences are very confusing. Doesn't it simply mean that the co-variation is not linear or not the same across odors? In addition, what does it mean for a pattern to be more conserved? How can one conclude about the "conserved" nature of a pattern by looking at time-dependent variations in dose-response curves?

      "Together these data, we interpret that mosquito's olfactory sensitivity possibly does not follow a fixed temporal trait" is unclear and suggests that the authors are discussing global versus odor-specific rhythms. Please rephrase.

      "Moreover, we hypothesize that under standard insectary conditions, mosquitoes may not need to exhibit foraging flight activity either for nectar or blood, and during the time course, it may minimize their olfactory rhythm, which is obligately required for wild mosquitoes." This hypothesis is not supported by the results of the study and contradicts work by others (Rund et al., Eilerts et al., Gentile et al., etc.).

      The same comment applies to "Therefore, it is reasonable to think that the mosquitoes used for EAG studies may have adapted well under insectary settings and, hence carry weak olfactory rhythm." as this statement is not supported by results of the present study or comparisons of the results to previous studies based on field-caught mosquitoes. Although it is an interesting question to ask in the future, it should be stated as a future research avenue rather than a working hypothesis that results from the present study.

      "Aedes aegypti displayed a peak in antennal sensitivity at ZT18-20 to the higher concentrations of plant and vertebrate host-associated odorants tested. Given the time-of-day dependent multiple peaks (at ZT6-8 and ZT18-20 for benzaldehyde and at ZT12-14 and ZT18-20 for nonanal) in antennal sensitivity to different odorants, our data supports the previous observation of bimodal activity pattern of Ae. aegypti (50)." Rephrase by saying that results are "aligned with the previous observations of bimodal activity". Olfactory rhythms don't "support" the activity patterns because olfactory processes and spontaneous locomotor activity are independent processes.

      "our preliminary data indicate that Anopheles spp. may possess comparatively higher olfactory sensitivity to a substantial number of odorants as compared to Aedes spp." Consider removing this sentence unless the way the data has been normalized to allow for comparisons between species is clarified.

      In "A significant decrease in odorant sensitivity for all the volatile odors tested in the CYP450-silenced Ae. aegypti," please change "silenced" to "reduced" because RNAi doesn't silence (i.e. knockout) gene expression.

      The title "Neuronal serine protease consolidates brain function and olfactory detection" is extremely misleading. Do the authors refer to memory consolidation, which has not been tested here? What is brain function consolidation??

      The reference used in "Despite their tiny brain size, mosquitoes, like other insects, have an incredible power to process and memorize circadian-guided olfactory information (7)." is not appropriate. Also, "circadian-guided" is unclear. Consider replacing it with "circadian-gated".

      What is the "the homeostatic process of the brain"?

      "the temporal oscillation of the sleep-wake cycle of any organism is managed by the encoding of experience during wake, and consolidation of synaptic change during inactive (sleep) phases, respectively (70)." By experience, do the authors refer to learning? This seems out of topic as this process has not been evaluated here.

      "We speculate that after the commencement of the active phase (ZT6-ZT12), the serine peptidase family of proteins in the brain of Ae. aegypti mosquitoes may play an important function in consolidating brain actions (after ZT12) and aid circadian-dependent memory formation." The value of this statement is unclear. Circadian-dependent memory formation is not being evaluated here, and the results from the present study do not directly support this speculation, also because other processes involved in memory formation are not evaluated here. This seems at odds with the literature on learning and memory.

      "Subsequent work on electrophysiological and neuro-imaging studies are needed to demonstrate the role of neuronal-serine proteases in the reorganization of perisynaptic structure." Sure. But the link between "the role of neuronal-serine proteases in the reorganization of perisynaptic structure" and rhythms in olfactory sensitivity is unclear.

      As a general comment, EAGs seem inappropriate to evaluate the effect of the central-brain processing in the regulation of peripheral olfactory processes. This is a critical comment that needs to be considered by the authors and clarified in the manuscript. If rhythms of central brain processes are important for olfactory-guided behaviors, these should be evaluated at the level of the central brain or via behavioral metrics. The effect of the RNAi knockdowns on peripheral sensitivity is interesting, but its link with central processes is unclear and doesn't support the speculations made by the authors about learning and memory.

      Methods

      No explanations are provided for how the EAG data are normalized to allow comparisons between species.
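
      As an illustration of the kind of normalization this comment asks the authors to describe, the sketch below expresses each odor response relative to a reference stimulus after subtracting the solvent response. This is an assumed scheme, not the authors' method; the function name `relative_eag` and all amplitudes are hypothetical, and amplitudes are taken as positive magnitudes in mV.

```python
def relative_eag(odor_mv, solvent_mv, reference_mv):
    """Solvent-corrected odor response divided by the solvent-corrected
    response to a reference stimulus recorded from the same preparation.

    Assumed normalization for illustration; the manuscript does not state
    how (or whether) EAG responses were normalized.
    """
    denominator = reference_mv - solvent_mv
    if denominator == 0:
        raise ValueError("reference and solvent responses are identical")
    # The ratio is dimensionless: the mV units cancel.
    return (odor_mv - solvent_mv) / denominator


if __name__ == "__main__":
    # Hypothetical amplitudes (mV) from a single antennal preparation.
    print(relative_eag(odor_mv=1.5, solvent_mv=0.5, reference_mv=2.5))
```

      Under this scheme a relative response of 1 means the odor evoked the same solvent-corrected amplitude as the reference, and the value is unitless rather than in mV, which is what makes comparisons across preparations or species interpretable.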

      Figures

      Figure 1: The daily rhythms depicted in A are not representative of the actual profiles. See: Benoit, J. B., & Vinauger, C. (2022). Chapter 32: Chronobiology of blood-feeding arthropods: influences on their role as disease vectors. In Sensory ecology of disease vectors (pp. 815-849). Wageningen Academic Publishers. Or any other paper on mosquito activity rhythms.

      Figure 3 and 4: The EAG results are plotted twice. This is redundant and misleading as it makes the reader think there is more data than actually presented.

      Figure 5: Please clarify the sample size for each panel. In C - F, what would be used as a reference? In other words, what is a Relative EAG Response of 1? And if it is "relative", are the units really mV? In E and F, it would be great to show how the Ethanol control compares to the no solvent condition. This could be placed in supplementary materials.

      Figures 5 and 6, given the dispersion in the EAG data, the treatments where N=40 appear robust, but the interpretation of results from treatments where N=6 may be limited due to the low sample size. This limitation is visible in Figure 5F, for example, where ABT-Aceto is different from Cont-Aceta but not PBO-Aceto because one individual shows a higher response.

      Figure S6: how does this support that synaptic plasticity is influenced by "Time-of-day dependent modulation of serine protease genes in the brain"?

      Minor comments

      What do the authors mean by "consolidation of brain functions"? Memory consolidation? Please clarify.

      In "Similar to previous studies (26), the expression of a limited number of rhythmic genes was visualized in Ae. aegypti" please replace "visualized" with "observed".

      Figure 2A, please clarify in the caption what FDR stands for.

      In "To further establish this proof-of-concept in An. gambiae, three potent CYP450 inhibitors, aminobenzotriazole(52), piperonyl butoxide(53), and schinandrin A (54), was applied topically on the head capsule of 5-6-day-old female mosquitoes" replace "was applied" with "were applied".

      "Interestingly, our species-time interaction studies revealed that An. gambiae exhibits time-of-day dependent significantly high antennal sensitivity to at least four chemical odorants compared to Ae. aegypti, except phenol." is unclear. Please reword.

      In "Similar observations were also noticed with An. stephensi." replace "noticed" with "made".

      Significance

      Such a study has the potential to be valuable for the field, but its value and significance are hindered by an accumulation of overstatements, the fact that prior work in the field has been minimized or omitted, and a lack of support for the stated conclusions.

      In this context, the advances are only slightly incremental compared to the work produced by Rund et al., and the mechanistic hypotheses emitted to link the genes selected for knockdown experiments and olfactory sensitivity are not clearly supported by the evidence presented here. The main strength of the paper is to show the role of CYP450 in olfactory sensitivity.

      The audience is fairly broad and includes insect neuro-ethologists, molecular biologists, and chronobiologists.

      Our field of expertise:

      • Mosquito chemosensation
      • Learning and memory
      • Chronobiology
      • Electrophysiology
      • Medical entomology
    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their constructive comments. The following are our point-by-point responses.

      Reviewer #1 (Recommendations For The Authors):

      Point 1- Abstract: advanced morning peak « opposite » to pdf/pdfr mutants. To my knowledge, the alteration of PDF/PDFR suppresses the morning peak. I am not sure that an advance of the peak is « opposite » to its inhibition?

      Mutants with disruptions in CNMa or CNMaR display advanced morning activity, indicating an enhanced state. Mutants with disruptions in Pdf or Pdfr exhibit no morning anticipation, suggesting a promoting role of these genes in morning anticipation. Therefore, our revised version is: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-51)

      Point 2- Fig 1K-L: the authors should show the sleep phenotype of the homozygous nAChRbeta2 mutant (if not lethal) for a direct comparison with the FRT/FLP genotype and thus evaluate the efficiency of the system.

      We have incorporated sleep profiles of nAChRbeta2 mutant and W1118 into Fig 1K-L. nAChRbeta2 mutants (red) exhibited a sleep amount comparable to that of pan-neural nAChRbeta2 knockout flies (dark red), as shown below.

      Author response image 1.

      Point 3- Dh31-EGFP-FRT expression patterns look different in figS1 A (or fig1 H) and J. why that?

      We re-examined the original data. Both images (with R57C10-GAL4; Fig. S1A, right, and Fig. S1J, left) are of Dh31EGFP.FRT samples, displayed below, and show consistent primary expression subsets. Any observed disparities in region "e" could potentially be attributed to variations during dissection.

      Author response image 2.

      Point 4- The knockdown experiments with the elav-switch (RU486) system (fig S2) do not seem to be as efficient as the HS-FLP system (fig 1H-J). The conclusions on the efficiency should be toned down.

      We have revised accordingly: "Near Complete Disruption of Target Genes by GFPi and Flp-out Based cCCTomics" (Line 130): "Knocking out at the adult stage using either hsFLP driven Flp-out (Golic and Lindquist, 1989) (Fig. 1H-1J) or neural (elav-Switch) driven shRNAGFP (Nicholson et al., 2008; Osterwalder et al., 2001) (Fig. S2A-S2I), also resulted in the elimination of most, though not all, GFP signals." (Line 145-149)

      Point 5- Fig 2H-J: the LD behavioral phenotype of pdfr pan-neuronal CRISPR does not seem to correspond to what is described in the literature for the pdfr mutant (han), see hyun et al 2005 (no morning anticipation and advanced evening peak). I understand that the activity index is lower than controls but fig2H shows a large anticipatory activity that seems really unusual, and no advanced evening peak is observed. I think that the authors should show the CRISPR flies and pdfr mutants together, to better compare the phenotypes.

      Thank you for pointing out that, in the pan-neuronal knockout of Pdfr by unmodified Cas9 (Fig. 2H-2I of the previous version), morning anticipation still exists (Fig. 2H of the previous manuscript), and that the significant decrease in the morning anticipation index (Fig. 2I of the previous manuscript) and the advanced evening activity are not as pronounced as those observed in han5304 (Fig. 3C in Hyun et al., 2005).

      First, we have separated the activity plots of Fig. 2H of the previous manuscript, as shown below. Activity tends to decrease from ZT18 to ZT21 and to increase from ZT21 to ZT24. The lowest activity before dawn occurs at about ZT21, and the activity at ZT18 is comparable to the activity at ZT24. This is significantly different from the two control groups, whose activity tends to increase from ZT18 to ZT24, with an activity peak at ZT24.

      From ZT6 to ZT12, activity increased much faster in Pdfr knockout flies and reached a plateau at about ZT11, whereas in the two control groups activity increased more slowly, with no plateau but a peak at ZT12.

      Author response image 3.

      Second, we have incorporated the phenotype of the Pdfr mutants we previously generated (Pdfr-attpKO; Deng et al., 2019) alongside that of the Pdfr pan-neuronal knockout by Cas9.HC. This mutant lacks all seven transmembrane regions of Pdfr (a). The phenotypes are very similar between Pdfr-attpKO flies and Pdfr pan-neuronal knockout flies. In this experimental repeat, we observed a much more obvious advanced evening activity peak in both pan-neuronal knockout flies and Pdfr-attpKO flies.

      To further analyze the phenotypes of Pdfr pan-neuronal knockout flies by Cas9.HC, we referred to the literature. The activity pattern at ZT18 to ZT24 (activity tends to decrease from ZT18 to ZT21 and tends to increase from ZT21 to ZT24, with the lowest activity before dawn occurring at about ZT21, and activity at ZT18 comparable to activity at ZT24) is also reported in Pdfr knockout flies, such as in Fig. 3C and 3H in Hyun et al., 2005, Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, Fig. 5A in Guo et al., 2014, and Fig. 5B in Goda et al., 2019. Additionally, the less pronounced advanced evening activity peak compared to han5304 (Fig. 3C in Hyun et al., 2005) is also reported in Fig. 2B in Lear et al., 2009, Fig. 3B in Zhang et al., 2010, and Fig. 5B in Goda et al., 2019. We consider that this difference is more likely to be caused by environmental conditions or recording strategies (DAM system vs. video tracing).

      Therefore, we revised the text to: “Pan-neuronal knockout of Pdfr resulted in a tendency towards advanced evening activity and weaker morning anticipation compared to control flies (Fig. 2H-2I), which is similar to Pdfr-attpKO flies. These phenotypes were not as pronounced as those reported previously, when han5304 mutants exhibited a more obvious advanced evening peak and no morning anticipation (Hyun et al., 2005)”.

      Author response image 4.

      Point 6-The authors should provide more information about the DD behavior (power is low, but how about the period of rhythmic flies, which is shortened in pdf (renn et al) and pdfr (hyun et al) mutants).

      We have incorporated period data into Fig. 2I. Indeed, conditional knock out of Pdfr by Cas9.HC driven by R57C10-GAL4 shortens the period length, as shown below (previous data), also in Fig. 2I of the revised version.

      In the revised Fig. 2I, we tested 45 Pdfr-attpKO flies during DD condition (3 out of 48 flies died during video tracing in DD condition), and only one fly was rhythmic. In contrast, 9 out of 48 Pdfr pan-neuronal knockout flies were rhythmic.

      Author response image 5.

      Point 7- P15 and fig6. The authors indicate that type II CNMa neurons do not show advanced morning activity as type I do, but Figs 6 I and K seem to show some advance although less important than type I. I am not sure that this supports the claim that type I is the main subset for the control of morning activity. This should be toned down.

      We have re-organized Fig. 6 and revised the summary of these results as: “However, Type II neurons-specific CNMa knockout (CNMa ∩ GMR91F02) showed weaker advanced morning activity without advanced morning peak (Fig. 6N), while Type I neurons-specific CNMa knockout did (Fig. 6J), indicating a possibility that these two type I CNMa neurons constitute the main functional subset regulating the morning anticipation activity of fruit fly”. (Line 400-405)

      Point 8- Figs 6M and N: is power determined from DD data? if yes, how about the period and arrhythmicity? Please also provide the LD activity profiles for the mutants and rescued pdfr genotypes.

      Yes, the power was determined from the DD data. In the new version of the manuscript, we have included the activity plots for the LD phase in supplementary Fig S13, as well as shown below (A, B), and the period and arrhythmicity data for the DD phase in Fig. 6S and Table S7. We have also refined the related description as follows: “Moreover, knocking out Pdfr by GMR51H05, GMR79A11 and CNMa GAL4, which cover type I CNMa neurons, decreased morning anticipation of flies (Fig. 6T, Fig. S13B). However, the decrease in morning anticipation observed in the Pdfr knockout by CNMa-GAL4 was not as pronounced as with the other two drivers. Because the presumptive main subset of functional CNMa is also PDFR-positive, there is a possibility that CNMa secretion is regulated by PDF/PDFR signal”. (Line 413-419)

      Author response image 6.

      Point 9- Fig 7: does CNMaR affect DD behavior? This should be tested.

      We analyzed CNMaR-/- activity in the dark-dark (DD) condition over a span of six days. The results revealed higher power in CNMaR mutants than in control flies (Power: 93.5±41.9 (CNMaR-/-, n=48) vs 47.3±31.6 (w1118, n=47); Period: 23.7±0.3 h (CNMaR-/-, n=46) vs 23.7±0.3 h (w1118, n=47); arrhythmic rate 2/48 (CNMaR-/-) vs 0/47 (w1118)). Because mutating CNMa had no obvious effect on DD behavior, any effect of CNMaR on DD behavior cannot be attributed to the CNMa signal, so we did not repeat or further analyze the DD behavior of the CNMaR mutant. We believe this raises a separate question beyond the scope of our current discussion.

      Reviewer #2 (Recommendations For The Authors):

      Point 1-One major concern is the apparent discrepancies in clock network gene expression using the Flp-Out and split-LexA approaches compared to what is known about the expression of several transmitter and peptide-related genes. For example, it is well established that the 5th-sLNv expresses CHAT (along with a single LNd), yet there appears to be no choline acetyltransferase (ChAT) signal in the 5th-sLNv as assayed by the Split-LexA approach (Fig. 4). This approach also suggests that DH31 is expressed in the s-LNvs, which, as one of the most intensely studied clock neuron are known to express PDF and sNPF, but not DH31. The results also suggest that the sLNvs express ChAT, which they do not. Remarkably PDF is not included in the expression analysis, this peptide is well known to be expressed in only two subgroups of clock neurons, and would therefore be an excellent test case for the expression analysis in Fig. 4. PDF should therefore be added to analysis shown in Fig. 4. Another discrepancy is PdfR, which split LexA suggests is expressed in the Large LNvs but not the small LNvs, the opposite of what has been shown using both reporter expression and physiology. The authors do acknowledge that discrepancies exist between their data and previous work on expression within the clock network (lines 237 and 238). However, the extent of these discrepancies is not made clear and calls into question the accuracy of Flp-Out and Split LexA approaches.

      The concerns mentioned above are:

      (1) sLNvs express PDF and sNPF but not Dh31;

      (2) ChAT presents in 5th-sLNv and one LNd but not in other sLNvs;

      (3) PDFR presents in sLNvs but not l-LNvs.

      (4) PDF is not included in the analysis.

      To verify the accuracy of these intersection analyses, all of which concern PDF-positive neurons (except the 5th s-LNv and LNds), we stained for PDF and examined the co-localization between PDF-positive LNvs and the respective drivers ChAT-KI-LexA, Pdfr-KI-LexA, Dh31-KI-LexA, and Pdf-KI-LexA.

      First, Dh31-KI-LexA labeled four s-LNvs, as shown below (also in Fig. S9A). Therefore, the results of the intersection analysis of Dh31-KI-LexA with Clk856-GAL4 are correct. We attribute the difference from previous literature to Dh31-KI-LexA labeling different neurons than the previous driver or antibody.

      Second, no s-LNv was labeled by ChAT-KI-LexA, as shown below. We rechecked our intersection data and found that, of the 10 ChAT-KI-LexA∩Clk856-GAL4 brains we analyzed, only two showed positive s-LNvs. To enhance the accuracy of intersection analysis results, we marked all positive signal records when positive subsets were found in less than 1/3 of the total analyzed brains (Table S4).

      Third, one l-LNv and at least two s-LNvs were labeled by Pdfr-KI-LexA, as shown below (also in Fig. S9B). Fourth, Pdf-KI-LexA labels all PDF-positive neurons, but the intersection analysis by Pdf-KI-LexA and Clk856-GAL4 only showed scattered signals, as shown below (D, also in Fig. S9C). For these cases, we found some positive signals expected but not observed in our dissection. The possible reason could be the inefficiency of LexAop-FRT-myr::GFP driven by LexA. Therefore, our intersection results must miss some positive signals.

      Author response image 7.

      Finally, we revised the text to (Line 286-317):

      To assess the accuracy of expression profiles using CCT drivers, we compared our dissection results with previous reports. Initially, we confirmed the expression of CCHa1 in two DN1s (Fujiwara et al., 2018), sNPF in four s-LNvs and two LNds (Johard et al., 2009), and Trissin in two LNds (Ma et al., 2021), aligning with previous findings. Additionally, we identified the expression of nAChRα1, nAChRα2, nAChRβ2, GABA-B-R2, CCHa1-R, and Dh31-R in all or subsets of LNvs, consistent with suggestions from studies using ligands or agonists in LNvs (Duhart et al., 2020; Fujiwara et al., 2018; Lelito and Shafer, 2012; Shafer et al., 2008) (Table S4).

      Regarding previously reported Nplp1 in two DN1as (Shafer et al., 2006), we found approximately five DN1s positive for Nplp-KI-LexA, indicating a broader expression than previously reported. A similar pattern emerged in our analysis of Dh31-KI-LexA, where four DN1s, four s-LNvs, and two LNds were identified, contrasting with the two DN1s found in immunocytochemical analysis (Goda et al., 2016). Colocalization analysis of Dh31-KI-LexA and anti-PDF revealed labeling of all PDF-positive s-LNvs but not l-LNvs (Fig S9A), suggesting that the differences may arise from the broader labeling of 3' end knock-in LexA drivers or the amplitude effect of the binary expression system. The low protein levels might go undetected in immunocytochemical analysis. This aligns with transcriptome analysis findings showing Nplp1 positive in DN1as, a cluster of CNMa-positive DN1ps, and a cluster of DN3s (Ma et al., 2021), which is more consistent with our dissection.

      Despite the well-known expression of PDF in LNvs and PDFR in s-LNvs (Renn et al., 1999; Shafer et al., 2008), we did not observe stable positive signals for both in Flp-out intersection experiments, although both Pdf-KI-LexA and Pdfr-KI-LexA label LNvs as expected (Fig S9B-S9C). We also noted fewer positive neurons in certain clock neuron subsets compared to previous reports, such as NPF in three LNds and some LNvs (Erion et al., 2016; He et al., 2013; Hermann et al., 2012; Johard et al., 2009; Lee et al., 2006) and ChAT in four LNds and the 5th s-LNv (Johard et al., 2009; Duhart et al., 2020) (Table S4). We attribute this limitation to the inefficiency of LexAop-FRT-myr::GFP driven by LexA, acknowledging that our intersection results may miss some positive signals.

      Point 2-Related to this, the authors rather inaccurately suggest that the field's understanding of PdfR expression within the clock neuron network is "inconsistent" and "variable" (lines 368-377). This is not accurate. It is true that the first attempts to map PdfR expression with antisera and GAL4s were inaccurate. However, subsequent work by several groups has produced strong convergent evidence that, with the exception of the l-LNvs after several days post-eclosion, PdfR is expressed in the Cryptochrome-expressing subset of the clock neuron network. This section of the study should be revised.

      We thank the reviewer for pointing this out. As we have already addressed and revised the related part in the RESULTS section (Line 308-317), we have now removed this part from the DISCUSSION section of the revised version.

      Point 3-One minor issue, which if addressed would avoid unnecessary confusion for readers familiar with the circadian literature, is the way that activity profiles are plotted in the study. The authors have centered their averaged activity profiles on the 12h of darkness. This is the opposite of the practice of the field, and it leads to some initial confusion in the examination of the morning and evening peak data. The authors may wish to avoid this by centering their activity plots on the 12h light phase, which would put the morning peak on the left and the evening peak on the right. This is the way the field is accustomed to examining locomotor activity profiles.

      The averaged activity profiles are centered on the 12 h of darkness to highlight the phenotype of advanced morning activity. To prevent any confusion among readers, we have included a sentence in the figure legend explaining how our activity profiles differ from those in the previous literature: "Activity profiles were centered of the 12 h darkness in all figures with evening activity on the left and morning activity on the right, which is different from general circadian literatures." (Fig. 2H legend, Line 957-959)

      Point 4-The authors conclude that the loss of PDF and CNMa have opposite effects on the morning peak of locomotor activity (line 392). But they also acknowledge, briefly, that things are not that simple: loss of CNMa causes a phase advance, but loss of PDF causes a loss or reduction in the anticipatory peak. It is still significant to find a peptide transmitter with the clock neuron network that regulates morning activity, but the authors should revise their conclusion regarding the opposing actions of PDF and CNMa, which is not well supported by the data.

      We have revised the relevant parts.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      Point 5-The authors should acknowledge, cite, and incorporate the substantive discussion of CNMa peptide and the DN1p neuronal class in Reinhard et al. 2022 (Front Physiol. 13: 886432).

      We have revised the text accordingly and cited this paper: “Type I with two neurons whose branches projecting to the anterior region, as in CNMa∩GMR51H05, CNMa∩Pdfr, and CNMa∩GMR79A11 (Fig. 6E, 5G, 6H), and type II with four neurons branching on the posterior side with few projections to the anterior region, as in CNMa∩GMR91F02 (Fig. 6F). These two types of DN1ps’ subsets were also reported and profound discussed previously (Lamaze et al., 2018; Reinhard et al., 2022)”. (Line 393-397)

      Reviewer #3 (Recommendations For The Authors):

      Point 1-Throughout the manuscript figure legends (axis, genotypes, etc) are too small to be appreciated. Fig. 1. Panel A. The labels are very difficult to read.

      We have attempted to enlarge the font as much as possible in the revised version.

      Point 2-Fig. 1. H-J Why is efficiency not mentioned in all the examples?

      The results of Fig. 1H-1J are now discussed in the revised manuscript (Line 145-147). We did not calculate an exact efficiency because the GFP intensity is not stable enough: it can change with dissection, mounting, or laser intensity during our experimental process. Therefore, in all results related to GFP signal (Fig. 1B-1J, Fig. S1, Fig. S2, Fig. 2B-2D), we relied on qualitative rather than quantitative judgment, unless the GFP signal was easily quantifiable (such as in cases with limited cells or no GFP signal in the experimental group).

      Point 3-Fig. 1. Panel L, left (light phase): the statistical comparisons are not clearly indicated (the same happens in Figs 3Q and 3R).

      We have now re-arranged Fig. 1L and Fig. 3Q-3R to make the statistical comparisons clear in the new version.

      Point 4-Line 792. Could induced be introduced?

      Yes, we have now corrected this typo.

      Point 5-Fig. S1. Check labels for consistency. GMR57C10 Gal4 driver is most likely R57C10.

      We have now revised the labels (Fig. S1).

      Point 6-Fig. S2. If the experiments were repeated and several brains were observed, the authors should include the efficiency and the number of flies as reported in Fig. S1.

      We have now added the number of flies in Fig. S2, as reported in Fig. S1. As mentioned in our response to Point 2, the instability of the GFP signal means that we are unable to provide a quantitative efficiency in this context.

      Point 7-Fig S4. The fig legend describes panels I-J which are not shown in the current version of the manuscript.

      We have now deleted them.

      Point 8-Fig 2I. Surprising values for morning anticipation indexes even for controls (0.5 would indicate "no anticipation"; in controls, the expected values would be >>0.5, as most of the activity is concentrated right before the transition). Could the authors explain this unexpected result?

      We have revised the description of the calculation in the methods section (Line 612). After calculating the ratio of the last three hours of activity to the total six hours of activity, the results were further subtracted by 0.5. Therefore, the index should be ≤0.5. When the index is equal to 0, it indicates no morning anticipation.
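
      The calculation described above can be sketched in a few lines. This is a minimal illustration, assuming activity is binned hourly and that the six bins immediately before lights-on are supplied in chronological order; the function name, the hourly binning, and the handling of a fly with no activity are illustrative assumptions rather than the manuscript's implementation.

```python
def morning_anticipation_index(activity_6h):
    """Ratio of activity in the last 3 h to activity in the 6 h before
    lights-on, minus 0.5. Input: six hourly activity counts (assumed binning)."""
    if len(activity_6h) != 6:
        raise ValueError("expected six hourly activity bins")
    total = sum(activity_6h)
    if total == 0:
        # Assumption: the index is undefined for a fly with no activity.
        return float("nan")
    last_three_hours = sum(activity_6h[3:])
    return last_three_hours / total - 0.5
```

      With this definition, evenly spread activity gives 0 (no anticipation), and activity confined entirely to the last three hours gives the maximum of 0.5.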

      Point 9-Fig 2K/L. The authors mention that not all genes are effectively knocked out with their strategy. Could this be accounted for the specific KD strategy, its duration, or the promotor strength? It is surprising no explanation is provided in the text (page 9 line 179).

      In our pursuit of establishing a broadly effective method for gene editing, Fig. 2H-2L and Fig. 2D revealed that previous attempts have fallen short of achieving this objective. The observed inefficiency may be attributed to the strength of the promoter, resulting in inadequate expression. Alternatively, the insufficient duration of the operation may also contribute to the lack of success. However, in sleep and rhythm research applications, the age at which flies are tested is typically fixed, limiting the potential to enhance efficiency by extending the manipulation time. Moreover, increasing the expression level may pose challenges related to cytotoxicity, as reported in previous studies (Port et al., 2014). We refrain from offering specific explanations, as we lack a definitive plan and cannot provide additional robust evidence to support the above speculations. Consequently, in our ongoing efforts, we aim to enhance the efficiency of the tool system while operating within the current constraints.

      Point 10-Page 9, line 179. Can the authors include a brief description of the reason for the different modifications? Only one was referenced.

      We have revised related part in the manuscript (Line 223-231):

      Cas9.M9: We fused a chromatin-modulating peptide (Ding et al., 2019), HMGN1 (High mobility group nucleosome binding domain 1), at the N-terminus of Cas9 and HMGB1 (High mobility group protein B1) at its C-terminus with a GGSGP linker, termed Cas9.M9.

      Cas9.M6: We also obtained a modified Cas9.M6 with HMGN1 at the N-terminus and an undefined peptide (UDP) at the C-terminus. (Note: the UDP was obtained by accident.)

      Cas9.M0: We replaced the STARD linker between Cas9 and NLS in Cas9.HC with the GGSGP linker (Zhao et al., 2016), termed Cas9.M0.

      Point 11-The authors tested the impact of KO nAChR2 across the different versions of conditional disruption (Fig 1K-L, Fig 2L, Fig 3R). It is surprising they observe a difference in daytime sleep upon knocking down with Cas9.HC (2L) but not with Cas9.M9 (3R) and the reverse is seen for night-time sleep. Could the authors provide an explanation? Efficiency is not the issue at stake, is it?

      In Fig. 2K, the day sleep of flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; UAS-Cas9/+) was significantly decreased compared to flies (R57C10-GAL4/UAS-sgRNAnAChRbeta2; +/+), but not when compared to flies (R57C10-GAL4/+; UAS-Cas9/+). Our criterion for asserting a difference is that the experimental group must show a significant distinction from both control groups. Therefore, we concluded that there was no significant difference between the experimental group and the control groups in Fig. 2K.
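
      The decision rule stated above can be written out explicitly. The sketch below assumes that a p-value has already been obtained for each comparison by whatever statistical test the study used, and that significance is judged at a threshold of 0.05; the function name, parameter names, and threshold are illustrative assumptions.

```python
def differs_from_both_controls(p_vs_gal4_control, p_vs_sgrna_control, alpha=0.05):
    """Return True only when the experimental group differs significantly
    from BOTH control groups (alpha is an assumed threshold)."""
    return p_vs_gal4_control < alpha and p_vs_sgrna_control < alpha
```

      Under this rule, a group that differs from the sgRNA-only control but not from the Cas9-only control, as in the day-sleep comparison discussed here, is scored as showing no difference.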

      Point 12-Fig. 4. Which of the two strategies described in A-B was employed to assemble the expression profile of CCT genes in clock neurons shown in C? This information should be part of the fig legend.

      We have now revised the legend as follows: “(A-B) Schematic of intersection strategies used in Clk856 labelled clock neurons dissection, Flp-out strategy (A) and split-LexA strategy (B). The exact strategy used for each gene is annotated in Table S5.”

      Point 13-Similarly, how many brains were analyzed to give rise to the table shown in C?

      We have now revised the legend of Table S4 to address this concern. As indicated in: “The largest N# for each gene in Table S4 is the brain number analyzed for each gene”.

      Point 14-Finally, the sentence "The figure is..." requires revision.

      We have now revised it: “The exact cell number for each subset is annotated in Table S4”.

      Point 15-Legend to Table S3. The authors have done an incredible job testing many gRNAs for each gene potentially relevant for communication. However, there is very little information to make the most out of it; for instance, the legend does not inform why many of the targeted genes do not appear to have been tested any further. It would be useful to the reader to discern whether despite being the 3 most efficient gRNAs, they were still not effective in targeting the gene of interest, or whether they showed off-targets, or it was simply a matter of testing the educated guesses. This information would be invaluable for the reader.

      First, we designed and generated transgenic UAS-sgRNA fly lines for all these sgRNAs. We randomly selected 14 receptor genes, known for their difficulty in editing based on our experience, to assess the efficiency of our strategy, as depicted in Fig. 3M-3P, Fig. S5, and Fig. S6. We believe these results are representative and indicative of the efficiency of sgRNAs designed using our process and applied with the modified Cas9.

      Secondly, we acknowledge your valid concern. While we selected sgRNAs with no predicted off-target effects through various prediction models (outlined in the Methods under C-cCCTomics sgRNA design), we did not conduct whole-genome sequencing. Consequently, we can only assert that the off-target possibility is relatively low. To address potential misleading effects arising from off-target concerns, it is essential to validate these results through mutants, RNAi, or alternative UAS-sgRNAs targeting the same gene.

      Point 16-Table S4. Some of the data presented derives from observations made in 1-2 brains for a specific cluster; isn't it too little to base a decision on whether a certain gene is (or not) expressed? It is surprising since the same CCT line was observed/analysed in more brains for other clusters. Can the authors explain the rationale?

      The N# represents the number of GFP-positive brains, and we have revised the legend of Table S4 accordingly. The largest N# denotes the total number of brains analyzed for a specific CCT line. It is possible that, due to variations in our dissection or mounting process, some clusters were only observed in 1-2 brains out of the total brains analyzed. To enhance the accuracy of intersection analysis results, we marked all positive signal records when positive subsets were found in less than 1/3 of the total analyzed brains (Table S4).
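
      The marking rule can be illustrated with a short sketch. It assumes, as stated above, that the largest N# for a line equals the total number of brains analyzed, and it flags any subset that was positive in fewer than one third of them. The function name and the example counts are hypothetical, apart from the case of two positive brains out of ten taken from the ChAT-KI-LexA∩Clk856-GAL4 analysis.

```python
def flag_low_frequency_subsets(positive_counts):
    """Return the subsets positive in fewer than 1/3 of analyzed brains.

    positive_counts maps a clock-neuron subset to its N# (number of
    GFP-positive brains); the largest N# is taken as the total analyzed.
    """
    if not positive_counts:
        return {}
    total_brains = max(positive_counts.values())
    return {
        subset: n
        for subset, n in positive_counts.items()
        if 0 < n < total_brains / 3
    }


# Hypothetical counts for one CCT line; only the 2-of-10 case is from the text.
example = {"LNd": 10, "s-LNv": 2, "DN1": 4}
flagged = flag_low_frequency_subsets(example)
```

      Here only the s-LNv record is flagged, since 2 of 10 brains falls below the one-third threshold while 4 of 10 does not.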

      Point 17-The paragraph describing this data in the results section needs revising (lines 233-243).

      We have now revised this. (Line 286-317)

      Point 18-While it is customary for authors to attempt to improve the description of the activity patterns by introducing new parameters (i.e. MAPI and EAPI, lines 253-258), it would be interesting to understand the difference between the proposed method and the one already in use (which compares the same parameter, i.e., the slope, defined as "the slope of the best-fitting linear regression line over a period of 6 h prior to the transition"; see Lamaze et al. 2020 and many others). Is there a need to introduce yet another one?

      This approach is necessary. The slope defined by Lamaze et al. utilizes data from only 2 time points, which may not accurately capture the pattern within a period before light on or off. Linear regression is not well-suited for a single fly due to the high variability in activity at each time point, making it challenging to fit the model at the individual level. The parameters we have introduced (MAPI and EAPI) in this paper are concise and can be applied at the individual level, effectively reflecting the morning or evening anticipation characteristics of each fly.

      As an alternative, the activity plot of a certain fly line could be represented by an average of all flies' activity in one experiment. This would make linear regression easier to fit. However, several independent experiments are required for statistical robustness, necessitating the inclusion of hundreds of flies for each strain in a single analysis.

      Point 19-In general, the legends of supplementary figures are a bit too brief. S7 and S8: it is not clear which of the two intersectional strategies were used (it would benefit whoever is interested in replicating the experiments). Legend to Fig S8 should read "similar to Fig S7".

      We have now revised the legend and included “The exact strategy used for each gene is annotated in Table S5” in the legend.

      Point 20-The legend in Table S6 should clearly state the genotypes examined. What does the marking in bold refer to?

      We have now revised the annotation of Table S6. Bold marking refers to results that fall more than one SD from the control group.

      Point 21-Line 314. The sentence needs revision.

      We have revised these sentences.

      Point 22-Line 391 (and also in the results section). The authors attempt to describe the CNMa phenotype as the opposite of pdf/pdfr mutant phenotypes. However, no morning anticipation/advanced morning anticipation are not necessarily opposite phenotypes.

      We have revised related description.

      ABSTRACT: “Specific elimination of each from clock neurons revealed that loss of the neuropeptide CNMa in two posterior dorsal clock neurons (DN1ps) or its receptor (CNMaR) caused advanced morning activity, indicating a suppressive role of CNMa-CNMaR on morning anticipation, opposite to the promoting role of PDF-PDFR on morning anticipation.” (Line 43-48)

      DISCUSSION: “Furthermore, given that the morning anticipation vanishing phenotype of Pdf or Pdfr mutant indicates a promoting role of PDF-PDFR signal, while the enhanced morning anticipation phenotype of CNMa mutant suggests an inhibiting role of CNMa signal, we consider the two signals to be antagonistic.” (Line 492-495)

      Reference

      Deng, B., Li, Q., Liu, X., Cao, Y., Li, B., Qian, Y., Xu, R., Mao, R., Zhou, E., Zhang, W., et al. (2019). Chemoconnectomics: mapping chemical transmission in Drosophila. Neuron 101, 876-893.e874.

      Ding, X., Seebeck, T., Feng, Y., Jiang, Y., Davis, G.D., and Chen, F. (2019). Improving CRISPR-Cas9 genome editing efficiency by fusion with chromatin-modulating peptides. CRISPR J 2, 51-63.

      Duhart, J.M., Herrero, A., de la Cruz, G., Ispizua, J.I., Pírez, N., and Ceriani, M.F. (2020). Circadian Structural Plasticity Drives Remodeling of E Cell Output. Curr Biol 30, 5040-5048.e5045.

      Erion, R., King, A.N., Wu, G., Hogenesch, J.B., and Sehgal, A. (2016). Neural clocks and Neuropeptide F/Y regulate circadian gene expression in a peripheral metabolic tissue. eLife 5, e13552.

      Fujiwara, Y., Hermann-Luibl, C., Katsura, M., Sekiguchi, M., Ida, T., Helfrich-Förster, C., and Yoshii, T. (2018). The CCHamide1 neuropeptide expressed in the anterior dorsal neuron 1 conveys a circadian signal to the ventral lateral neurons in Drosophila melanogaster. Front Physiol 9, 1276.

      Goda, T., Tang, X., Umezaki, Y., Chu, M.L., Kunst, M., Nitabach, M.N., and Hamada, F.N. (2016). Drosophila DH31 neuropeptide and PDF receptor regulate night-onset temperature preference. J Neurosci 36, 11739-11754.

      Goda, T., Umezaki, Y., Alwattari, F., Seo, H.W., and Hamada, F.N. (2019). Neuropeptides PDF and DH31 hierarchically regulate free-running rhythmicity in Drosophila circadian locomotor activity. Sci Rep 9, 838.

      Guo, F., Cerullo, I., Chen, X., and Rosbash, M. (2014). PDF neuron firing phase-shifts key circadian activity neurons in Drosophila. eLife 3.

      He, C., Cong, X., Zhang, R., Wu, D., An, C., and Zhao, Z. (2013). Regulation of circadian locomotor rhythm by neuropeptide Y-like system in Drosophila melanogaster. Insect Mol Biol 22, 376-388.

      Hermann, C., Yoshii, T., Dusik, V., and Helfrich-Förster, C. (2012). Neuropeptide F immunoreactive clock neurons modify evening locomotor activity and free-running period in Drosophila melanogaster. J Comp Neurol 520, 970-987.

      Hyun, S., Lee, Y., Hong, S.T., Bang, S., Paik, D., Kang, J., Shin, J., Lee, J., Jeon, K., Hwang, S., et al. (2005). Drosophila GPCR Han is a receptor for the circadian clock neuropeptide PDF. Neuron 48, 267-278.

      Johard, H.A., Yoshii, T., Dircksen, H., Cusumano, P., Rouyer, F., Helfrich-Förster, C., and Nässel, D.R. (2009). Peptidergic clock neurons in Drosophila: ion transport peptide and short neuropeptide F in subsets of dorsal and ventral lateral neurons. J Comp Neurol 516, 59-73.

      Lamaze, A., Krätschmer, P., Chen, K.F., Lowe, S., and Jepson, J.E.C. (2018). A Wake-Promoting Circadian Output Circuit in Drosophila. Curr Biol 28, 3098-3105.e3093.

      Lear, B.C., Zhang, L., and Allada, R. (2009). The neuropeptide PDF acts directly on evening pacemaker neurons to regulate multiple features of circadian behavior. PLoS Biol 7, e1000154.

      Lee, G., Bahn, J.H., and Park, J.H. (2006). Sex- and clock-controlled expression of the neuropeptide F gene in Drosophila. Proc Natl Acad Sci USA 103, 12580-12585.

      Lelito, K.R., and Shafer, O.T. (2012). Reciprocal cholinergic and GABAergic modulation of the small ventrolateral pacemaker neurons of Drosophila's circadian clock neuron network. J Neurophysiol 107, 2096-2108.

      Ma, D., Przybylski, D., Abruzzi, K.C., Schlichting, M., Li, Q., Long, X., and Rosbash, M. (2021). A transcriptomic taxonomy of Drosophila circadian neurons around the clock. eLife 10.

      Port, F., Chen, H.M., Lee, T., and Bullock, S.L. (2014). Optimized CRISPR/Cas tools for efficient germline and somatic genome engineering in Drosophila. Proc Natl Acad Sci USA 111, E2967-2976.

      Reinhard, N., Schubert, F.K., Bertolini, E., Hagedorn, N., Manoli, G., Sekiguchi, M., Yoshii, T., Rieger, D., and Helfrich-Förster, C. (2022). The Neuronal Circuit of the Dorsal Circadian Clock Neurons in Drosophila melanogaster. Front Physiol 13, 886432.

      Renn, S.C., Park, J.H., Rosbash, M., Hall, J.C., and Taghert, P.H. (1999). A pdf neuropeptide gene mutation and ablation of PDF neurons each cause severe abnormalities of behavioral circadian rhythms in Drosophila. Cell 99, 791-802.

      Shafer, O.T., Helfrich-Förster, C., Renn, S.C., and Taghert, P.H. (2006). Reevaluation of Drosophila melanogaster's neuronal circadian pacemakers reveals new neuronal classes. J Comp Neurol 498, 180-193.

      Shafer, O.T., Kim, D.J., Dunbar-Yaffe, R., Nikolaev, V.O., Lohse, M.J., and Taghert, P.H. (2008). Widespread receptivity to neuropeptide PDF throughout the neuronal circadian clock network of Drosophila revealed by real-time cyclic AMP imaging. Neuron 58, 223-237.

      Zhang, L., Chung, B.Y., Lear, B.C., Kilman, V.L., Liu, Y., Mahesh, G., Meissner, R.A., Hardin, P.E., and Allada, R. (2010). DN1(p) circadian neurons coordinate acute light and PDF inputs to produce robust daily behavior in Drosophila. Curr Biol 20, 591-599.

      Zhao, P., Zhang, Z., Lv, X., Zhao, X., Suehiro, Y., Jiang, Y., Wang, X., Mitani, S., Gong, H., and Xue, D. (2016). One-step homozygosity in precise gene editing by an improved CRISPR/Cas9 system. Cell Res 26, 633-636.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study provides an unprecedented understanding of the roles of different combinations of NaV channel isoforms in nociceptors' excitability, with relevance for the design of better strategies targeting NaV channels to treat pain. Although the experimental combination of electrophysiological, modeling, imaging, molecular biology, and behavioral data is convincing and supports the major claims of the work, some conclusions need to be strengthened by further evidence or discussion. The work may be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Reviewer #1 (Public Review):

      Summary:

      In this work, Xie, Prescott, and colleagues have reevaluated the role of Nav1.7 in nociceptive sensory neuron excitability. They find that nociceptors can make use of different sodium channel subtypes to reach equivalent excitability. The existence of this degeneracy is critical to understanding neuronal physiology under normal and pathological conditions and could explain why Nav subtype-selective drugs have failed in clinical trials. More concretely, nociceptor repetitive spiking relies on Nav1.8 at DIV0 (and probably under normal conditions in vivo), but on Nav1.7 and Nav1.3 at DIV4-7 (and after inflammation in vivo).

      The conclusions of this paper are mostly well supported by data, and these findings should be of broad interest to scientists working on pain, drug development, neuronal excitability, and ion channels.

      Strengths:

      (1.1) The authors have employed elegant electrophysiology experiments (including specific pharmacology and dynamic clamp) and computational simulations to study the excitability of a subpopulation of DRGs that would very likely match with nociceptors (they take advantage of using transgenic mice to detect Nav1.8-expressing neurons). They make a strong point showing the degeneracy that occurs at the ion channel expression level in nociceptors, adding this new data to previous observations in other neuronal types. They also demonstrate that the different Nav subtypes functionally overlap and are able to interchange their "typical" roles in action potential generation. As Xie, Prescott, and colleagues argue, the functional implications of the degenerate character of nociceptive sensory neuron excitability need to be seriously taken into account regarding drug development and clinical trials with Nav subtype-selective inhibitors.

      Weaknesses:

      (1.2) The next comments are minor criticisms, as the major conclusions of the paper are well substantiated. Most of the results presented in the article have been obtained from experiments with DRG neuron cultures, and surely there is a greater degree of complexity and heterogeneity about the degeneracy of nociceptor excitability in the "in vivo" condition. Indeed, the authors show in Figures 7 and 8 data that support their hypothesis and an increased influence of Nav1.7 on nociceptor excitability after inflammation, but also a higher variability in the nociceptor spiking responses. On the other hand, DRG neurons targeted in this study (YFP (+) after crossing with Nav1.8-Cre mice) are >90% nociceptors, but not all nociceptors express Nav1.8 in vivo. As shown by Li et al., 2016 ("Somatosensory neuron types identified by high-coverage single-cell RNA-sequencing and functional heterogeneity"), there is a high heterogeneity of neuron subtypes within sensory neurons. Therefore, some caution should be taken when translating the results obtained with the DRG neuron cultures to the more complex "in vivo" panorama.

      We agree that most but not all Nav1.8+ DRG cells are nociceptors and that not all nociceptors express Nav1.8. We targeted small neurons that also express (or at some point expressed) Nav1.8, thus excluding larger neurons that express Nav1.8. This allowed us to hone in on a relatively homogeneous set of neurons, which is crucial when testing different neurons to compare between conditions (as opposed to testing longitudinally in the same neuron, which is not feasible). We expect all neurons are degenerate but likely on the basis of different ion channel combinations. Indeed, even within small Nav1.8+ neurons, other channels that we did not consider likely contribute to the degenerate regulation (as now better reflected in the revised Discussion).

      That said, there are multiple sources of heterogeneity. We suspect that heterogeneity increases more after inflammation than after axotomy because all DRG neurons experience axotomy when cultured whereas neurons experience inflammation differently in vivo depending on whether their axon innervates the inflamed area (now explained on lines 214-215). This is not so much about whether the insult occurs in vivo or in vitro, but about how homogeneously neurons are affected by the insult. Granted, neurons are indeed more likely to be heterogeneously affected in vivo since conditions are more complex. But our goal in testing PF-71 in behavioral tests (Fig. 8) was to show that changes observed in nociceptor excitability in Figure 7, despite heterogeneity, were predictive of changes in drug efficacy. In short, we establish Nav interchangeability by comparing neurons in culture (Figs 1-6), but we then show that similar Nav shifts can develop in vivo (Fig 7) with implications for drug efficacy (Fig 8). Such results should alert readers to the importance of degeneracy for drug efficacy (which is our main goal) even without a complete picture of nociceptor degeneracy or DRG neuron heterogeneity. Additions to the Discussion (lines 248-259, 304-308) are intended to highlight these considerations.

      (1.3) Although the authors have focused their attention on Nav channels, it should be noted that degeneracy concerning other ion channels (such as potassium ion channels) could also impact the nociceptor excitability. The action potential AHP in Figure 1, panel A is very different comparing the DIV0 (blue) and DIV4-7 examples. Indeed, the conductance density values for the AHP current are higher at DIV0 than at DIV7 in the computational model (supplementary table 5). The role of other ion channels in order to obtain equivalent excitability should not be underestimated.

      We completely agree. We focused on Nav channels because of our initial observation with TTX and because of industry’s efforts to develop Nav subtype-selective inhibitors, whose likelihood of success is affected by the changes we report. But other channels are presumably changing, especially given observed changes in the AHP shape (now mentioned on lines 304-308). Investigation should be expanded to include these other channels in future studies.

      Reviewer #2 (Public Review):

      Summary:

      The authors have noted in preliminary work that tetrodotoxin (TTX), which inhibits NaV1.7 and several other TTX-sensitive sodium channels, has differential effects on nociceptors, dramatically reducing their excitability under certain conditions but not under others. Partly because of this coincidental observation, the aim of the present work was to re-examine or characterize the role of NaV1.7 in nociceptor excitability and its effects on drug efficacy. The manuscript demonstrates that a NaV1.7-selective inhibitor produces analgesia only when nociceptor excitability is based on NaV1.7. More generally and comprehensively, the results show that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes (NaV 1.3/1.7 and 1.8). This can cause widespread changes in the role of a particular subtype over time. The degenerate nature of nociceptor excitability shows functional implications that make the assignment of pathological changes to a particular NaV subtype difficult or even impossible.

      Thus, the analgesic efficacy of NaV1.7- or NaV1.8-selective agents depends essentially on which NaV subtype controls excitability at a given time point. These results explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      Strengths:

      (2.1) The above results are clearly and impressively supported by the experiments and data shown. All methods are described in detail, presumably allow good reproducibility, and were suitable to address the corresponding question. The only exception is the description of the computer model, which should be described in more detail.

      We failed to report basic information such as the software, integration method and time step in the original text. This information is now provided on lines 476-477. Notably, the full code is available on ModelDB plus all equations including the values for all gating parameters are provided in Supplementary Table 5 and values for maximal conductance densities for DIV0 and DIV7 models are provided in Supplementary Table 6. Changes in conductance densities to simulate different pharmacological conditions are reported in the relevant figure legends (now shown in red). We did not include model details in the main text to avoid disrupting the flow of the presentation, but all the model details are reported in the Methods, tables and/or figure legends.

      (2.2) The results showing that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and expression of different NaV subtypes are of great importance in the fields of basic and clinical pain research and sodium channel physiology and pharmacology, but also for a broad readership and community. The degenerate nature of nociceptor excitability, which is clearly shown and well supported by data, has large functional implications. The results are of great importance because they may explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have major implications for the future development of Nav-selective analgesics.

      In summary, the authors achieved their overall aim to enlighten the role of NaV1.7 in nociceptor excitability and the effects on drug efficacy. The data support the conclusions, although the clinical implications could be highlighted in a more detailed manner.

      Weaknesses:

      As mentioned before, the results that nociceptors can achieve equivalent excitability through changes in differential NaV inactivation and NaV expression of different NaV subtypes are impressive. However, there is some "gap" between the DRG culture experiments and acutely dissociated DRGs from mice after CFA injection. In the extensive experiments with cultured DRG neurons, different time points after dissociation were compared. Although it would have been difficult for functional testing to examine additional time points (besides DIV0 and DIV4-7), at least mRNA and protein levels should have been determined at additional time points (DIV) to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly.

      Characterizing the time course of NaV expression changes is worthwhile but, insofar as such details are not necessary to establish that excitability is degenerate, it was not included in the current study. Furthermore, since mRNA levels do not parallel the functional changes in Nav1.7 (Figure 6A), we do not think it would be helpful to measure mRNA levels at intermediate time points. Measuring protein levels would be more informative; however, as now explained on lines 362-369, neurons were recorded at intermediate time points in initial experiments and showed a lot of variability. Methods that could track fluorescently-tagged NaV channels longitudinally (i.e. at different time points in the same cell) would be well suited for this sort of characterization, but will invariably lead to more questions about membrane trafficking, phosphorylation, etc. We agree that a thorough characterization would be interesting but we think it is best left for a future study.

      It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. This would better link the following data demonstrating that in acutely dissociated nociceptors after CFA injection, the inflammation-induced increase in NaV1.7 membrane expression enhances the effect of (or more neurons respond to) the NaV1.7 inhibitor PF-71, whereas fewer CFA neurons respond to the NaV1.8 inhibitor PF-24.

      These are some of the many good questions that emerge from our results. We are not particularly keen to investigate what happens over several days in culture, since this is not so clinically relevant, but it would be interesting to compare changes induced by nerve injury in vivo (which usually involves neuroinflammatory changes) and changes induced by inflammation. Many previous studies have touched on such issues but we are cautious about interpreting transcriptional changes, and of course all of these changes need to be considered in the context of cellular heterogeneity. It would be interesting to decipher if changes in NaV1.7 and NaV1.8 are directly linked so that an increase in one triggers a decrease in the other, and vice versa. But of course many other channels are also likely to change (as discussed above), and they too warrant attention, which makes the problem quite difficult. We look forward to tackling this in future work.

      The results shown explain, at least in part, the poor clinical outcomes with the use of subtype-selective NaV inhibitors and therefore have important implications for the future development of Nav-selective analgesics. However, this point, which is also evident from the title of the manuscript, is discussed only superficially with respect to clinical outcomes. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how?

      We wish to avoid speculating on which particular clinical results are better explained because our study was not designed for that. Instead, our take-home message (which is well supported; see Discussion on lines 309-321) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. At the end of the results (line 235), which is, we think, what prompted the reviewer’s comment, we point to the Discussion. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is difficult. We certainly don’t have all the answers, but we hope our results will point readers in a new direction to help answer such questions.

      Another point directly related to the previous one, which should at least be discussed, is that all the data are from rodents, or in this case from mice, and this should explain the clinical data in humans. Even if "impediment to translation" is briefly mentioned in a slightly different context, one could (as mentioned above) discuss in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans.

      We are not aware of human data that speak directly to nociceptor degeneracy but degeneracy has been observed in diverse species; if anything, human neurons are probably even more degenerate based on progressive expansion of ion channel types, splice variants, etc. over evolution. Of course species differences extend beyond degeneracy and are always a concern for translation, because of a species difference in the drug target itself or because preclinical pain testing fails to capture the most clinically important aspects of pain (which we mention on line 35). Line 39 now reiterates that these explanations for translational difficulties are not mutually exclusive, but that degeneracy deserves greater consideration than it has hitherto received. Indeed, throughout our paper we imply that degeneracy may contribute to the clinical failure of Nav subtype-specific drugs, but those failures are certainly not evidence of degeneracy. In the Discussion (lines 320-321), we now cite a recent review article on degeneracy in the context of epilepsy, and point out how parallels might help inform pain research. We wish we had a more direct answer to the reviewer’s request; in the absence of this, we hope our results motivate readers to seek out these answers in future research.

      Although speculative, it would be interesting for readers to know whether a treatment regimen based on "time since injury" with NaV1.7 and NaV1.8 inhibitors might offer benefits. Based on the data, could one hypothesize that NaV1.7 inhibitors are more likely to benefit (albeit in the short term) in patients with neuropathic pain with better patient selection (e.g., defined interval between injury and treatment)?

      We like that our data prompt this sort of prediction. However, this is potentially complicated since the injury may be subtle, which is to say that the exact timing may not be known. There are scenarios (e.g. postoperative pain) where the timing of the insult is known, but in other cases (e.g. diabetic neuropathy) the disease process is quite insidious, and different neurons might have progressed through different stages depending on how they were exposed to the insult. Our own experiments with CFA are a case in point. Notwithstanding the potential difficulties about gauging the time course, any way of predicting which Nav subtype is dominant could help more strategically choose which drug to use.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors used patch-clamp to characterize the implication of various voltage-gated Na+ channels in the firing properties of mouse nociceptive sensory neurons. They report that depending on the culture conditions NaV1.3, NaV1.7, and NaV1.8 have distinct contributions to action potential firing and that similar firing patterns can result from distinct relative roles of these channels. The findings may be relevant for the design of better strategies targeting NaV channels to treat pain.

      Strengths:

      The paper addresses the important issue of understanding, from an interesting perspective, the lack of success of therapeutic strategies targeting NaV channels in the context of pain. Specifically, the authors test the hypothesis that different NaV channels contribute in a plastic manner to action potential firing, which may be the reason why it is difficult to target pain by inhibiting these channels. The experiments seem to have been properly performed and most conclusions are justified. The paper is concisely written and easy to follow.

      Weaknesses:

      (1) The most critical issue I find in the manuscript is the claim that different combinations of NaV channels result in equivalent excitability. For example, in the Abstract it is stated that: "...we show that nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8". The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same. I think that the culprit of this issue is that the authors reach their conclusion from the comparison of the (average) firing rate determined over 1 s current stimulation in distinct conditions. However, this is not the only parameter that determines how sensory neurons convey information. For instance, the time dependence of the instantaneous frequency, the actual firing pattern, may be important too. Moreover, the use of 1 s of current stimulation might not be sufficient to characterize the firing pattern if one wants to obtain conclusions that could translate to clinical settings (i.e., sustained pain). A neuron in which NaV1.7 is the main contributor is expected to have a damping firing pattern due to cumulative channel inactivation, whereas another depending mainly on NaV1.8 is expected to display more sustained firing. This is actually seen in the results of the modelling.

      This concern seems to boil down to how equivalent is equivalent? The spike shape or the full input-output curve for a DIV0 neuron (Nav1.8-dominant) is never equivalent to what’s seen in a DIV4-7 neuron (Nav1.7-dominant), but nor are any two DIV0 neurons strictly equivalent, and likewise for any two DIV4-7 neurons. Our point is that DIV0 and DIV4-7 neurons are far more similar (less discriminable) in their excitability than expected from the qualitative difference in their TTX sensitivity (and from repeated claims in the literature that Nav1.7 is necessary for spike generation in nociceptors). Nav isoforms need not be identical to operate similarly; for instance, Nav1.8 tends to activate at “suprathreshold” voltages, but this depends on the value of threshold; if threshold increases, Nav1.8 can activate at subthreshold voltages (see Fig 5). We have modified lines 155-175 to help clarify this.
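The dependence on threshold described above can be illustrated with a minimal sketch, assuming steady-state activation follows a standard Boltzmann function. The half-activation voltage, slope factor, and the two threshold values below are hypothetical placeholders chosen for illustration; they are not the gating parameters of the authors' model (those are in Supplementary Table 5).

```python
import math

def boltzmann_activation(v_mV, v_half_mV, slope_mV):
    """Steady-state activated fraction at voltage v_mV (standard Boltzmann form)."""
    return 1.0 / (1.0 + math.exp((v_half_mV - v_mV) / slope_mV))

# Hypothetical Nav1.8-like activation parameters (illustrative only).
NAV18_V_HALF_MV = -20.0
NAV18_SLOPE_MV = 6.0

# Two hypothetical spike thresholds: a more negative one (another channel
# initiates the spike) and a less negative one (that channel absent or inactivated).
THRESHOLDS_MV = {"low threshold": -45.0, "raised threshold": -30.0}

# Activated fraction of the Nav1.8-like channel at each threshold voltage.
activation_at_threshold = {
    label: boltzmann_activation(v, NAV18_V_HALF_MV, NAV18_SLOPE_MV)
    for label, v in THRESHOLDS_MV.items()
}

for label, fraction in activation_at_threshold.items():
    print(f"{label}: activated fraction = {fraction:.3f}")
```

With these placeholder values, the same channel is barely activated at the more negative threshold but appreciably activated at the less negative one, which is the sense in which a "suprathreshold" channel can become "perithreshold" when threshold shifts.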

      We completely agree that firing rate is not the only way to convey sensory information, and of course injecting current directly into the cell body via a patch pipette is not a natural stimulus. These are all factors to keep in mind when interpreting our data. Nonetheless, our data show that excitability is similar between DIV0 and DIV4-7, so much so that data from any one neuron (without pharmacological tests or capacitance measurements) would likely not reveal if that cell is DIV0 or DIV4-7; this “indiscriminability” qualifies as “equivalent” for our purposes, and is consistent with phrasing used by other authors studying degeneracy. Notably, not every DIV4-7 neuron exhibits spike height attenuation (see Fig. 1A), likely because of concomitant changes in the AHP that were not captured in our computer model or directly tested in our experiments. This highlights that other channel changes may also contribute to degeneracy and the maintenance of repetitive spiking.

      (2) In Fig. 1, is 100 nM TTX sufficient to inhibit all TTX-sensitive NaV currents? More common in literature values to fully inhibit these currents are between 300 to 500 nM. The currents shown as TTX-sensitive in Fig. 1D look very strange (not like the ones at Baseline DIV4-7). It seems that 100 nM TTX was not enough, leading to an underestimation of the amplitude of the TTX-sensitive currents.

      As now summarized in Supplementary Table 3 (which is newly added), 100 nM TTX is >20x the EC50 for Nav1.3 and Nav1.7 (but is still far below the EC50 for Nav1.8). Based on this, TTX-sensitive channels are definitely blocked in our TTX experiments.
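As a rough check on the concentration argument above, the fractional block expected under a simple one-site (Hill) binding model can be sketched as follows. The Hill coefficient of 1 and the half-maximal concentration of 5 nM are illustrative assumptions, chosen only so that 100 nM equals 20x the half-maximal concentration; they are not values taken from Supplementary Table 3.

```python
def fractional_block(conc_nM, half_max_nM, hill=1.0):
    """Fraction of channels blocked at conc_nM under a simple Hill model."""
    ratio = (conc_nM / half_max_nM) ** hill
    return ratio / (1.0 + ratio)

# Hypothetical half-maximal concentration: 100 nM is exactly 20x this value.
HALF_MAX_NM = 5.0
TTX_NM = 100.0

block = fractional_block(TTX_NM, HALF_MAX_NM)
print(f"{block:.3f}")  # prints 0.952
```

Under these assumptions, a concentration 20x the half-maximal value corresponds to roughly 95% block, and larger multiples move the block closer to complete.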

      (3) Page 8, the authors conclude that "Inflammation caused nociceptors to become much more variable in their reliance of specific NaV subtypes". However, how did the authors ensure that all neurons tested were affected by the CFA model? It could be that the heterogeneity in neuron properties results from distinct levels of effects of CFA.

      We agree with the reviewer. We also believe that variable exposure to CFA is the most likely explanation for the heightened variability in TTX-sensitivity reported in Figure 7 (now more clearly explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the CFA and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing), and that demonstration holds even if some neurons do not switch. Subsequent testing in Figure 8 shows that enough neurons switch to have a meaningful effect in terms of the behavioral pharmacology. So, notwithstanding tangential concerns, we think our CFA experiments succeeded in showing that Nav channels can switch in vivo and that this impacts drug efficacy.

      Recommendations for the authors:

      All reviewers agreed that these results are solid and interesting. However, the reviewers also raised several concerns that should be addressed by the authors to improve the strength of the evidence presented. Revisions considered to be essential include:

      (1) Discuss how degeneracy concerning other ion channels (such as potassium ion channels) could also impact nociceptor excitability (reviewer #1). Additionally, the translation of results from DRG neuron cultures to "in vivo" nociceptors should be better discussed.

      We have added a new paragraph to the Discussion (lines 248-259) to remind readers that despite our focus on Nav channels, other ion channels likely also change (and that these changes involve diverse regulatory mechanisms that require further investigation). Likewise, despite our focus on the changes caused by culturing neurons, we remind readers that subtler, more clinically relevant in vivo perturbations can likewise cause a multitude of changes. We end that paragraph by emphasizing that although accounting for all the contributing components is required to fully understand a degenerate system, meaningful progress can be made by studying a subset of the components. We want to emphasize this because there is some middle ground between focusing on one component at a time (which is the norm) vs. trying to account for everything (which is an infeasible ideal). Additional text on lines 304-308 also addresses related points.

      (2) Discuss how different combinations of NaV channels result in equivalent excitability, in the context of the experimental conditions used (see main comment by reviewer #3). It should also be discussed in more detail which human clinical data support the existence of "equivalent excitability through different sodium channels" also in humans (reviewer #2).

      Regarding the first part of this comment, reviewer 3 wrote in the public review that “The gating properties of these channels are not identical, and therefore their contributions to excitability should not be the same.” Differences in gating properties are commonly used to argue that different Nav subtypes mediate different phases of the spike, for example, that Nav1.7 initiates the spike whereas Nav1.8 mediates subsequent depolarization because Nav1.7 and Nav1.8 activate at perithreshold and suprathreshold voltages, respectively (see lines 134-135, now shown in red). But such comparison is overly simplistic insofar as it neglects the context in which ion channels operate. For instance, if Nav1.7 is not expressed or fully inactivates, voltage threshold will be less negative, enabling Nav1.8 to contribute to spike initiation; in other words, previously “suprathreshold” voltages become “perithreshold”. Figure 5 is dedicated to explaining this context-sensitivity; specifically, we demonstrate with simulations how Nav1.8 takes over responsibility for initiating a spike when Nav1.7 is absent or inactivated. Text on lines 155-184 has been edited to help clarify this. Regarding the second part of this comment, we are not aware of any direct evidence from human sensory neurons that different sodium channels produce equivalent excitability, but that is certainly what we expect. We suggest that failure of Nav subtype-specific drugs is, at least in part, because of degeneracy, but such failures do not demonstrate degeneracy unless other contributing factors can be excluded (which they can’t). Recognizing degeneracy is difficult, and so variability that might be explained by degeneracy will go unexplained or attributed to other factors unless, by design or serendipity, experiments quantify the effects of degeneracy (as we have attempted to do here).

      We now cite a recent review article on degeneracy and epilepsy (line 320), which addresses relevant themes that might help inform pain research; for instance, most existing antiseizure medications act on multiple targets whereas more recently developed single-target drugs have proven largely ineffective. This is similar to but better documented than for analgesics. With this in mind, we revised the text to emphasize the circumstantial nature of existing evidence and the need to test more directly for degeneracy (lines 320-323).

      (3) Extend the discussion about the poor clinical outcomes with the use of subtype-selective NaV inhibitors. In particular, the promising role of NaV1.7, which plays a role in nociceptor hyperexcitability but not in "normal" neurons, should be discussed in light of clinical results and not just covered with a citation of a review. Which clinical results of NaV1.7-selective drugs can now be better explained and how? (reviewer #2)

      As discussed above, we are cautious to avoid speculating on which clinical results are attributable to degeneracy. Instead, our take-home message (see Discussion, lines 309-323) is that NaV1.7-selective drugs may have a variable clinical effect because nociceptors’ reliance on NaV1.7 is itself variable – much more than past studies would have readers believe. The corollary is that accounting for degeneracy could help account for variability in drug efficacy, which would of course be beneficial. The challenge (as highlighted in the Abstract, lines 21-22) is that identifying the dominant Nav subtype to predict drug efficacy is not trivial. Interpreting clinical data is also complicated by the fact that we are either dealing with genetic mutations (with unclear compensatory changes) or pharmacological results (where NaV1.7-selective drugs have a multitude of problems that might contribute to their lack of efficacy, separate from effects of degeneracy). We have striven to contextualize our results (e.g. last paragraph of results, lines 222-235). We think this is the most we can reasonably say based on the limitations of existing clinical data.

      (4) Provide a clearer and more detailed description of the computational model (reviewers #2 and #3).

      We added important details on lines 476-477 but, in our honest opinion, our computational model is thoroughly explained. The issue seems to boil down to whether details are included in the Results vs. being left for the Methods, tables and figure legends. We prefer the latter.

      (5) Better clarify the effects of the CFA model, to provide further evidence relating inflammation with nociceptor variability (reviewers #2 and #3)

      As explained in response to a specific point by reviewer #3, we believe that variable exposure to CFA explains the heightened variability in TTX-sensitivity reported in Figure 7 (now explained on lines 214-215). One could try co-injecting a retrograde dye with the CFA to label cells innervating the injection site, but differential spread of the inflammation and dye is liable to preclude any good concordance. Alternatively, a pain model involving more widespread (systemic) inflammation might cause a more homogeneous effect. But our main goal with CFA injections was to show that a Nav1.8→Nav1.7 switch can occur in vivo (and is therefore not unique to culturing); that demonstration holds true even if some neurons do not switch. Subsequent testing (Fig 8) shows that enough neurons switch to impact drug efficacy assessed behaviorally. This is emphasized with new text on lines 225-227. Overall, we think our CFA experiments succeed in showing that Nav channels can switch in vivo and, despite variability, that this occurs in enough neurons to impact drug efficacy.

      (6) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews.

      Detailed responses are provided below for all feedback and changes to the text were made whenever necessary, as identified in our responses.

      Reviewer #1 (Recommendations For The Authors):

      Minor points/recommendations:

      Protein synthesis inhibition by cercosporamide could be the direct cause of a smaller-than-expected increase in Nav1.7 levels at DIV5. But for Nav1.8, there is a mitigation in the increased levels at DIV5, that only could be explained by several indirect mechanisms, including membrane trafficking and posttranslational modifications (phosphorylation, SUMOylation, etc.) on Nav1.8 or protein regulators of Nav1.8 channels. The authors suggest that "translational regulation is crucial", but also insinuate that other processes (membrane trafficking, etc.) could contribute to the observed outcome. It is difficult to assess the relative importance of these different explanations without knowing the exact mechanisms that are acting here.

      We agree. We relied on electrophysiology (and pharmacology) to measure functional changes, but we wanted to verify those data with another method. We expected mRNA levels to parallel the functional changes but, when that did not pan out, we proceeded to look at protein levels. Perhaps we should have stopped there, but by blocking protein translation, we show that there is not enough Nav1.7 protein already available that can be trafficked to the membrane. That does not explain why Nav1.8 levels drop. Our immunohistochemistry could not tease apart membrane expression from overall expression, which limits interpretation. We have enhanced the text to discuss this (lines 200-204), but further experiments are needed. Though admittedly incomplete, our initial findings help set the stage for future experiments on this matter.

      Page 15, typo: "contamination from genomic RNA" -> "contamination from genomic DNA" (appears twice).

      This has been corrected on lines 420 and 421.

      Page 17: I could not find the computer code at ModelDB (http://modeldb.yale.edu/267560). It seems to be an old web link. It should be available at some web repository.

      We confirmed that the link works. Entry is password-protected (password = excitability; see line 476). Password protection will be removed once the paper is officially published.

      Page 19, reference 36, typo: "Inhibitio of" -> "Inhibition of".

      This has been corrected (line 557).

      Page 33, typo: "are significantly larger than differences at DIV1" -> "are significantly larger than differences at DIV0".

      This has been corrected (line 796).

      Page 35, figure 6 legend. The number of experiments (n) is not indicated for panel C data.

      N = 3 is now reported (line 828).

      Reviewer #2 (Recommendations For The Authors):

      p. 3/4 and Data of Fig. 6: It should be commented on why days 1-3 were not investigated. An investigation of the time course (by higher frequency testing) would certainly have an added value because it would be possible to deduce whether the changes develop slowly and gradually, or whether the excitability induced by different NaVs changes suddenly. At least mRNA and protein levels should be determined at additional time points to examine the time course or whether gene expression (mRNA) or membrane expression (protein) changes slowly and gradually or rapidly and more abruptly. It would also be interesting to clarify whether the changes that occur in culture (DIV0 vs. DIV4-7) are accompanied by (pro-)inflammatory changes in gene and protein expression, such as those known for nociceptors after CFA injection. Or is the latter question clear in the literature?

      We now explain (lines 362-369) that intermediate time points (DIV1-3) were tested in initial current clamp recordings. Those data showed that TTX-sensitivity stabilized by DIV4 and differed from the TTX-insensitivity observed at DIV0. TTX-sensitivity was mixed at DIV1-3 and cross-cell variability complicated interpretation. Subsequent experiments were prioritized to clarify why NaV1.7 is not always critical for nociceptor excitability, contrary to past studies. Our efforts to measure mRNA and protein levels were primarily to validate our electrophysiological findings; we are also interested in deciphering the underlying regulatory processes but this is an entire study on its own. Unfortunately, the existing literature does not help or point to an explanation for the Nav1.7/1.8 shift we observed.

      Our evidence that mRNA levels do not parallel functional changes argues against pursuing transcriptional changes in Nav1.7, though transcriptional changes in other factors might be important. Interpretation of immuno quantification would be complicated by the high variability we observed with the physiology at intermediate time points and, furthermore, we cannot resolve surface expression from overall expression based on available antibodies. Methods conducive to longitudinal measurements would be more appropriate (as now mentioned on lines 367-369). In short, a lot more work is required to understand the mechanisms involved in the switch, but we think the existing demonstration suffices to show that NaV1.7 and NaV1.8 protein levels vary, with crucial implications for which Nav subtype controls nociceptor excitability, and important implications for drug efficacy. Explaining why and how quickly those protein levels change will be no small feat and is best left for a future study.

      p. 4 and following: In order to enable the interpretation of the used concentration of PF-24, PF71, and ICA, the respective IC50 should be indicated.

      A table (now Supplementary Table 3; line 861) has been added to report EC50 values for all drugs for blocking NaV1.7, NaV1.8 and NaV1.3. The concentrations we used are included on that table for easy comparison.

      p. 5, end of the middle paragraph: Here it should be briefly explained -for less familiar readers- why NaV1.1 cannot be causative (ICA inhibits NaV1.1 and 1.3).

      We now explain (lines 117-120) that NaV1.1 is expressed almost exclusively in medium-diameter (A-delta) neurons whereas NaV1.3 is known to be upregulated in small-diameter neurons, and so the effect we observe in small neurons is most likely via blockade of NaV1.3.

      p. 6, lines 4/5: At least once it should read computer model instead of model.

      “Computer” has been added the first time we refer to DIV0 or DIV4-7 computer models (lines 138-139).

      p. 6: the difference between Fig. 4B and Fig. 4 - Figure suppl. 1 should be mentioned briefly.

      We now explain (lines 150-154) that Fig. 4B involves replacing a native channel with a different virtual channel (to demonstrate their interchangeability) whereas Fig. 4 - Figure supplement 1 involves replacing a native channel with the equivalent virtual channel (as a positive control).

      p. 6/7: the text and the conclusions regarding Figure 5 are difficult to follow. Somewhat more detailed explanations of why which data demonstrate or prove something would be helpful.

      The text describing Figure 5 (lines 155-175) has been revised to provide more detail.

      p. 7, last sentence of the first paragraph: How is this supported by the data? Or should this sentence be better moved to the discussion?

      This sentence (now lines 182-184) is designed as a transition. The first half – “a subtype’s contribution shifts rapidly (because of channel inactivation)” – summarizes the immediately preceding data (Figure 5). The second half – “or slowly (because of [changes in conductance density])” – introduces the next section. The text shown in square brackets has been revised. We hope this will be clearer based on revisions to the associated text.

      p. 7, second paragraph, line 3: Please delete one "at both".

      Corrected

      p. 7, second paragraph: Please explain why different time points (DIV4-7, DIV5, or DIV7) were used or studied.

      Initial electrophysiological experiments determined that TTX sensitivity stabilized by DIV 4 (see response to opening point) and we did not maintain neurons longer than 7 days, and so neurons recorded between DIV4 and 7 were pooled. If non-electrophysiological tests were conducted on a specific day within that range, we report the specific day, but any day within the DIV4-7 range is expected to give comparable results. This is now explained on lines 365-367.

      p. 8: the text regarding Fig. 7 should also include the important data (e.g. percentage of neurons showing repetitive spiking) mentioned in the legend.

      This text (lines 216-220) has been revised to include the proportion of neurons converted by PF71 and PF-24 and the associated statistical results.

      Fig. 1: third panel (TTX-sensitive current...) of D & Fig. 2 subpanel of A (Nav1.8 current...). These panels should be explained or mentioned in the text and/or legends.

      We now explain in the figure legends (lines 708-710; 714-715; 736-738) how those currents are found through subtraction.

      Fig. 2 - figure supplement 2. One might consider taking Panel A to Fig. 2 so that the comparison to DIV0 is apparent without switching to Suppl. Figs.

      We left this unchanged so that Figures 2 and 3 are equivalently organized, with negative control data left to the supplemental figures. eLife formatting makes it easy to reach the supplementary figure from the main figure, so we hope this won’t be an impediment to readers.

      Fig. 6 C, middle graph (graph of Nav1.7): Please re-check, whether DIV5 none vs. 24 h and none vs. 120 h are really significantly different with such a low p-value.

      We re-checked the statistics and the difference pointed out by the reviewer is significant at p=0.007. We mistakenly reported p<0.001 for all comparisons, and so this p value has been corrected; all the other p values are indeed <0.001. Notably, the data are summarized as median ± quartile because of their non-Gaussian distribution; this is now explained on line 827 (as a reminder to the statement on lines 461-462). Quartiles are more comparable to SD than to SEM (in that quartiles and SD represent the distribution rather than confidence in estimating the mean, like SEM), and so medians can differ very significantly even if quartiles overlap, as in this case.
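      The distinction drawn above between spread (quartiles, SD) and precision of the mean (SEM) can be illustrated with a small sketch. This is not the authors' analysis: the data are synthetic integers, and the helper name `summarize` is hypothetical. It only shows that enlarging the sample leaves the interquartile range and SD essentially unchanged while the SEM shrinks.

```python
import math
import statistics

def summarize(values):
    """Return median, interquartile range, SD and SEM for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    sd = statistics.stdev(values)                       # sample standard deviation
    sem = sd / math.sqrt(len(values))                   # standard error of the mean
    return {"median": median, "iqr": q3 - q1, "sd": sd, "sem": sem}

# Synthetic data (an assumption for illustration only): the same distribution
# sampled with n = 100 and with n = 400.
small = list(range(1, 101))
big = small * 4

small_stats = summarize(small)
big_stats = summarize(big)

for label, stats in (("n=100", small_stats), ("n=400", big_stats)):
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

      Because quartiles describe the distribution itself, two groups can have overlapping quartile ranges and still differ significantly in their medians when enough observations are available.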

      Reviewer #3 (Recommendations For The Authors):

      (1) A critical issue in the manuscript is the use of teleological language. It is likely that this is not the intention, but careful revision of the language should be done to avoid the use of expressions that confer purpose to a biological process. Please, find below a list of statements that I consider require correction.

      • In the Abstract, the first sentence: "Nociceptive sensory neurons convey pain signals to the CNS using action potentials". Neurons do not really "use" action potentials, they have no will or purpose to do so. Action potentials are not tools or means to be "used" by neurons. Other examples of misuse of the verb "use" are found in several other sentences:

      "...nociceptors can achieve equivalent excitability using different combinations of NaV1.3, NaV1.7, and NaV1.8"

      "Flexible use of different NaV subtypes - an example of degeneracy - compromises..."

      "Nociceptors can achieve equivalent excitability using different sodium channel subtypes" "...degeneracy - the ability of a biological system to achieve equivalent function using different components..."

      "...nociceptors can achieve equivalent excitability using different sodium channel subtypes..."

      "Our results show that nociceptors can achieve similar excitability using different NaV channels" "...the spinal dorsal horn circuit can achieve similar output using different synaptic weight combinations..."

      "Contrary to the view that certain ion channels are uniquely responsible for certain aspects of neuronal function, neurons use diverse ion channel combinations to achieve similar function" "In summary, our results show that nociceptors can achieve equivalent excitability using different NaV subtypes"

      “Use” can mean to put into action (without necessarily implying intention). Based on definitions of the word in various dictionaries, we feel we are well within the realm of normal usage of this term. In trying to achieve a clear and succinct writing style, we have stuck with our original word choice.

      • At the end of page 5 and in the legend of Fig. 7, the word "encourage" is not properly used in the sentence "The ability of NaV1.3, NaV1.7 and NaV1.8 to each encourage repetitive spiking is seemingly inconsistent with the common view...". Encouraging is really an action of humans or animals on other humans or animals.

      Like for “use”, we verified our usage in various dictionaries and we do not think that most readers will be confused or disturbed by our word choice. We use “encourage” to explain that increasing NaV1.3, NaV1.7 or NaV1.8 can increase the likelihood of repetitive spiking; we avoided “cause” because the probability of repetitive spiking is not raised to 100%, since other factors must always be considered.

      • In the Abstract and other places in the manuscript, the word "responsibility" seems to be wrongly employed. It is true that one can say, for instance, on page 4 last paragraph "we sought to identify the NaV subtype responsible for repetitive spiking at each time point". However, to confer channels with the human quality of having "responsibility" for something does not seem appropriate. See also page 8 last paragraph, the first paragraph of the Discussion, and the three paragraphs of page 11.

      Again, we must respectfully disagree with the reviewer. We appreciate that this reviewer does not like our writing style but we do not believe that our style violates English norms.

      (2) In the first sentence of the Abstract, nociceptive sensory neurons do not convey "pain signals". Pain is a sensation that is generated in the brain.

      “Pain” is used as an adjective for “signal” and is used to help identify the type of signal. Nonetheless, since the word count allowed for it, we now refer to “pain-related signals” (line 10).

      (3) I do not see the point of plotting the firing rate as a function of relative stimulus amplitude (normalized to the rheobase, e.g., Fig. 1A bottom panels, Fig. 2B, bottom-right, Fig. 2 Supp2A right, Fig. 3 B bottom panels, etc) instead of as a function of the actual stimulus amplitude. I have the impression that this maneuver hides information. This is equivalent to plotting the current amplitudes as a function of the voltage normalized by the voltage threshold for current activation, which is obviously not done.

      This is how the experiments were performed, so it would be impossible to perform the statistical analysis using the absolute amplitudes post-hoc; specifically, stimulus intensities were tested at increments defined relative to rheobase rather than in absolute terms. There are pros and cons to each approach, and both approaches are commonly used. Notably, we report the value of rheobase on the figures so that readers can, with minimal arithmetic, convert to absolute stimulus intensities. No information is hidden by our approach.

      (4) On page 4 it is stated that "We show later that similar changes develop in vivo following inflammation with consequences for drug efficacy assessed behaviourally (see Fig. 8), meaning the NaV channel reconfiguration described above is not a trivial epiphenomenon of culturing". However, what happens in culture may have nothing in common with what happens in vivo during inflammation. Thus, the latter data may not serve to answer whether the culture conditions induce artifacts or not. I suggest tuning down this statement by changing "meaning" to "suggesting".

      On line 97, we now write “suggesting”.

      (5) Page 5, first paragraph, I miss a clear description of the mathematical models. Having to skip to the Methods section to look for the details of the models as the artifices introduced to simulate different conditions is rather inconvenient.

      So as not to disrupt the flow of the presentation with methodological details, we only provide a short description of the model in the Results. We have slightly expanded this to point out that the conductance-based model is also single-compartment (line 111). We provide a very thorough description of our model in the Methods, especially considering all the details provided in Supplementary Tables 1, 5 and 6. We also report conductance densities and % changes in figure legends (lines 722, 747-748; now shown in red). This is also true for Figure 3-figure supplement 2 (lines 756-759). We tried very hard to find a good balance that we hope most readers will appreciate.

      (6) Page 6, second paragraph, simulations do not serve to "measure" currents.

      The sentence has been revised to indicate that simulations were used to “infer” currents during different phases of the spike (line 155).

      (7) Page 7, regarding the title of the subsection "Control of changes in NaV subtype expression between DIV0 and DIV4-7", the authors measured the levels of expression, but not really the mechanisms "controlling" them. I suggest writing "changes in NaV subtype expression between DIV0 and DIV4-7".

      We have removed “control of” from the section title (line 185).

      (8) What was the reason for adding a noise contribution in the model?

      We now explain that noise was added to reintroduce the voltage noise that is otherwise missing from simulations (line 474). For instance, in the absence of noise, membrane potential can approach voltage threshold very slowly without triggering a spike, which does not happen under realistically noisy conditions. Of course membrane potential fluctuates noisily because of stochastic channel opening and a multitude of other reasons. This is not a major issue for this study, and so we think our short explanation should suffice.

      (9) Please, define the concept of degeneracy upon first mention.

      Degeneracy is now succinctly defined in the abstract (line 20).

    1. And if someone is in the bathroom, they’re 10-100 (or 10-200 as the case may be), but they’re definitely not “in the can”, which is what you say when a scene is completed.

      In cinema, "in the can" doesn't mean what we think it does. However, the people working on the set know the lingo and understand that it means the scene is complete.

    1. Author Response

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study reports a novel mechanism linking DHODH inhibition-mediated pyrimidine nucleotide depletion to antigen presentation. Alternative means of inducing antigen presentation provide therapeutic opportunities to augment immune checkpoint blockade for cancer treatment. While the solid mechanistic data in vitro are compelling, in vivo assessments of the functional relevance of this mechanism are still incomplete.

      Public Reviews:

      We thank all Reviewers for their insightful comments and excellent suggestions.

      Reviewer #1 (Public Review):

      The manuscript by Mullen et al. investigated the gene expression changes in cancer cells treated with the DHODH inhibitor brequinar (BQ), to explore the therapeutic vulnerabilities induced by DHODH inhibition. The study found that BQ treatment causes upregulation of antigen presentation pathway (APP) genes and cell surface MHC class I expression, which is mechanistically mediated by the CDK9/PTEFb pathway triggered by pyrimidine nucleotide depletion.

      No comment from authors

      The combination of BQ and immune checkpoint therapy demonstrated a synergistic (or additive) anti-cancer effect against xenografted melanoma, suggesting the potential use of BQ and immune checkpoint blockade as a combination therapy in clinical therapeutics.

      No comment from authors

      The interesting findings in the present study include demonstrating a novel cellular response in cancer cells induced by DHODH inhibition. However, a limitation of the study is that it did not directly examine whether the increased antigen presentation caused by DHODH inhibition actually contributed to the potentiation of immune checkpoint blockade (ICB) efficacy.

      No comment from authors for preceding text, comment addresses the following text

      Moreover, the mechanism by which pyrimidine depletion increases the antigen presentation pathway via CDK9/PTEFb was not validated by genetic KD or KO targeting the CDK9/PTEFb pathway.

      We appreciate this comment, and we would like to explain why we did not pursue these approaches. According to DepMap, CRISPR/Cas9-mediated knockout of CDK9 in cancer cell lines is almost universally deleterious, scoring as “essential” in 99.8% (1093/1095) of all cell lines tested (see Author response image 1 below). This makes sense, as P-TEFb is required for productive RNA polymerase II elongation of most mammalian genes. As such, it was not feasible to generate cell lines with stable genetic knockout of CDK9 to test our hypothesis.

      While knockdown of CDK9 by RNA interference could support our results, DepMap data seems to indicate that RNAi-mediated knockdown of CDK9 is generally ineffective in silencing its activity, as this perturbation scored as “essential” in only 6.2% (44/710) of tested cell lines. This suggests that incomplete depletion of CDK9 will likely not be sufficient to block APP induction downstream of nucleotide depletion. Furthermore, RNAi-mediated depletion of CDK9 may trigger transcriptional changes in the cell by virtue of its many documented protein-protein interactions, and it would be difficult to establish a consistent “time zero” at which point CDK9 protein depletion is substantial but secondary effects of this have not yet occurred to a significant degree. These factors constitute major limitations of experiments using RNAi-mediated knockdown of CDK9.

      Author response image 1.

      Essentiality score from CRISPR and RNAi perturbation of CDK9 in cancer cell lines https://depmap.org/portal/gene/CDK9?tab=overview&dependency=RNAi_merged

      At any rate, we provide evidence that three different inhibitors of CDK9 (flavopiridol, dinaciclib, and AT7519) all inhibit our effect of interest (Fig 4B). The same results were observed using a previously validated CDK9-directed proteolysis targeting chimera (PROTAC2), and this was reversed by addition of excess pomalidomide (Fig 4C), which correlated with the presence/absence of CDK9 on western blot under the exact same conditions (Fig 4D).

      It is formally possible that all CDK9 inhibitors we tested are blocking BQ-mediated APP induction by some shared off-target mechanism (or perhaps by two or more different off-target mechanisms) AND this CDK9-independent target also happens to be degraded by PROTAC2. However, this would be an extraordinarily non-parsimonious explanation for our results, and so we contend that we have provided compelling evidence for the requirement of CDK9 for BQ-mediated APP induction.

      Finally, high concentrations of BQ have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, and the authors should discuss whether the dose used in the in vivo study reached the ferroptotic sensitizing dose or not.

      We are intrigued by the results shown to us by Reviewer #1 in the linked preprint (Mishima et al 2022, https://doi.org/10.21203/rs.3.rs-2190326/v1). We have also observed in our unpublished data that very high concentrations of BQ (>150µM) cause loss of cell viability that is not rescued by uridine supplementation and that occurs even in DHODH knockout cells. This effect of high-dose BQ must be DHODH-independent. We also agree that Mishima et al provide compelling evidence that the ferroptosis-sensitizing effect of high-dose BQ treatment is due (at least in large part) to inhibition of FSP1.

      Although we showed that DHODH is strongly inhibited in tumor cells in vivo (Fig 5C), we did not directly measure the concentration of BQ in the tumor or plasma. Sykes et al (PMID: 27641501) found that the maximum plasma concentration (Cmax) for [BQ]free following a single IP administration in C57Bl6/J mice (15mg/kg) is approximately 3µM, while the Cmax for [BQ]total was around 215µM. Because polar drug molecules bound to serum proteins (predominantly albumin) are not available to bind other targets, [BQ]free is the relevant parameter.

      Given a Cmax for [BQ]free of 3µM and half-life of 12.0 hours, we estimate that the steady-state [BQ]free with daily IP injections at this dose is around 4µM. Since we used an administration schedule of 10mg/kg every 24 hours, we estimate that the steady-state plasma [BQ]free in our system was 2.67µM (assuming initial Cmax of 2µM and half-life of 12.0 hours).

      To derive an upper-bound estimate for the Cmax of [BQ]free over the 12-day treatment period (Fig 5A-D), we will use the observed data for 15mg/kg dose, and we will assume that 1) there is no clearance of BQ whatsoever and 2) that [BQ]free increases linearly with increasing [BQ]total. This yields a maximum free BQ concentration of 12 x 3 = 36µM.

      Therefore, we consider it very unlikely that plasma concentrations of free BQ in our experiment exceeded the lower limit of the ferroptosis-sensitizing dose range reported by Mishima et al. However, without direct pharmacokinetic analysis, we cannot say for sure what the maximal [BQ]free was under our experimental conditions.
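      The arithmetic behind these estimates can be reproduced with a short sketch. It assumes a simple one-compartment model with first-order elimination and repeated identical doses, which is an assumption on our part (the response does not state the formula used), although it does return the quoted values of roughly 4µM, 2.67µM and 36µM. The function names are hypothetical.

```python
def accumulation_factor(half_life_h, interval_h):
    """Steady-state accumulation ratio for repeated dosing with
    first-order elimination (assumed one-compartment model)."""
    fraction_remaining = 0.5 ** (interval_h / half_life_h)
    return 1.0 / (1.0 - fraction_remaining)

def steady_state_cmax(single_dose_cmax_um, half_life_h, interval_h):
    """Peak free concentration (µM) once dosing has reached steady state."""
    return single_dose_cmax_um * accumulation_factor(half_life_h, interval_h)

def no_clearance_upper_bound(single_dose_cmax_um, n_doses):
    """Worst case: every dose adds its full Cmax and nothing is cleared."""
    return single_dose_cmax_um * n_doses

# Values quoted in the response: half-life 12 h, one dose every 24 h.
ss_15mgkg = steady_state_cmax(3.0, 12.0, 24.0)   # 15 mg/kg, Cmax 3 µM
ss_10mgkg = steady_state_cmax(2.0, 12.0, 24.0)   # 10 mg/kg, assumed Cmax 2 µM
upper_bound = no_clearance_upper_bound(3.0, 12)  # 12 daily doses, no clearance

print(f"steady state at 15 mg/kg: {ss_15mgkg:.2f} µM")
print(f"steady state at 10 mg/kg: {ss_10mgkg:.2f} µM")
print(f"no-clearance upper bound: {upper_bound:.0f} µM")
```

      With a 12-hour half-life and 24-hour dosing interval, one quarter of each dose remains at the next injection, so the accumulation factor is 1/(1 - 0.25) = 4/3.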

      Reviewer #2 (Public Review):

      In their manuscript entitled "DHODH inhibition enhances the efficacy of immune checkpoint blockade by increasing cancer cell antigen presentation", Mullen et al. describe an interesting mechanism of inducing antigen presentation. The manuscript includes a series of experiments that demonstrate that blockade of pyrimidine synthesis with DHODH inhibitors (i.e. brequinar (BQ)) stimulates the expression of genes involved in antigen presentation. The authors provide evidence that BQ mediated induction of MHC is independent of interferon signaling. A subsequent targeted chemical screen yielded evidence that CDK9 is the critical downstream mediator that induces RNA Pol II pause release on antigen presentation genes to increase expression. Finally, the authors demonstrate that BQ elicits strong anti-tumor activity in vivo in syngeneic models, and that combination of BQ with immune checkpoint blockade (ICB) results in significant lifespan extension in the B16-F10 melanoma model. Overall, the manuscript uncovers an interesting and unexpected mechanism that influences antigen presentation and provides an avenue for pharmacological manipulation of MHC genes, which is therapeutically relevant in many cancers. However, a few key experiments are needed to ensure that the proposed mechanism is indeed functional in vivo.

      The combination of DHODH inhibition with ICB reflects more of an additive response instead of a synergistic combination. Moreover, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. To confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition, the authors should examine whether depletion of immune cells can reduce the therapeutic efficacy of BQ in vivo.

      We concur with this assessment.

      Moreover, they should examine whether BQ treatment induces antigen presentation in non-malignant cells and APCs to determine the cancer specificity.

      Although we showed that this occurs in HEK-293T cells, we appreciate that this cell line is not representative of human cells of any organ system in vivo. So, we agree it is important to determine if DHODH inhibition induces antigen presentation in human tissues and professional antigen presenting cells, and this is an excellent focus for future studies.

      However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline (i.e. even in the absence of DHODH inhibitor treatment), since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.

      Finally, although the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level, only MHC-I is validated by flow cytometry. Given the importance of MHC-II expression on epithelial cancers, including melanoma, MHC-II should be validated as well.

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Overall, the paper is clearly written and presented. With the additional experiments described above, especially in vivo, this manuscript would provide a strong contribution to the field of antigen presentation in cancer. The distinct mechanisms by which DHODH inhibition induces antigen presentation will also set the stage for future exploration into alternative methods of antigen induction.

      Reviewer #3 (Public Review):

      Mullen et al present an important study describing how DHODH inhibition enhances efficacy of immune checkpoint blockade by increasing cell surface expression of MHC I in cancer cells. DHODH inhibitors have been used in the clinic for many years to treat patients with rheumatoid arthritis and there has been a growing interest in repurposing these inhibitors as anti-cancer drugs. In this manuscript, the Singh group build on their previous work defining combinatorial strategies with DHODH inhibitors to improve efficacy. The authors identify an increase in expression of genes involved in the antigen presentation pathway and MHC I after BQ treatment and they narrow the mechanism to be strictly pyrimidine and CDK9/P-TEFb dependent. The authors rationalize that increased MHC I expression induced by DHODH inhibition might favor efficacy of dual immune checkpoint blockade. This combinatorial treatment prolonged survival in an immunocompetent B16F10 melanoma model.

      [No comment from authors]

      Previous studies have shown that DHODH inhibitors can increase expression of innate immunity-related genes but the role of DHODH and pyrimidine nucleotides in antigen presentation has not been previously reported. A strength of the manuscript is the use of multiple controls across a panel of cell lines to exclude off-target effects and to confirm that effects are exclusively dependent on pyrimidine depletion. Overall, the authors do a thorough characterization of the mechanism that mediates MHC I upregulation using multiple strategies. Furthermore, the in vivo studies provide solid evidence for combining DHODH inhibitors with immune checkpoint blockade.

      No comment from authors

      However, despite the use of multiple cell lines, most experiments are only performed in one cell line, and it is hard to understand why particular gene sets, cell lines or time points are selected for each experiment. It would be beneficial to standardize experimental conditions and confirm the most relevant findings in multiple cell lines.

      We appreciate this comment, and we understand how the use of various cell lines may seem puzzling. We would like to explain how our cell line panel evolved over the course of the study. Our first indication that BQ caused APP upregulation came from transcriptomics experiments (Figs 1A-D, S1A) performed as part of a previous study investigating BQ resistance (Mullen et al, 2023 Cancer Letters). In that study, we used CFPAC-1 as a model for BQ sensitivity and S2-013 as a model for BQ resistance. We did RNA sequencing +/- BQ in these cell lines to look for gene expression patterns that might underlie resistance/sensitivity to BQ. When analyzing this data, we serendipitously discovered the APP/MHC phenomenon, which gave rise to the present study.

      Our next step was to extend these findings to cancer cell lines of other histologies, and we prioritized cell lines derived from common cancer types for which immunotherapy (specifically ICB) is clinically approved. This is why A549 (lung adenocarcinoma), HCT116 (colorectal adenocarcinoma), A375 (cutaneous melanoma), and MDA-MB-231 (triple-negative breast cancer) cell lines were introduced.

      Because PDAC is considered to have an especially “immune-cold” tumor microenvironment, we reasoned that even dramatically increasing cancer cell antigen presentation may be insufficient to elicit an effective anti-tumor immune response in vivo. So we shifted our focus towards melanoma, because a subset of melanoma patients is very responsive to ICB and loss of antigen presentation (by direct silencing or homozygous loss-of-function mutations in MHC-I components such as B2M, or by functional loss of IFN-JAK1/2-STAT signaling) has been shown to mediate ICB resistance in human melanoma patients. This is why we extended our findings to B16F10 murine melanoma cells, intending to use them for in vivo studies with syngeneic immunocompetent recipient mice.

      The PDAC cell line MiaPaCa2 was introduced because a collaborator at our institution (Amar Natarajan) happened to have IKK2 knockout MiaPaCa2 cells, which allowed us to genetically validate our inhibitor results showing that IKK1 and IKK2 (crucial effectors for NF-kB signaling) are dispensable for our effect of interest.

      Ultimately, realizing that our results spanned various human and murine cell lines, we chose to use HEK-293T cells to validate the general applicability of our findings to proliferating cells in 2D culture, since HEK-293T cells (compared to our cancer cell lines) have relatively few genetic idiosyncrasies and express MHC-I at baseline.

      The differential in vivo survival depending on dosing schedule is interesting. However, this section could be strengthened with a more thorough evaluation of the tumors at endpoint.

      Overall, this is an interesting manuscript proposing a mechanistic link between pyrimidine depletion and MHC I expression and a novel therapeutic strategy combining DHODH inhibitors with dual checkpoint blockade. These results might be relevant for the clinical development of DHODH inhibitors in the treatment of solid tumors, a setting where these inhibitors have not shown optimal efficacy yet.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The main issue is that the study did not directly examine whether the increased antigen presentation caused by DHODH inhibition contributed to the potentiation of immune checkpoint blockade (ICB) efficacy. The additional effect of BQ in the xenograft tumor study was not examined to determine whether it was due to increased antigen presentation toward the cancer cells or merely to a cell cycle arrest effect of pyrimidine depletion in the tumor cells. The different administration timing of ICB with BQ treatment (Fig 5E) would not be sufficient to resolve this issue.

      We agree with this assessment, and we believe the experiment proposed by Reviewer #2 below (comparing the efficacy of BQ in Rag-null versus immunocompetent recipients) would address this question directly. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Additionally, in the in vivo study, the increase in surface MHC1 at the protein level after BQ treatment was not examined in the tumor samples, and it was not confirmed whether increased antigen presentation by BQ treatment actually promoted an anti-cancer immune response in immune cells. To support the story presented in the study, these data would be necessary.

      We attempted to show this by immunohistochemistry, but unfortunately the anti-H2-Db antibody that we obtained for this purpose did not have satisfactory performance to assess this in our tissue samples harvested at necropsy.

      (3) The proposed mechanism, in which pyrimidine depletion increases the antigen presentation pathway via CDK9/P-TEFb, was not validated by genetic KD or KO targeting the CDK9/P-TEFb pathway. In general, results based only on inhibitor assays are limited by possible off-target effects.

      Please see our above reply to Reviewer #1 comment making this same point, where we spell out our rationale for not pursuing these experiments.

      (4) High concentrations of BQ (> 50 µM) have been reported to show off-target effects, sensitizing cancer cells to ferroptosis, an iron-mediated lipid peroxidation-dependent cell death, independent of DHODH inhibition (https://www.researchsquare.com/article/rs-2190326/v1). It would be necessary to discuss whether the dose used in the in vivo study reached the ferroptosis-sensitizing dose.

      Please see our above reply to Reviewer #1 comment making this same point, where we explain why we are very confident that the BQ dose administered in our animal experiments was far below the minimum reported BQ dose required to sensitize cancer cells to ferroptosis in vitro.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      (1) According to the proposed model, BQ mediated induction of antigen presentation is a contributing factor to the efficacy of this therapeutic strategy. If this is true, then depletion of immune cells should reduce the therapeutic efficacy of BQ in vivo. The authors should perform the B16-F10 transplant experiments in either Rag null mice (if available) or with CD8/CD4 depletion. The expectation would be that T cell depletion (or MHC loss with genetic manipulation) should reduce the efficacy of BQ treatment. Absent this critical experiment, it is difficult to confidently conclude that induction of antigen presentation is a fundamental component of the in vivo response to DHODH inhibition.

      We agree with this assessment and the proposed experiment comparing the response in Rag-null versus immunocompetent recipients. We also think that using a more immunogenic cell line for this experiment (such as B16F10 transduced with ovalbumin or some other strong neoantigen) would be useful given the poor immunogenicity and lack of any defined strong neoantigen in B16F10 cells. An orthogonal approach would be to engraft cancer cells with or without B2M knockout into immunocompetent recipient mice (+/- BQ treatment) to further implicate MHC-I and antigen presentation. These questions will be addressed in future studies.

      (2) Does BQ treatment induce antigen presentation in non-malignant cells? APCs? If the induction of antigen presentation is not cancer specific and related to a pyrimidine depletion stress response, then there is a possibility that healthy tissues will also exhibit a similar phenotype, raising concerns about the specificity of a de novo immune response. The authors should examine antigen presentation genes in healthy tissues treated with BQ.

      We agree it is important to examine if our findings regarding nucleotide depletion and antigen presentation are true of APCs and other non-transformed cells, but we are not so concerned about the possibility of raising an immune response against non-malignant host tissues, as explained above. We have reproduced the relevant section below:

      “However, it should also be noted that increased antigen presentation in non-malignant host tissues would not be expected to generate an autoimmune response, because host tissues likely lack strong neoantigens, and whatever immunogenic peptides they may have would likely be presented via MHC-I at baseline, since all nucleated cells express MHC-I.

      This argument is strongly supported by clinical experience/data, as DHODH inhibitors (leflunomide and teriflunomide) are commonly used to treat rheumatoid arthritis and multiple sclerosis. While the pathophysiology of these autoimmune syndromes is complex, it is thought that both diseases are driven by aberrant T-cell attack on host tissues, mediated by incorrect recognition of host antigens presented via MHC-I (as well as MHC-II) as “foreign.”

      If increased antigen presentation in host tissues (downstream of DHODH inhibition) could lead to a de novo autoimmune response, then administration of DHODH inhibitors would be expected to exacerbate T-cell driven autoimmune disease rather than ameliorate it. Randomized controlled trials have consistently found that treatment with DHODH inhibitors leads to improvement of rheumatoid arthritis and multiple sclerosis symptoms, which is the opposite of what one would expect if DHODH inhibitors are causing de novo autoimmune reactions in human patients.”

      (3) In the title, the authors claim that DHODH inhibition enhances the efficacy of ICB. However, the experiment shown in Figure 5D does not demonstrate this. The Kaplan-Meier curves reflect more of an additive response than a synergistic combination. Furthermore, the concurrent treatment of BQ and ICB seems to inhibit the efficacy of ICB due to BQ toxicity in immune cells. This result seems to contradict the title.

      We do not agree with this assessment. Given that the effect of dual ICB alone was very marginal, while the effect of BQ monotherapy was quite marked, we cannot conclude from Fig 5 that BQ treatment inhibited ICB efficacy due to immune suppression.

      (4) Related to Point 3, the temporal separation of BQ and ICB raises the question of whether the induction of antigen presentation with BQ is persistent during the course of delayed ICB treatment. One explanation for the results is that BQ treatment reduces tumor burden, and then a subsequent course of ICB also reduces tumor burden but not that the two therapies are functioning in synergy. To address this, the authors should measure the duration of BQ mediated induction of antigen presentation after stopping treatment.

      We agree that the alternative explanation proposed by Reviewer #2 is possible and we appreciate the suggestion to test the stability of APP induction after stopping BQ treatment.

      (5) In Figure 1, the authors show that DHODH inhibition induces expression of both MHC-I and MHC-II genes at the RNA level. However, they only validate MHC-I by flow cytometry. A simple experiment to evaluate the effect of BQ treatment on MHC-II surface expression would provide important additional mechanistic insight into the immunomodulatory effects of DHODH inhibition, especially given recent literature reinforcing the importance of MHC-II expression on epithelial cancers, including melanoma (Oliveira et al. Nature 2022).

      We fully agree with this statement. We attempted to quantify cell surface MHC-II expression by FACS using the same method as for MHC-I (Figs 1G-H, 2D, and 3F). We did not detect cell surface MHC-II in any of our cancer cell lines, despite the use of high-dose interferon gamma and other stimulants (which robustly increase MHC-II mRNA in our system) in an attempt to induce expression. However, because we did not use cells known to express MHC-II as a positive control (e.g. B-cell leukemia cell lines or primary splenocytes), we do not know if our results are due to some technical failure (perhaps related to our protocol/reagents) or if they reflect a true absence of cell surface MHC-II in our cell lines.

      If the latter is true, that implies that either 1) MHC-II mRNA is not translated or 2) that it is translated, but our cancer cell lines lack one or more elements of the machinery required for MHC-II antigen presentation.

      In any case, it is important to determine if DHODH inhibition increases MHC-II at the cell surface of cancer cells using appropriate positive and negative controls, as this could have important implications for cancer immunotherapy.

      [As a minor point, melanoma is not an epithelial cancer, as it is derived from neural crest lineage cells (melanocytes)]

      Minor Points

      (1) The authors show ChIP-seq tracks from Tan et al. for HLA-B. However, given the pervasive effect of Ter treatment across many HLA genes, the authors should either show tracks at additional loci, or provide a heatmap of read density across more loci. This would substantiate the mechanistic claim that RNA Pol II occupancy and activity across antigen presentation genes is the major driver of response to DHODH inhibition as opposed to mRNA stabilization/increased translation.

      We appreciate this suggestion. We have changed Fig 4 by replacing the HLA-B track (old Fig 4E) with a representation of fold change (Ter/DMSO) in Pol II occupancy versus fold change (Ter/DMSO) in mRNA abundance for 23 relevant genes (new Fig 4G); both of these datasets were obtained from the Tan et al manuscript. This new figure panel (Fig 4G) also shows linear regression analysis demonstrating that Pol II occupancy and mRNA expression are significantly correlated for APP genes. While we recognize that this data in itself is not formal proof of our hypothesis, it does strongly support the notion that increased transcription is responsible for the increased mRNA abundance of APP genes that we have observed.
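The kind of analysis described for the new Fig 4G (regressing fold change in mRNA abundance on fold change in Pol II occupancy) can be sketched as follows. This is a minimal illustration only: the function name `pearson_and_slope` and the five value pairs are hypothetical placeholders, not the Tan et al. data or the 23 genes analyzed, and the authors' actual regression procedure may differ.

```python
import math


def pearson_and_slope(x, y):
    """Return (r, slope, intercept) for an ordinary least-squares fit of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain some variation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return r, slope, intercept


# Hypothetical log2 fold changes (Ter/DMSO) for five made-up genes.
pol2_log2fc = [0.2, 0.5, 0.9, 1.4, 1.8]  # Pol II occupancy
mrna_log2fc = [0.1, 0.6, 1.0, 1.3, 2.0]  # mRNA abundance

r, slope, intercept = pearson_and_slope(pol2_log2fc, mrna_log2fc)
print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A positive slope with a high correlation coefficient is the pattern one would expect if increased transcription drives the increase in mRNA abundance, although, as noted above, correlation alone is not formal proof.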

      (2) A compelling way to demonstrate a change in antigen presentation is through mass spectrometry based immunopeptidomics. Performing immunopeptidomic analysis of BQ treated cell lines would provide substantial mechanistic insight into the outcome of BQ treatment. While this approach may be outside the scope of the current work, the authors should speculate on how this treatment may specifically alter the antigenic landscape where future directions would include empirical immunopeptidomics measurements.

      We fully agree with this comment. While the abundance of cancer cell surface MHC-I is an important factor for anticancer immunity, another crucial factor is the identity of peptides that are presented. Treatments that cause presentation of more immunogenic peptides can enhance T-cell recognition even in the absence of a relative change in cell surface MHC-I abundance.

      While we did not perform the immunopeptidomics experiments described, we can offer some speculation regarding this comment. As shown in Fig 1D-E, transcriptomics experiments suggest that immunoproteasome subunits (PSMB8, PSMB9, PSMB10) are upregulated upon DHODH inhibition. If this change in mRNA levels translates into greater immunoproteasome activity (which was not tested in our study), this would be expected to alter the repertoire of peptides available for presentation and could thereby change the immunopeptidome.

      However, this hypothesis requires direct testing, and we hope future studies will delineate the effects of DHODH inhibition and other cancer therapies on the immunopeptidome, as this area of research will have important clinical implications.

      (3) While the signaling through CDK9 seems convincing, it still does not provide a mechanistic link between depleted pyrimidines and CDK9 activity. The authors should speculate on the mechanism that signals to CDK9.

      We agree with the assessment. A mechanistic link between depleted pyrimidines and CDK9 activity will be a subject of future studies.

      (4) Related to minor point 2, the authors should consider a genetic approach to confirm the importance of CDK9. While the pharmacological approach, including multiple mechanistically distinct CDK9 inhibitors provides strong evidence, an additional experiment with genetic depletion of CDK9 (CRISPR KO, shRNA, etc) would provide compelling mechanistic confirmation.

      Reviewer #1 raised this very same point, and we agree. Please see our reply to Reviewer #1, which details why we did not pursue this approach and argues that the evidence we present is compelling even in absence of genetic manipulation.

      Additionally, please see the new Fig 4E and 4F. Figure 4E, which is a repeat of Fig 4B using HCT116 cells, shows that in this cell line CDK9 inhibitors (flavopiridol, dinaciclib, and AT7519) block BQ-mediated APP induction, while PROTAC2 does not. Figure 4F shows that (for reasons we cannot fully explain) PROTAC2 does not lead to CDK9 degradation in HCT116 cells. These data strongly implicate CDK9, because they exclude a CDK9-degradation-independent effect of PROTAC2.

      (5) Figure 2B needs a legend.

      Thank you for pointing this out. We have added a legend to Fig 2B.

      (6) The authors should comment in the discussion on how this strategy may be particularly useful in patients harboring genetic or epigenetic loss of interferon signaling, a known mechanism of ICB resistance. Perhaps DHODH inhibition could rescue MHC expression in cells that are deficient in interferon sensing.

      Thank you for this suggestion! We have amended the Discussion section to mention this important point. Please see paragraph 2 of the revised Discussion section where we have added the following text:

      “Because BQ-mediated APP induction does not require interferon signaling, this strategy may have particular relevance for clinical scenarios in which tumor antigen presentation is dampened by the loss or silencing of cancer cell interferon signaling, which has been demonstrated to confer both intrinsic and acquired ICB resistance in human melanoma patients.”

      Reviewer #3 (Recommendations For The Authors):

      The authors present convincing evidence of the mechanism by which pyrimidine nucleotides regulate MHC I levels and about the potential of combining DHODH inhibitors with dual immune checkpoint blockade (ICB). This is an interesting paper given the clinical relevance of DHODH inhibitors. The studies raise some questions, and some points might need clarifying as below:

      • In Figure 2C, why do the authors focus on these two genes in the uridine rescue? These are important genes mediating antigen presentation, but it might be more interesting to see how H2-Db and H2-Kb expression correlate with the protein data shown in Fig 2D. Fig. 2C-2D is a relevant control, so it would be important to validate in a different cancer cell line (e.g. one of the PDAC cell lines used for the RNAseq).

      We appreciate this comment. Although Fig 3C shows that BQ-induced expression of H2-Db, H2-Kb, and B2m is reversed by uridine (in B16F10 cells), we recognize that this was not the best placement for this data, as it can easily be overlooked here since uridine reversal is not the main point of Fig 3C. We have left Fig 3C as is, because we think that the uridine reversal demonstrated in that panel serves as a good internal positive control for reversal of BQ-mediated APP induction in that experiment.

      We have repeated the experiments shown in the original Fig 2C and substituted the original Fig 2C with a new Fig 2C and Fig S2B, which show both Tap1 and Nlrc5 as well as H2-Db, H2-Kb, and B2m after treatment with either BQ (new Fig 2C) or teriflunomide (new Fig S2B). The original Fig S2B is now Fig S2C, and it shows that uridine has no effect on the expression of any of the genes assayed in the new Fig 2C or S2B.

      The reversibility of cell surface MHC-I induction was also validated in HCT116 cells (Fig 3F). We included the uridine reversal in Fig 3F to avoid duplicating the control and BQ FACS data in multiple panels.

      We have also added the qPCR data for HCT116 cells showing this same phenotype (at the mRNA level), which is the new Fig S2D.

      We decided to prioritize HCT116 cells for our mechanistic studies (Figures S2D, S4A, and 4E-F) because previous reports indicate that this line is diploid and therefore less genetically deranged than our other cancer cell lines.

      • Figure 2F shows an elegant experiment to discard off-target effects related to cell death and to confirm that the increased MHC I expression is uniquely dependent on pyrimidines. DHODH has recently been involved in ferroptosis, a highly immunogenic type of cell death. What are the authors' thoughts on BQ-induced ferroptosis as a possible contributor to the effects of ICB? Does BQ + ferroptosis inhibitor (ferrostatin) affect cell surface MHC I and/or expression of antigen processing genes?

      The potential role of DHODH in ferroptosis protection (Mao et al 2021) has important implications, so we are glad that multiple reviewers raised questions concerning ferroptosis. We did not directly test the effect of ferroptosis inducing agents (with or without BQ) on MHC-I/APP expression, but that is certainly a worthwhile line of investigation.

      The DHODH/ferroptosis issue is complicated by a study pointed out by Reviewer #1 that challenges the role of DHODH inhibition in BQ-mediated ferroptosis sensitization (Mishima et al, 2022). This study argues that high-dose BQ treatment causes FSP1 inhibition, and this underlies the effect of BQ on the cellular response to ferroptosis-inducing agents.

      Regardless of whether BQ-induced ferroptosis-sensitization is dependent on DHODH, FSP1, or some other factor, the Mao and Mishima studies agree that a relatively high dose of BQ is required to observe these effects (100-200µM for most cell lines and >50µM even in the most ferroptosis-sensitive cell lines). As we explained above, we consider it very unlikely that the in vivo BQ exposure in our experiments (Fig 5) was high enough to cause significant ferroptosis, especially in the absence of any dedicated ferroptosis-inducing agent (which is typically required to cause ferroptosis even in the presence of high-dose BQ).

      • The authors nail down the mechanism to CDK9 (Fig 4). However, all these experiments are performed in 293T cells. I would like to see a repeat of Fig. 4B in a cancer cell line (either PDAC or B16). Also, does BQ have any effect on CDK9 expression/protein levels?

      We have added two figure panels that address this comment (new Fig 4E and 4F). Figure 4E (which is a repeat of Fig 4B with HCT116 cells) shows that CDK9 inhibitors (flavopiridol, AT7519, and dinaciclib) reverse BQ-mediated APP induction in HCT116 cells (this agrees with Fig S4A showing that flavopiridol reverses MHC induction by various nucleotide synthesis inhibitors in this cell line), but PROTAC2 does not. Figure 4F shows that PROTAC2 (for reasons we cannot explain) does not cause CDK9 degradation in HCT116 cells. This adds further support to our thesis that CDK9 is a critical mediator of BQ-mediated APP induction (because how else can this pattern of results be explained?). The text of the Results section has been amended to reflect this.

      We chose to use HCT116 cells for this repeat experiment 1) to align with Fig S4A and 2) because, as previously mentioned, we consider HCT116 to be a good cell line for mechanistic studies because of its relative lack of idiosyncratic genetic features (compared to CFPAC-1, for example, which was derived from a patient with cystic fibrosis).

      • What are the differences in tumor size for the experiment shown in Figure 5E? What about tumor cell death in the ICB vs. BQ+ICB groups?

      Because this was a survival assay, direct comparison of tumor volumes between groups was not possible at later time points, since mice that die or have to be euthanized are removed from their experimental group, which lowers the average group tumor burden at subsequent time points. Although tumor volume was the most common euthanasia criterion reached, a subset of mice were either found dead or had to be euthanized for other reasons attributed to their tumor burden (moribund state, inability to ambulate or stand, persistent bleeding from tumor ulceration, severe loss of body mass, etc.). This confounds any comparison of endpoint measurements (such as immunohistochemical quantification of tumor cell death markers, T-cell markers, etc.).
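The attrition effect described in this reply can be illustrated with a toy calculation. The sketch below uses entirely hypothetical tumor volumes and a hypothetical euthanasia threshold (`ENDPOINT_MM3`); none of these numbers come from the study. It only shows how removing mice that reach the endpoint lowers the apparent group mean even though every tumor has grown.

```python
def mean_volume_of_survivors(volumes, endpoint):
    """Mean tumor volume among mice still below the euthanasia endpoint.

    Returns None when no mouse remains in the group.
    """
    survivors = [v for v in volumes if v < endpoint]
    if not survivors:
        return None
    return sum(survivors) / len(survivors)


ENDPOINT_MM3 = 2000  # hypothetical euthanasia threshold, in mm^3

# Hypothetical volumes (mm^3) for the same five mice at two time points.
day_10 = [400, 600, 900, 1200, 1500]
day_14 = [700, 1000, 1600, 2100, 2600]  # every tumor is larger than on day 10

mean_all_day_14 = sum(day_14) / len(day_14)
mean_survivors_day_14 = mean_volume_of_survivors(day_14, ENDPOINT_MM3)

print("Day 14 mean, all mice:       ", mean_all_day_14)
print("Day 14 mean, survivors only: ", mean_survivors_day_14)
```

Because the two largest tumors leave the group once they cross the threshold, the survivors-only mean underestimates the true tumor burden, which is why late time points in a survival cohort are not comparable across treatment arms.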

      • The different response in the concurrent vs delayed treatment is very interesting. The authors suggest two possible mechanisms to explain this: "1) Concurrent BQ dampens the initial anticancer immune response generated by dual ICB, or 2) cancer cell MHC-I and related genes are not maximally upregulated at the time of ICB administration with concurrent treatment". However, and despite the caveat of comparing the in vitro to the in vivo setting, Fig 2D shows upregulation of MHC I already at 24h of treatment in B16 cells. Have the authors checked T cell infiltration in the concurrent and delayed treatment setting?

      For the same reasons described in response to the preceding comment, tumors harvested upon mouse death/euthanasia from our survival experiment were not suitable for cross-cohort comparison of tumor endpoint measurements. An additional experiment in which mice are necropsied at a prespecified time point (before any mice have died or reached euthanasia criteria, as in the experiment for Fig 5A-D) would be required to answer this question.

      • Page 5, line 181 -do the authors mean "nucleotide salvage inhibitors" instead of "synthesis"?

      We believe the reviewer is referring to the following sentence:

      “The other drugs screened included nucleotide synthesis inhibitors (5-fluorouracil, methotrexate, gemcitabine, and hydroxyurea), DNA damage inducers (oxaliplatin, irinotecan, and cytarabine), a microtubule targeting drug (paclitaxel), a DNA methylation inhibitor (azacytidine), and other small molecule inhibitors (Fig 2F).”

      In this context, we believe our use of “synthesis” instead of “salvage” is correct, because methotrexate and 5-FU inhibit thymidylate synthase (which mediates de novo dTTP synthesis), while gemcitabine and hydroxyurea inhibit ribonucleotide reductase (which mediates de novo synthesis of all dNTPs).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations

      Recommendation #1: Address potential confounds in the experimental design:

      (1a) Confounding factors between baseline and early learning. While the visual display of the curved line remains constant, there are at least three changes between these two phases: 1) the presence of reward feedback (the focus of the paper); 2) a perturbation introduced to draw a hidden, mirror-symmetric curved line; 3) instructions provided to use reward feedback to trace the line on the screen (intentionally deceitful). As such, it remains unclear which of these factors are driving the changes in both behavior and BOLD signals between the two phases. The absence of a veridical feedback phase in which participants received reward feedback associated with the shown trajectory seems like a major limitation.

      (1b) Confounding Factors Between Early and Late Learning. While the authors have focused on interpreting changes from early to late due to the explore-exploit trade-off, there are three additional factors possibly at play: 1) increasing fatigue, 2) withdrawal of attention, specifically related to individuals who have either successfully learned the perturbation within the first few trials or those who have simply given up, or 3) increasing awareness of the perturbation (it is not clear whether subjective reports about perturbation awareness were measured). I understand that fMRI research is resource-intensive; however, it is not clear how to rule out these alternatives with their existing data without additional control groups. [Another reviewer added the following: Why did the authors not acquire data during a control condition? How can we be confident that the neural dynamics observed are not due to the simple passage of time? Or if these effects are due to the task, what drives them? The reward component, the movement execution, increased automaticity?]

      We have opted to address both of these points above within a single reply, as together they suggest potential confounding factors across the three phases of the task. We would agree that, if the results of our pairwise comparisons (e.g., Early > Baseline or Late > Early) were considered in isolation from one another, then these critiques of the study would be problematic. However, when considering the pattern of effects across the three task phases, we believe most of these critiques can be dismissed. Below, we first describe our results in this context, and then discuss how they address the reviewers’ various critiques.

      Recall that from Baseline to Early learning, we observe an expansion of several cortical areas (e.g., core regions in the DMN) along the manifold (red areas in Fig. 4A, see manifold shifts in Fig. 4C) that subsequently exhibit contraction during Early to Late learning (blue areas in Fig. 4B, see manifold shifts in Fig. 4D). We show this overlap in brain areas in Author response image 1 below, panel A. Notably, several of these brain areas appear to contract back to their original, Baseline locations along the manifold during Late learning (compare Fig. 4C and D). This is evidenced by the fact that many of these same regions (e.g., DMN regions, in Author response image 1 panel A below) fail to show a significant difference between the Baseline and Late learning epochs (see Author response image 1 panel B below, which is taken from supplementary Fig 6). That is, the regions that show significant expansion and subsequent contraction (in Author response image 1 panel A below) tend not to overlap with the regions that significantly changed over the time course of the task (in Author response image 1 panel B below).

      Author response image 1.

      Note that this basic observation is true not only of our regional manifold eccentricity data, but also of the underlying functional connectivity data associated with individual brain regions. To make this second point clearer, we have modified and annotated our Fig. 5 and included it below. Note the reversal in seed-based functional connectivity from Baseline to Early learning (leftmost brain plots) compared to Early to Late learning (rightmost brain plots). That is, it is generally the case that for each seed-region (A-C) the areas that increase in seed-connectivity with the seed region (in red; leftmost plot) are also the areas that decrease in seed-connectivity with the seed region (in blue; rightmost plot), and vice versa. [Also note that these connectivity reversals are conveyed through the eccentricity data: the horizontal red line in the rightmost plots denotes the mean eccentricity of these brain regions during the Baseline phase, helping to highlight the fact that the eccentricity of the Late learning phase reverses back towards this Baseline level].

      Author response image 2.

      Critically, these reversals in brain connectivity noted above directly counter several of the critiques noted by the reviewers. For instance, this reversal pattern of effects argues against the idea that our results during Early Learning can be simply explained by the (i) presence of reward feedback, (ii) presence of the perturbation or (iii) instructions to use reward feedback to trace the path on the screen. Indeed, all of these factors are also present during Late learning, and yet many of the patterns of brain activity during this time period revert back to the Baseline patterns of connectivity, where these factors are absent. Similarly, this reversal pattern strongly refutes the idea that the effects are simply due to the passage of time, increasing fatigue, or general awareness of the perturbation. Indeed, if any of these factors alone could explain the data, then we would have expected a gradual increase (or decrease) in eccentricity and connectivity from Baseline to Early to Late learning, which we do not observe. We believe these are all important points for interpreting the data, which we failed to mention in our original manuscript when discussing our findings.

      We have now rectified this in the revised paper, where we now write in our Discussion:

      “Finally, it is important to note that the reversal pattern of effects noted above suggests that our findings during learning cannot be simply attributed to the introduction of reward feedback and/or the perturbation during Early learning, as both of these task-related features are also present during Late learning. In addition, these results cannot be simply explained due to the passage of time or increasing subject fatigue, as this would predict a consistent directional change in eccentricity across the Baseline, Early and Late learning epochs.”

      However, having said the above, we acknowledge that one potential factor that our findings cannot exclude is that they are (at least partially) attributable to changes in subjects’ state of attention throughout the task. Indeed, one can certainly argue that Baseline trials in our study don’t require a great deal of attention (after all, subjects are simply tracing a curved path presented on the screen). Likewise, for subjects that have learned the hidden shape, the Late learning trials are also likely to require limited attentional resources (indeed, many subjects at this point are simply producing the same shape trial after trial). Consequently, the large shift in brain connectivity that we observe from Baseline to Early Learning, and the subsequent reversion back to Baseline-levels of connectivity during Late learning, could actually reflect a heightened allocation of attention as subjects are attempting to learn the (hidden) rewarded shape. However, we do not believe that this would reflect a ‘confound’ of our study per se — indeed, any subject who has participated in a motor learning study would agree that the early learning phase of a task is far more cognitively demanding than Baseline trials and Late learning trials. As such, it is difficult to disentangle this ‘attention’ factor from the learning process itself (and in fact, it is likely central to it).

      Of course, one could have designed a ‘control’ task in which subjects must direct their attention to something other than the learning task itself (e.g., a divided-attention paradigm; Taylor & Thoroughman, 2007, 2008) and/or perform a secondary task concurrently (Codol et al., 2018; Holland et al., 2018), but we know that this type of manipulation impairs the learning process itself. Thus, in such a case, it wouldn’t be obvious to the experimenter what they are actually measuring in brain activity during such a task. And, to extend this argument even further, it is true that any sort of brain-based modulation can be argued to reflect some ‘attentional’ process, rather than modulations related to the specific task-based process under consideration (in our case, motor learning). In this regard, we are sympathetic to the views of Richard Andersen and colleagues who have eloquently stated that “The study of how attention interacts with other neural processing systems is a most important endeavor. However, we think that over-generalizing attention to encompass a large variety of different neural processes weakens the concept and undercuts the ability to develop a robust understanding of other cognitive functions.” (Andersen & Cui, 2007, Neuron). In short, it appears that different fields/researchers have differing views on the usefulness of attention as an explanatory construct (see also articles from Hommel et al., 2019, “No one knows what attention is”, and Wu, 2023, “We know what attention is!”), and we personally don’t have a dog in this fight. We only highlight these issues to draw attention (no pun intended) to the fact that it is not trivial to separate these different neural processes during a motor learning study.

      Nevertheless, we do believe these are important points worth flagging for the reader in our paper, as they might have similar questions. To this end, we have now included in our Discussion section the following text:

      “It is also possible that some of these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      Finally, we should note that, at the end of testing, we did not assess participants' awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path). In hindsight, this would have been a good idea and provided some value to the current project. Nevertheless, it seems clear, based on several of the learning profiles observed (e.g., subjects who exhibited very rapid learning during the Early Learning phase, more on this below), that many individuals became aware of a shape approximating the rewarded path. Note that we have included new figures (see our responses below) that give a better example of what fast versus slower learning looks like. In addition, we now note in our Methods that we did not probe participants about their subjective awareness of the perturbation:

      “Note that, at the end of testing, we did not assess participants’ awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path of the visible path).”

      Recommendation #2: Provide more behavioral quantification.

      (2a) The authors chose to only plot the average learning score in Figure 1D, without an indication of movement variability. I think this is quite important, to give the reader an impression of how variable the movements were at baseline, during early learning, and over the course of learning. There is evidence that baseline variability influences the 'detectability' of imposed rotations (in the case of adaptation learning), which could be relevant here. Shading the plots by movement variability would also be important to see if there was some refinement of the movement after participants performed at the ceiling (which seems to be the case ~ after trial 150). This is especially worrying given that in Fig 6A there is a clear indication that there is a large difference between subjects' solutions on the task. One subject exhibits almost a one-shot learning curve (reaching a score of 75 after one or two trials), whereas others don't seem to really learn until near the end. What does this between-subject variability mean for the authors' hypothesized neural processes?

      In line with these recommendations, we have now provided much better behavioral quantification of subject-level performance in both the main manuscript and supplementary material. For instance, in a new supplemental Figure 1 (shown below), we now include mean subject (+/- SE) reaction times (RTs), movement times (MTs) and movement path variability (our computation of these measures is now described in our Methods section).

      As can be seen in the figure, all three of these variables tended to decrease over the course of the study, though we note there was a noticeable uptick in both RTs and MTs from the Baseline to Early learning phase, once subjects started receiving trial-by-trial reward feedback based on their movements. With respect to path variability, it is not obvious that there was a significant refinement of the paths created during late learning (panel D below), though there was certainly a general trend for path variability to decrease over learning.

      Author response image 3.

      Behavioral measures of learning across the task. (A-D) shows average participant reward scores (A), reaction times (B), movement times (C) and path variability (D) over the course of the task. In each plot, the black line denotes the mean across participants and the gray banding denotes +/- 1 SEM. The three equal-length task epochs for subsequent neural analyses are indicated by the gray shaded boxes.

      In addition to these above results, we have also created a new Figure 6 in the main manuscript, which now solely focuses on individual differences in subject learning (see below). Hopefully, this figure clarifies key features of the task and its reward structure, and also depicts (in movement trajectory space) what fast versus slow learning looks like in the task. Specifically, we believe that this figure now clearly delineates for the reader the mapping between movement trajectory and the reward score feedback presented to participants, which appeared to be a source of confusion based on the reviewers’ comments below. As can be clearly observed in this figure, trajectories that approximated the ‘visible path’ (black line) resulted in fairly mediocre scores (see score color legend at right), whereas trajectories that approximated the ‘reward path’ (dashed black line, see trials 191-200 of the fast learner) resulted in fairly high scores. This figure also more clearly delineates how fPCA loadings derived from our functional data analysis were used to derive subject-level learning scores (panel C).

      Author response image 4.

      Individual differences in subject learning performance. (A) Examples of a good learner (bordered in green) and poor learner (bordered in red). (B) Individual subject learning curves for the task. Solid black line denotes the mean across all subjects whereas light gray lines denote individual participants. The green and red traces denote the learning curves for the example good and poor learners denoted in A. (C) Derivation of subject learning scores. We performed functional principal component analysis (fPCA) on subjects’ learning curves in order to identify the dominant patterns of variability during learning. The top component, which encodes overall learning, explained the majority of the observed variance (~75%). The green and red bands denote the effect of positive and negative component scores, respectively, relative to mean performance. Thus, subjects who learned more quickly than average have a higher loading (in green) on this ‘Learning score’ component than subjects who learned more slowly (in red) than average. The plot at right denotes the loading for each participant (open circles) onto this Learning score component.

      The reviewers note that there are large individual differences in learning performance across the task. This was clearly our hope when designing the reward structure of this task, as it would allow us to further investigate the neural correlates of these individual differences (indeed, during pilot testing, we sought out a reward structure to the task that would allow for these intersubject differences). The subjects who learn early during the task end up having higher fPCA scores than the subjects who learn more gradually (or learn the task late). From our perspective, these differences are a feature, and not a bug, and they do not negate any of our original interpretations. That is, subjects who learn earlier on average tend to contract their DAN-A network during the early learning phase whereas subjects who learn more slowly on average (or learn late) instead tend to contract their DAN-A network during late learning (Fig. 7).
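      To make the link between learning curves and component scores concrete, the following is a minimal sketch under simplifying assumptions. It uses ordinary PCA on discretized curves (computed by power iteration) as a stand-in for fPCA, which additionally represents each curve with smooth basis functions. The function name `leading_component` and the three toy curves are hypothetical and are not taken from our data or analysis code.

```python
import math

def leading_component(curves, n_iter=200):
    """Leading principal component of a set of curves, plus each curve's score on it.

    Simplified stand-in for fPCA: plain PCA on the sampled curve values.
    """
    n_subjects = len(curves)
    n_points = len(curves[0])
    # Mean curve across subjects, then center every curve on it.
    mean_curve = [sum(c[j] for c in curves) / n_subjects for j in range(n_points)]
    centered = [[c[j] - mean_curve[j] for j in range(n_points)] for c in curves]
    # Power iteration on X^T X (never formed explicitly), from a positive start vector.
    component = [1.0 / math.sqrt(n_points)] * n_points
    for _ in range(n_iter):
        proj = [sum(r[j] * component[j] for j in range(n_points)) for r in centered]
        new = [sum(proj[i] * centered[i][j] for i in range(n_subjects))
               for j in range(n_points)]
        norm = math.sqrt(sum(x * x for x in new))
        component = [x / norm for x in new]
    # Score = projection of each centered curve onto the component.
    scores = [sum(r[j] * component[j] for j in range(n_points)) for r in centered]
    return component, scores

# Toy reward scores over five trial blocks (illustrative values only).
curves = [
    [40.0, 70.0, 75.0, 75.0, 75.0],  # learns early
    [40.0, 50.0, 60.0, 70.0, 75.0],  # learns gradually
    [40.0, 40.0, 45.0, 50.0, 60.0],  # learns late
]
component, scores = leading_component(curves)
```

      In this toy case the early learner receives the highest score and the late learner the lowest, which mirrors the ordering described above. The sign of a principal component is arbitrary in general; here the positive start vector fixes it so that above-average performance maps to positive scores.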

      (2b) In the methods, the authors stated that they scaled the score such that even a perfectly traced visible path would always result in an imperfect score of 40 points. What happens if a subject scores perfectly on the first try (which seemed to have happened for the green highlighted subject in Fig 6A), but is then permanently confronted with a score of 40 or below? Wouldn't this result in an error-clamp-like (error-based motor adaptation) design for this subject and all other high performers, which would vastly differ from the task demands for the other subjects? How did the authors factor in the wide between-subject variability?

      We think the reviewers may have misinterpreted the reward structure of the task, and we apologize for not being clearer in our descriptions. The reward score that subjects received after each trial was based on how well they traced the mirror-image of the visible path. However, all the participant can see on the screen is the visible path. We hope that our inclusion of the new Figure 6 (shown above) makes the reward structure of the task, and its relationship to movement trajectories, much clearer. We should also note that, even for the highest performing subject (denoted in Fig. 6), it still required approximately 20 trials for them to reach asymptotic performance.

      (2c) The study would benefit from a more detailed description of participants' behavioral performance during the task. Specifically, it is crucial to understand how participants' motor skills evolve over time. Information on changes in movement speed, accuracy, and other relevant behavioral metrics would enhance the understanding of the relationship between behavior and brain activity during the learning process. Additionally, please clarify whether the display on the screen was presented continuously throughout the entire trial or only during active movement periods. Differences in display duration could potentially impact the observed differences in brain activity during learning.

      We hope that our inclusion of the new Supplementary Figure 1 (shown above) addresses the reviewers’ recommendation. Generally, we find that RTs, MTs and path variability all decrease over the course of the task. We think this relates to the early learning phase being more attentionally demanding, and requiring more conscious effort, than the later learning phases.

      Also, yes, the visible path was displayed on the screen continuously throughout the trial, and only disappeared at the 4.5 second mark of each trial (when the screen was blanked and the data was saved off for 1.5 seconds prior to commencement of the next trial; 6 seconds total per trial). Thus, there were no differences in display duration across trials and phases of the task. We have now clarified this in the Methods section, where we now write the following:

      “When the cursor reached the target distance, the target changed color from red to green to indicate that the trial was completed. Importantly, other than this color change in the distance marker, the visible curved path remained constant and participants never received any feedback about the position of their cursor.”

      (2d) It is unclear from plots 6A, 6B, and 1D how the scale of the behavioral data matches with the scaling of the scores. Are these the 'real' scores, meaning 100 on the y-axis would be equivalent to 40 in the task? Why then do all subjects reach an asymptote at 75? Or is 75 equivalent to 40 and the axis labels are wrong?

      As indicated above, we clearly did a poor job of describing the reward structure of our task in our original paper, and we now hope that our inclusion of Figure 6 makes things clear. A ‘40’ score on the y-axis would indicate that a subject has perfectly traced the visible path whereas a perfect ‘100’ score would indicate that a subject has perfectly traced the (hidden) mirror image path.

      The fact that several of the subjects reach asymptote around 75 is likely a byproduct of two factors. Firstly, the subjects performed their movements in the absence of any visual error feedback (they could not see the position of a cursor that represented their hand position), which had the effect of increasing motor variability in their actions from trial to trial. Secondly, there appears to be an underestimation among subjects regarding the curvature of the concealed, mirror-image path (i.e., that the rewarded path actually had an equal but opposite curvature to that of the visible path). This is particularly evident in the case of the top-performing subject (illustrated in Figure 6A) who, even during late learning, failed to produce a completely arched movement.

      (2e) Labeling of Contrasts: There is a consistent issue with the labeling of contrasts in the presented figures, causing confusion. While the text refers to the difference as "baseline to early learning," the label used in figures, such as Figure 4, reads "baseline > early." It is essential to clarify whether the presented contrast is indeed "baseline > early" or "early > baseline" to avoid any misinterpretation.

      We thank the reviewers for catching this error. Indeed, the intended label was Early > Baseline, and this has now been corrected throughout.

      Recommendation #3. Clarify which motor learning mechanism(s) are at play.

      (3a) Participants were performing at a relatively low level, achieving around 50-60 points by the end of learning. This outcome may not be that surprising, given that reward-based learning might have a substantial explicit component and may also heavily depend on reasoning processes, beyond reinforcement learning or contextual recall (Holland et al., 2018; Tsay et al., 2023). Even within our own data, where explicit processes are isolated, average performance is low and many individuals fail to learn (Brudner et al., 2016; Tsay et al., 2022). Given this, many participants in the current study may have simply given up. A potential indicator of giving up could be a subset of participants moving straight ahead in a rote manner (a heuristic to gain moderate points). Consequently, alterations in brain networks may not reflect exploration and exploitation strategies but instead indicate levels of engagement and disengagement. Could the authors plot the average trajectory and the average curvature changes throughout learning? Are individuals indeed defaulting to moving straight ahead in learning, corresponding to an average of 50-60 points? If so, the interpretation of brain activity may need to be tempered.

      We can do one better, and actually give you a sense of the learning trajectories for every subject over time. In the figure below, which we now include as Supplementary Figure 2 in our revision, we have plotted, for each subject, a subset of their movement trajectories across learning trials (every 10 trials). As can be seen in the diversity of these trajectories, the average trajectory and average curvature would do a fairly poor job of describing the pattern of learning-related changes across subjects. Moreover, it is not obvious from looking at these plots the extent to which poor learning subjects (i.e., subjects who never converge on the reward path) actually ‘give up’ in the task — rather, many of these subjects still show some modulation (albeit minor) of their movement trajectories in the later trials (see the purple and pink traces). As an aside, we are also not entirely convinced that straight ahead movements, which we don’t find many of in our dataset, can be taken as direct evidence that the subject has given up.

      Author response image 5

      Variability in learning across subjects. Plots show representative trajectory data from each subject (n=36) over the course of the 200 learning trials. Coloured traces show individual trials over time (each trace is separated by ten trials, e.g., trial 1, 10, 20, 30, etc.) to give a sense of the trajectory changes throughout the task (20 trials in total are shown for each subject).

      We should also note that we are not entirely opposed to the idea of describing aspects of our findings in terms of subject engagement versus disengagement over time, as such processes are related at some level to exploration (i.e., cognitive engagement in finding the best solution) and exploitation (i.e., cognitively disengaging and automating one’s behavior). As noted in our reply to Recommendation #1 above, we now give some consideration of these explanations in our Discussion section, where we now write:

      “It is also possible that these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      (3b) The authors are mixing two commonly used paradigms, reward-based learning, and motor adaptation, but provide no discussion of the different learning processes at play here. Which processes were they attempting to probe? Making this explicit would help the reader understand which brain regions should be implicated based on previous literature. As it stands, the task is hard to interpret. Relatedly, there is a wealth of literature on explicit vs implicit learning mechanisms in adaptation tasks now. Given that the authors are specifically looking at brain structures in the cerebral cortex that are commonly associated with explicit and strategic learning rather than implicit adaptation, how do the authors relate their findings to this literature? Are the learning processes probed in the task more explicit, more implicit, or is there a change in strategy usage over time? Did the authors acquire data on strategies used by the participants to solve the task? How does the baseline variability come into play here?

      As noted in our paper, our task was directly inspired by the reward-based motor learning tasks developed by Dam et al., 2013 (Plos One) and Wu et al., 2014 (Nature Neuroscience). What drew us to these tasks is that they allowed us to study the neural bases of reward-based learning mechanisms in the absence of subjects also being able to exploit error-based mechanisms to achieve learning. Indeed, when first describing the task in the Results section of our paper we wrote the following:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014).”

      If the reviewers are referring to ‘motor adaptation’ in the context in which that terminology is commonly used — i.e., the use of sensory prediction errors to support error-based learning — then we would argue that motor adaptation is not a feature of the current study. It is true that in our study subjects learn to ‘adapt’ their movements across trials, but this shaping of the movement trajectories must be supported through reinforcement learning mechanisms (and, of course, supplemented by the use of cognitive strategies as discussed in the nice review by Tsay et al., 2023). We apologize for not being clearer in our paper about this key distinction and we have now included new text in the introduction to our Results to directly address this:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014). That is, subjects could not use error-based learning mechanisms to achieve learning in our study, as this form of learning requires sensory errors that convey both the change in direction and magnitude needed to correct the movement.”

      With this issue aside, we are well aware of the established framework for thinking about sensorimotor adaptation as being composed of a combination of explicit and implicit components (indeed, this has been a central feature of several of our other recent neuroimaging studies that have explored visuomotor rotation learning, e.g., Gale et al., 2022 PNAS, Areshenkoff et al., 2022 eLife, Standage et al., 2023 Cerebral Cortex). However, there has been comparatively little work done on these parallel components within the domain of reinforcement learning tasks (though see Codol et al., 2018; Holland et al., 2018; van Mastrigt et al., 2023; see also the Tsay et al., 2023 review), and as far as we can tell, nothing has been done to date in the reward-based motor learning area using fMRI. By design, we avoided using descriptors of ‘explicit’ or ‘implicit’ in our study because our experimental paradigm did not allow a separate measurement of those two components of learning during the task. Nevertheless, it seems clear to us from examining the subjects’ learning curves (see supplementary figure 2 above) that individuals who learn very quickly are using strategic processes (such as action exploration to identify the best path) to enhance their learning. As we noted in an above response, we did not query subjects after the fact about their strategy use, which admittedly was a missed opportunity on our part.

      Author response image 6.

      With respect to the comment on baseline variability and its relationship to performance, this is an interesting idea and one that was explored in the Wu et al., 2014 Nature Neuroscience paper. Prompted by the reviewers, we have now explored this idea in the current data set by testing for a relationship between movement path variability during baseline trials (all 70 baseline trials, see Supplementary Figure 1D above for reference) and subjects’ fPCA score on our learning task. However, when we performed this analysis, we did not observe a significant positive relationship between baseline variability and subject performance. Rather, we actually found a trend towards a negative relationship (though this was non-significant; r=-0.2916, p=0.0844). Admittedly, we are not sure what conclusions can be drawn from this analysis, and in any case, we believe it to be tangential to our main results. We provide the results (at right) for the reviewers if they are interested. This may be an interesting avenue for exploration in future work.
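      For readers who wish to relate the reported correlation to its test statistic, here is a minimal sketch of the standard Pearson computation. It assumes that all 36 subjects entered the analysis (giving 34 degrees of freedom); the function names are illustrative and this is not our analysis code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero, on n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)

# Reported correlation; n = 36 is an assumption about the sample used here.
t_reported = t_statistic(-0.2916, 36)  # about -1.78
```

      The two-tailed p-value is then read from a t distribution with n - 2 degrees of freedom.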

      Recommendation #4: Provide stronger justification for brain imaging methods.

      (4a) Observing how brain activity varies across these different networks is remarkable, especially how sensorimotor regions separate and then contract with other, more cognitive areas. However, does the signal-to-noise ratio in each area/network influence manifold eccentricity and limit the possible changes in eccentricity during learning? Specifically, if a region has a low signal-to-noise ratio, it might exhibit minimal changes during learning (a phenomenon perhaps relevant to null manifold changes in the striatum due to low signal-to-noise); conversely, regions with higher signal-to-noise (e.g., motor cortex in this sensorimotor task) might exhibit changes more easily detected. As such, it is unclear how to interpret manifold changes without considering an area/network's signal-to-noise ratio.

      We appreciate where these concerns are coming from. First, we should note that the timeseries data used in our analysis were z-transformed (mean zero, 1 std) to allow normalization of the signal both over time and across regions (and thus mitigate the possibility that the changes observed could simply reflect mean overall signal changes across different regions). Nevertheless, differences in signal intensity across brain regions — particularly between cortex and striatum — are well-known, though it is not obvious how these differences may manifest in terms of a task-based modulation of MR signals.
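      The normalization described above can be sketched as follows. This is an illustration only: the function name `zscore`, the toy signal values, and the use of the population standard deviation are assumptions rather than details of our pipeline.

```python
import math

def zscore(timeseries):
    """Rescale one region's timeseries to mean 0 and standard deviation 1."""
    n = len(timeseries)
    mean = sum(timeseries) / n
    # Population standard deviation (divide by n); this choice is an assumption.
    std = math.sqrt(sum((x - mean) ** 2 for x in timeseries) / n)
    return [(x - mean) / std for x in timeseries]

# Toy raw signals (arbitrary units) for two regions with different means and spreads.
cortex_raw = [620.0, 630.0, 625.0, 615.0, 635.0]
striatum_raw = [448.0, 452.0, 450.0, 446.0, 454.0]

cortex_z = zscore(cortex_raw)
striatum_z = zscore(striatum_raw)
```

      The two toy regions fluctuate with the same temporal pattern but at different signal levels, so their z-scored timeseries coincide. This is the sense in which z-scoring removes overall differences in mean signal and scale before connectivity is computed.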

      To examine this issue in the current data set, we extracted, for each subject and time epoch (Baseline, Early and Late learning), the raw scanner data (in MR arbitrary units, a.u.) for the cortical and striatal regions and computed the (1) mean signal intensity, (2) standard deviation of the signal (Std) and (3) temporal signal-to-noise ratio (tSNR; calculated as mean/Std). Note that in the fMRI connectivity literature tSNR is often the preferred SNR measure as it normalizes the mean signal based on the signal’s variability over time, thus providing a general measure of overall ‘signal quality’. The results of this analysis, averaged across subjects and regions, are shown below.

      Author response image 7.

      Note that, as expected, the overall signal intensity (left plot) of cortex is higher than in the striatum, reflecting the closer proximity of cortex to the receiver coils in the MR head coil. In fact, the signal intensity in cortex is approximately 38% higher than that in the striatum (i.e., (~625 - 450)/450). However, the signal variation in cortex is also greater than in the striatum (middle plot), in this case approximately 100% greater (i.e., (~5 - 2.5)/2.5). As a result, the tSNR (mean/std) for our data set and the ROI parcellations we used is actually greater in the striatum than in cortex (right plot). Thus, all else being equal, there seems to have been sufficient tSNR in the striatum for us to have detected motor-learning related effects. As such, we suspect the null effects for the striatum in our study actually stem from two sources.
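      The arithmetic behind these comparisons can be reproduced with a short back-of-envelope sketch. It uses the approximate values read off the plots, so the numbers are indicative only and are not the per-region tSNR values averaged in the figure; the function names are illustrative.

```python
def tsnr(mean_signal, std_signal):
    """Temporal signal-to-noise ratio: mean signal divided by its standard deviation over time."""
    return mean_signal / std_signal

def relative_difference(a, b):
    """Fractional amount by which a exceeds b."""
    return (a - b) / b

# Approximate values read off the plots (MR arbitrary units).
cortex_mean, striatum_mean = 625.0, 450.0
cortex_std, striatum_std = 5.0, 2.5

mean_excess = relative_difference(cortex_mean, striatum_mean)  # roughly 0.39
std_excess = relative_difference(cortex_std, striatum_std)     # 1.0, i.e. 100%

cortex_tsnr = tsnr(cortex_mean, cortex_std)        # 125.0
striatum_tsnr = tsnr(striatum_mean, striatum_std)  # 180.0
```

      Because the standard deviation roughly doubles in cortex while the mean rises by well under half, the ratio mean/Std comes out higher for the striatum.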

      The first likely source is the relatively lower number of striatal regions (12) as compared to cortical regions (998) used in our analysis, coupled with our use of PCA on these data (which, by design, identifies the largest sources of variation in connectivity). In future studies, this imbalance could be rectified by using finer parcellations of the striatum (even down to the voxel level) while keeping the same parcellation of cortex (i.e., equate the number of ‘regions’ in each of striatum and cortex). The second likely source is our use of a striatal atlas (the Harvard-Oxford atlas) that divides brain regions based on their neuroanatomy rather than their function. In future work, we plan on addressing this latter concern by using finer, more functionally relevant parcellations of striatum (such as in Tian et al., 2020, Nature Neuroscience). Note that we sought to capture these interrelated possible explanations in our Discussion section, where we wrote the following:

      “While we identified several changes in the cortical manifold that are associated with reward-based motor learning, it is noteworthy that we did not observe any significant changes in manifold eccentricity within the striatum. While clearly the evidence indicates that this region plays a key role in reward-guided behavior (Averbeck and O’Doherty, 2022; O’Doherty et al., 2017), there are several possible reasons why our manifold approach did not identify this collection of brain areas. First, the relatively small size of the striatum may mean that our analysis approach was too coarse to identify changes in the connectivity of this region. Though we used a 3T scanner and employed a widely-used parcellation scheme that divided the striatum into its constituent anatomical regions (e.g., hippocampus, caudate, etc.), both of these approaches may have obscured important differences in connectivity that exist within each of these regions. For example, areas such as the hippocampus and caudate are not homogeneous areas but themselves exhibit gradients of connectivity (e.g., head versus tail) that can only be revealed at the voxel level (Tian et al., 2020; Vos de Wael et al., 2021). Second, while our dimension reduction approach, by design, aims to identify gradients of functional connectivity that account for the largest amounts of variance, the limited number of striatal regions (as compared to cortex) necessitates that their contribution to the total whole-brain variance is relatively small. Consistent with this perspective, we found that the low-dimensional manifold architecture in cortex did not strongly depend on whether or not striatal regions were included in the analysis (see Supplementary Fig. 6). As such, selective changes in the patterns of functional connectivity at the level of the striatum may be obscured using our cortex x striatum dimension reduction approach.

      Future work can help address some of these limitations by using both finer parcellations of striatal cortex (perhaps even down to the voxel level) (Tian et al., 2020) and by focusing specifically on changes in the interactions between the striatum and cortex during learning. The latter can be accomplished by selectively performing dimension reduction on the slice of the functional connectivity matrix that corresponds to functional coupling between striatum and cortex.”
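
      The last suggestion in the quoted passage can be sketched as follows. This is an illustrative sketch only, not the pipeline used in the study: the function name, the index arrays, the random placeholder data and the choice of plain PCA via SVD are all assumptions; only the region counts (998 cortical, 12 striatal) come from the response above.

```python
import numpy as np

def striatum_cortex_components(fc, striatal_idx, cortical_idx, n_components=3):
    """PCA (via SVD) restricted to the striatum-by-cortex block of a
    functional connectivity matrix (hypothetical helper)."""
    # Keep only rows for striatal regions and columns for cortical regions.
    block = fc[np.ix_(striatal_idx, cortical_idx)]
    # Centre each cortical column so components describe differences
    # between striatal regions rather than the shared mean profile.
    block = block - block.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(block, full_matrices=False)
    k = min(n_components, len(s))
    scores = u[:, :k] * s[:k]               # coordinates of striatal regions
    explained = (s ** 2) / np.sum(s ** 2)   # variance share per component
    return scores, explained[:k]

# Placeholder data: random time series standing in for real recordings.
rng = np.random.default_rng(0)
n_cortex, n_striatum = 998, 12
timeseries = rng.standard_normal((300, n_cortex + n_striatum))
fc = np.corrcoef(timeseries, rowvar=False)
cortical_idx = np.arange(n_cortex)
striatal_idx = np.arange(n_cortex, n_cortex + n_striatum)
scores, explained = striatum_cortex_components(fc, striatal_idx, cortical_idx)
```

      Because the decomposition only sees the striatum-by-cortex block, the 998 cortico-cortical rows cannot dominate the leading components, which is the concern raised about the combined analysis.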

      (4b) Could the authors clarify how activity in the dorsal attention network (DAN) changes throughout learning, and how these changes also relate to individual differences in learning performance? Specifically, on average, the DAN seems to expand early and contract late, relative to the baseline. This is interpreted to signify that the DAN exhibits lesser connectivity followed by greater connectivity with other brain regions. However, in terms of how these changes relate to behavior, participants who go against the average trend (DAN exhibits more contraction early in learning, and expansion from early to late) seem to exhibit better learning performance. This finding is quite puzzling. Does this mean that the average trend of expansion and contraction is not facilitative, but rather detrimental, to learning? [Another reviewer added: The authors do not state any explicit hypotheses, but only establish that DMN coordinates activity among several regions. What predictions can we derive from this? What are the authors looking for in the data? The work seems more descriptive than hypothesis-driven. This is fine but should be clarified in the introduction.]

      These are good questions, and we are glad the reviewers appreciated the subtlety here. The reviewers are indeed correct that the relationship of the DAN-A network to behavioral performance appears to go against the grain of the group-level results that we found for the entire DAN network (which we note is composed of both the DAN-A and DAN-B networks). That is, subjects who exhibited greater contraction from Baseline to Early learning and likewise, greater expansion from Early to Late learning, tended to perform better in the task (according to our fPCA scores). However, on this point it is worth noting that it was mainly the DAN-B network which exhibited group-level expansion from Baseline to Early Learning whereas the DAN-A network exhibited negligible expansion. This can be seen in Author response image 8 below, which shows the pattern of expansion and contraction (as in Fig. 4), but instead broken down into the 17-network parcellation. The red asterisk denotes the expansion from Baseline to Early learning for the DAN-B network, which is much greater than that observed for the DAN-A network (which is basically around the zero difference line).

      Author response image 8.

      Thus, it appears that the DAN-A and DAN-B networks are modulated to a different extent during the task, which likely contributes to the perceived discrepancy between the group-level effects (reported using the 7-network parcellation) and the individual differences effects (reported using the finer 17-network parcellation). Based on the reviewers’ comments, this seems like an important distinction to clarify in the manuscript, and we have now described this nuance in our Results section where we now write:

      “...Using this permutation testing approach, we found that it was only the change in eccentricity of the DAN-A network that correlated with Learning score (see Fig. 7C), such that the more the DAN-A network decreased in eccentricity from Baseline to Early learning (i.e., contracted along the manifold), the better subjects performed at the task (see Fig. 7C, scatterplot at right). Consistent with the notion that changes in the eccentricity of the DAN-A network are linked to learning performance, we also found the inverse pattern of effects during Late learning, whereby the more that this same network increased in eccentricity from Early to Late learning (i.e., expanded along the manifold), the better subjects performed at the task (Fig. 7D). We should note that this pattern of performance effects for the DAN-A — i.e., greater contraction during Early learning and greater expansion during Late learning being associated with better learning — appears at odds with the group-level effects described in Fig. 4A and B, where we generally find the opposite pattern for the entire DAN network (composed of the DAN-A and DAN-B subnetworks). However, this potential discrepancy can be explained when examining the changes in eccentricity using the 17-network parcellation (see Supplementary Figure 8). At this higher resolution level we find that these group-level effects for the entire DAN network are being largely driven by eccentricity changes in the DAN-B network (areas in anterior superior parietal cortex and premotor cortex), and not by mean changes in the DAN-A network. By contrast, our present results suggest that it is the contraction and expansion of areas of the DAN-A network (and not DAN-B network) that are selectively associated with differences in subject learning performance.”
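
      For readers less familiar with the measure, the sketch below illustrates how contraction and expansion could be computed for one network. It assumes that eccentricity is the Euclidean distance of each region from the centroid of the low-dimensional embedding; the function names and toy coordinates are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def eccentricity(embedding):
    """Euclidean distance of each region (row) from the embedding centroid.
    Assumed definition, used here for illustration only."""
    centroid = embedding.mean(axis=0)
    return np.linalg.norm(embedding - centroid, axis=1)

def network_change(emb_before, emb_after, region_mask):
    """Mean eccentricity change (after minus before) over one network.
    Negative values indicate contraction, positive values expansion."""
    delta = eccentricity(emb_after) - eccentricity(emb_before)
    return delta[region_mask].mean()

# Toy two-dimensional embeddings of four regions; both are centred on the origin.
baseline = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
early = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
network_mask = np.array([True, True, False, False])  # hypothetical network
change = network_change(baseline, early, network_mask)
```

      In this toy case the two masked regions move from distance 2 to distance 1, so the network contracts. A per-subject value of this kind is what would be correlated with the Learning score.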

      Finally, re: the reviewers’ comments that we do not state any explicit hypotheses etc., we acknowledge that, beyond our general hypothesis stated at the outset about the DMN being involved in reward-based motor learning, our study is quite descriptive and exploratory in nature. So little work has been done in this research area (i.e., using manifold learning approaches to study motor learning with fMRI) that it would be disingenuous to have any stronger hypotheses than those stated in our Introduction. Thus, to make the exploratory nature of our study clear to the reader, we have added the following text (in red) to our Introduction:

      “Here we applied this manifold approach to explore how brain activity across widely distributed cortical and striatal systems is coordinated during reward-based motor learning. We were particularly interested in characterizing how connectivity between regions within the DMN and the rest of the brain changes as participants shift from learning the relationship between motor commands and reward feedback, during early learning, to subsequently using this information, during late learning. We were also interested in exploring whether learning-dependent changes in manifold structure relate to variation in subject motor performance.”

      We hope these changes now make the intention of our study clear.

      (4c) The paper examines a type of motor adaptation task with a reward-based learning component. This, to me, strongly implicates the cerebellum, given that it has a long-established crucial role in adaptation and has recently been implicated in reward-based learning (see work by Wagner & Galea). Why is there no mention of the cerebellum and why it was left out of this study? Especially given that the authors state in the abstract they examine cortical and subcortical structures. It's evident from the methods that the authors did not acquire data from the cerebellum or had too small a FOV to fully cover it (34 slices at 4 mm thickness = 136 mm, which is likely a bit short to fully cover the cerebellum in many participants). What was the rationale behind this methodological choice? It would be good to clarify this for the reader. Related to this, the authors need to rephrase their statements on 'whole-brain' connectivity matrices or analyses - it is not whole-brain when it excludes the cerebellum.

      As we noted above, we do not believe this task to be a motor adaptation task, in the sense that subjects are not able to use sensory prediction errors (and thus error-based learning mechanisms) to improve their performance. Rather, because subjects are denied this sensory error feedback, they are only able to use reinforcement learning processes, along with cognitive strategies (nicely covered in Tsay et al., 2023), to improve performance. Nevertheless, we recognize that the cerebellum has been increasingly implicated in facets of reward-based learning, particularly within the rodent domain (e.g., Wagner et al., 2017; Heffley et al., 2018; Kostadinov et al., 2019, etc.). In our study, we did indeed collect data from the cerebellum but did not include it in our original analyses, as we wanted (1) the current paper to build on prior work in the human and macaque reward-learning domain (which focuses solely on striatum and cortex, and which rarely discusses cerebellum; see Averbeck & O’Doherty, 2022 & Klein-Flugge et al., 2022 for recent reviews), and (2) the cerebellum to be a more targeted focus of future work (specifically, we plan on focusing on striatal-cerebellar interactions during learning, which are hypothesized based on the neuroanatomical tract tracing work of Bostan and Strick, etc.). We hope the reviewers respect our decisions in this regard.

      Nevertheless, we acknowledge that our statements about ‘whole-brain’ connectivity, and our vagueness about what we mean by ‘subcortex,’ may be confusing for the reader. We have now removed and/or corrected such references throughout the paper (however, note that in some cases it is difficult to avoid reference to “whole-brain”, e.g., “whole-brain correlation map” or “whole-brain false discovery rate correction”, which is standard terminology in the field).

      In addition, we are now explicit in our Methods section that the cerebellum was not included in our analyses.

      “Each volume comprised 34 contiguous (no gap) oblique slices acquired at a ~30° caudal tilt with respect to the plane of the anterior and posterior commissure (AC-PC), providing whole-brain coverage of the cerebrum and cerebellum. Note that for the current study, we did not examine changes in cerebellar activity during learning.”

      (4d) The authors centered the matrices before further analyses to remove variance associated with the subject. Why not run a PCA on the connectivity matrices and remove the PC that is associated with subject variance? What is the advantage of first centering the connectivity matrices? Is this standard practice in the field?

      Centering in some form has become reasonably common in the functional connectivity literature, as there is considerable evidence that task-related (or cognitive) changes in whole-brain connectivity are dwarfed by static, subject-level differences (e.g., Gratton et al., 2018, Neuron). If covariance matrices were ordinary scalar values, then isolating task-related changes could be accomplished simply by subtracting a baseline scan or mean score; but because the space of covariance matrices is non-Euclidean, the actual computations involved in this subtraction are more complex (see our Methods). However, fundamentally (and conceptually) our procedure is simply ordinary mean-centering, but adapted to this non-Euclidean space. Despite the added complexity, there is considerable evidence that such computations, adapted directly to the geometry of the space of covariance matrices, outperform simpler methods, which treat covariance matrices as arrays of real numbers (e.g., naive subtraction; see Dodero et al. & Ng et al., references below). Moreover, our previous work has found that this procedure works quite well to isolate changes associated with different task conditions (Areshenkoff et al., 2021, Neuroimage; Areshenkoff et al., 2022, eLife).

      Although PCA can be adapted to work well with covariance matrix valued data, it would at best be a less direct solution than simply subtracting subjects' mean connectivity. This is because the top components from applying PCA would be dominated by both subject-specific effects (not of interest here), and by the large-scale connectivity structure typically observed in component based analyses of whole-brain connectivity (i.e. the principal gradient), whereas changes associated with task-condition (the thing of interest here) would be buried among the less reliable components. By contrast, our procedure directly isolates these task changes.
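
      To make the idea of mean-centering in a non-Euclidean space concrete, the sketch below uses log-Euclidean centering, a deliberately simpler stand-in for the Riemannian procedure described in the Methods, which is not reproduced here. All function names and the random placeholder matrices are assumptions for illustration.

```python
import numpy as np

def spd_log(c):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(w)) @ v.T

def sym_exp(s):
    """Matrix exponential of a symmetric matrix (the result is SPD)."""
    w, v = np.linalg.eigh(s)
    return (v * np.exp(w)) @ v.T

def log_euclidean_mean(covs):
    """Mean of SPD matrices computed in the matrix-log domain."""
    return sym_exp(np.mean([spd_log(c) for c in covs], axis=0))

def center_subject(covs):
    """Remove one subject's mean connectivity in the log domain.
    This is log-Euclidean centering, assumed here for illustration."""
    logs = np.array([spd_log(c) for c in covs])
    centered_logs = logs - logs.mean(axis=0)
    return np.array([sym_exp(l) for l in centered_logs])

# Placeholder data: three epochs (e.g. Baseline, Early, Late) for one subject.
rng = np.random.default_rng(0)
n_regions = 4
covs = []
for _ in range(3):
    a = rng.standard_normal((50, n_regions))
    covs.append(a.T @ a / 50 + 0.1 * np.eye(n_regions))
centered = center_subject(covs)
```

      After centering, the subject's mean matrix is the identity, so what remains in each epoch is the deviation from that subject's own average. Unlike element-wise subtraction, the centered matrices stay positive definite.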

      References cited above:

      Dodero, L., Minh, H. Q., San Biagio, M., Murino, V., & Sona, D. (2015, April). Kernel-based classification for brain connectivity graphs on the Riemannian manifold of positive definite matrices. In 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (pp. 42-45). IEEE.

      Ng, B., Dressler, M., Varoquaux, G., Poline, J. B., Greicius, M., & Thirion, B. (2014). Transport on Riemannian manifold for functional connectivity-based classification. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part II 17 (pp. 405-412). Springer International Publishing.

      (4e) Seems like a missed opportunity that the authors just use a single, PCA-derived measure to quantify learning, where multiple measures could have been of interest, especially given that the introduction established some interesting learning-related concepts related to exploration and exploitation, which could be conceptualized as movement variability and movement accuracy. It is unclear why the authors designed a task that was this novel and interesting, drawing on several psychological concepts, but then chose to ignore these concepts in the analysis.

      We were disappointed to hear that the reviewers did not appreciate our functional PCA-derived measure to quantify subject learning. This is a novel data-driven analysis approach that we have previously used with success in recent work (e.g., Areshenkoff et al., 2022, eLife) and, from our perspective, we thought it was quite elegant that we were able to describe the entire trajectory of learning across all participants along a single axis that explained the majority (~75%) of the variance in the patterns of behavioral learning data. Moreover, the creation of a single behavioral measure per participant (what we call a ‘Learning score’, see Fig. 6C) helped simplify our brain-behavior correlation analyses considerably, as it provided a single measure that accounts for the natural auto-correlation in subjects’ learning curves (i.e., that subjects who learn quickly also tend to be better overall learners by the end of the learning phase). It also avoids the difficulty (and sometimes arbitrariness) of having to select specific trial bins for behavioral analysis (e.g., choosing the first 5, 10, 20 or 25 trials as a measure of ‘early learning’, and so on). Of course, one of the major alternatives to our approach would have involved fitting an exponential to each subject’s learning curves and taking measures like learning rate etc., but in our experience we have found that these types of models don’t always fit well, or yield robust/reliable parameters at the individual subject level. To strengthen the motivation for our approach, we have now included the following text in our Results:

      “To quantify this variation in subject performance in a manner that accounted for the auto-correlation in learning performance over time (i.e., subjects who learned more quickly tend to exhibit better performance by the end of learning), we opted for a purely data-driven approach and performed functional principal component analysis (fPCA; (Shang, 2014)) on subjects’ learning curves. This approach allowed us to isolate the dominant patterns of variability in subjects’ learning curves over time (see Methods for further details; see also Areshenkoff et al., 2022).”
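
      The idea of a single score per subject can be illustrated with ordinary PCA on sampled learning curves. This is a simplified, discretized stand-in for fPCA (which typically represents curves with a smooth basis first); the function name, the sign convention and the synthetic curves are assumptions, not the study's code or data.

```python
import numpy as np

def learning_scores(curves):
    """Score each subject (row) on the dominant mode of variation across
    learning curves, using plain PCA via SVD (hypothetical helper)."""
    centered = curves - curves.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]
    var_explained = s[0] ** 2 / np.sum(s ** 2)
    # The sign of a principal component is arbitrary; orient it so that a
    # higher score means better average performance.
    if np.corrcoef(scores, curves.mean(axis=1))[0, 1] < 0:
        scores = -scores
    return scores, var_explained

# Synthetic curves: subjects differ in how high their performance climbs.
rng = np.random.default_rng(0)
n_subjects, n_trials = 20, 50
trials = np.arange(n_trials)
asymptote = np.linspace(0.2, 1.0, n_subjects)
curves = asymptote[:, None] * (1 - np.exp(-trials / 10.0))[None, :]
curves = curves + 0.02 * rng.standard_normal(curves.shape)
scores, var_explained = learning_scores(curves)
```

      Each subject's whole curve is reduced to one number without choosing trial bins or fitting a parametric model, which is the motivation given in the response above.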

      In any case, the reviewers may be pleased to hear that in current work in the lab we are using more model-based approaches to attempt to derive sets of parameters (per participant) that relate to some of the variables of interest described by the reviewers, but that we relate to much more dynamical (shorter-term) changes in brain activity.

      (4f) Overall Changes in Activity: The manuscript should delve into the potential influence of overall changes in brain activity on the results. The choice of using Euclidean distance as a metric for quantifying changes in connectivity is sensitive to scaling in overall activity. Therefore, it is crucial to discuss whether activity in task-relevant areas increases from baseline to early learning and decreases from early to late learning, or if other patterns emerge. A comprehensive analysis of overall activity changes will provide a more complete understanding of the findings.

      These are good questions and we are happy to explore this in the data. However, as mentioned in our response to query 4a above, it is important to note that the timeseries data for each brain region was z-scored prior to analysis, with the aim of removing any mean changes in activity levels (note that this is a standard preprocessing step when performing functional connectivity analysis, given that mean signal changes are not the focus of interest in functional connectivity analyses).
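
      The effect of z-scoring on connectivity estimates can be checked with a small sketch. The data here are random placeholders and the variable names are hypothetical; the point is only that per-region offsets and gains in the signal do not alter correlation-based connectivity once each time series is z-scored.

```python
import numpy as np

def zscore(timeseries):
    """Z-score each region's time series (columns) to zero mean, unit variance."""
    return (timeseries - timeseries.mean(axis=0)) / timeseries.std(axis=0)

rng = np.random.default_rng(1)
ts = rng.standard_normal((200, 5))
ts[:, 1] += 0.5 * ts[:, 0]            # induce coupling between two regions

# Simulate a change in overall activity: a per-region gain and offset.
gains = np.array([3.0, 0.5, 2.0, 1.0, 4.0])
offsets = np.array([10.0, -5.0, 0.0, 2.0, 7.0])
rescaled = ts * gains + offsets

fc_original = np.corrcoef(zscore(ts), rowvar=False)
fc_rescaled = np.corrcoef(zscore(rescaled), rowvar=False)
```

      The two connectivity matrices agree to numerical precision, and the z-scored signals have zero mean in every region by construction.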

      To further emphasize these points, we have taken our z-scored timeseries data and calculated the mean signal for each region within each task epoch (Baseline, Early and Late learning, see panel A in figure below). The point of showing this data (where each z-score map looks near identical across the top, middle and bottom plots) is to demonstrate just how minuscule the mean signal changes are in the z-scored timeseries data. This point can also be observed when plotting the mean z-score signal across regions for each epoch (see panel B in figure below). Here we find that Baseline and Early learning have a near identical mean activation level across regions (albeit with slightly different variability across subjects), whereas there is a slight increase during late learning, though it should be noted that our y-axis, which measures in the thousandths, really magnifies this effect.

      To more directly address the reviewers’ comments, using the z-score signal per region we have also performed the same statistical pairwise comparisons (Early > Baseline and Late>Early) as we performed in the main manuscript Fig. 4 (see panel C in Author response image 9 below). In this plot, areas in red denote an increase in activity from Baseline to Early learning (top plot) and from Early to Late learning (bottom plot), whereas areas in blue denote a decrease for those same comparisons. The important thing to emphasize here is that the spatial maps resulting from this analysis are generally quite different from the maps of eccentricity that we report in Fig. 4 in our paper. For instance, in the figure below, we see significant changes in the activity of visual cortex between epochs but this is not found in our eccentricity results (compare with Fig. 4). Likewise, in our eccentricity results (Fig. 4), we find significant changes in the manifold positioning of areas in medial prefrontal cortex (MPFC), but this is not observed in the activation levels of these regions (panel C below). Again, we are hesitant to make too much of these results, as the activation differences denoted as significant in the figure below are likely to be an effect on the order of thousandths of a z-score (e.g., 0.002 > 0.001), but this hopefully assuages reviewers’ concerns that our manifold results are solely attributable to changes in overall activity levels.

      We are hesitant to include the results below in our paper as we feel that they don’t add much to the interpretation (as the purpose of z-scoring was to remove large activation differences). However, if the reviewers strongly believe otherwise, we would consider including them in the supplement.

      Author response image 9.

      Examination of overall changes in activity across regions. (A) Mean z-score maps across subjects for the Baseline (top), Early Learning (middle) and Late learning (bottom) epochs. (B) Mean z-score across brain regions for each epoch. Error bars represent +/- 1 SEM. (C) Pairwise contrasts of the z-score signal between task epochs. Positive (red) and negative (blue) values show significant increases and decreases in z-score signal, respectively, following FDR correction for region-wise paired t-tests (at q<0.05).
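
      The caption does not state which FDR procedure was used. As a minimal sketch, assuming the common Benjamini-Hochberg procedure, region-wise p-values could be thresholded as follows; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests survive FDR control
    at level q under the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    # Find the largest rank k whose sorted p-value is at most (k / m) * q.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values from ten region-wise paired t-tests.
p_values = [0.205, 0.001, 0.074, 0.008, 0.039, 0.041, 0.042, 0.06, 0.212, 0.216]
significant = benjamini_hochberg(p_values, q=0.05)
```

      Only the regions with p = 0.001 and p = 0.008 survive here, even though five p-values fall below an uncorrected 0.05 threshold.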

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to address a critical challenge in the field of bioinformatics: the accurate and efficient identification of protein binding sites from sequences. Their work seeks to overcome the limitations of current methods, which largely depend on multiple sequence alignments or experimental protein structures, by introducing GPSite, a multi-task network designed to predict binding residues of various molecules on proteins using ESMFold.

      Strengths:

      • Benchmarking. The authors provide a comprehensive benchmark against multiple methods, showcasing the performances of a large number of methods in various scenarios.

      • Accessibility and Ease of Use. GPSite is highlighted as a freely accessible tool with user-friendly features on its website, enhancing its potential for widespread adoption in the research community.

      RE: We thank the reviewer for acknowledging the contributions and strengths of our work!

      Weaknesses:

      • Lack of Novelty. The method primarily combines existing approaches and lacks significant technical innovation. This raises concerns about the original contribution of the work in terms of methodological development. Moreover, the paper reproduces results and analyses already presented in previous literature, without providing novel analysis or interpretation. This further diminishes the contribution of this paper to advancing knowledge in the field.

      RE: The novelty of this work is primarily manifested in four key aspects. Firstly, although we have employed several existing tools such as ProtTrans and ESMFold to extract sequence features and predict protein conformations, these techniques have hardly been explored in the field of binding site prediction. We have successfully demonstrated the feasibility of substituting multiple sequence alignments with language model embeddings and training with predicted structures, providing a new solution to overcome the limitations of current methods for genome-wide applications. Secondly, though a few methods tend to capture geometric information based on protein surfaces or atom graphs, surface calculation and property mapping are usually time-consuming, while message passing on full atom graphs is memory-consuming and thus challenging to process long sequences. Besides, these methods are sensitive towards details and errors in the predicted structures. To facilitate large-scale annotations, we have innovatively applied geometric deep learning to protein residue graphs for comprehensively capturing backbone and sidechain geometric contexts in an efficient and effective manner (Figure 1). Thirdly, we have not only exploited multi-task learning to integrate diverse ligands and enhance performance, but also shown its capability to easily extend to the binding site prediction of other unseen ligands (Figure 4 D-E). Last but not least, as a “Tools and Resources” article, we have provided a fast, accurate and user-friendly webserver, as well as constructed a large annotation database for the sequences in Swiss-Prot. Leveraging this database, we have conducted extensive analyses on the associations between binding sites and molecular functions, biological processes, and disease-causing mutations (Figure 5), indicating the potential of our tool to unveil unexplored biology underlying genomic data.

      We have now revised the descriptions in the “The geometry-aware protein binding site predictor (GPSite)” section to highlight the novelty of our work in a clearer manner:

      “In conclusion, GPSite is distinguished from the previous approaches in four key aspects. First, profiting from the effectiveness and low computational cost of ProtTrans and ESMFold, GPSite is liberated from the reliance on MSA and native structures, thus enabling genome-wide binding site prediction. Second, unlike methods that only explore the Cα models of proteins 25,40, GPSite exploits a comprehensive geometric featurizer to fully refine knowledge in the backbone and sidechain atoms. Third, the employed message propagation on residue graphs is global structure-aware and time-efficient compared to the methods based on surface point clouds 21,22, and memory-efficient unlike methods based on full atom graphs 23,24. Residue-based message passing is also less sensitive towards errors in the predicted structures. Last but not least, instead of predicting binding sites for a single molecule type or learning binding patterns separately for different molecules, GPSite applies multi-task learning to better model the latent relationships among different binding partners.”

      • Benchmark Discrepancies. The benchmark results vary, especially between the initial comparisons and those with PeSTo. GPSite achieves a PR AUC of 0.484 on the global benchmark but a PR AUC of 0.61 on the benchmark against PeSTo. This suggests potential issues with the benchmark set or the stability of the method. For consistency, PeSTo should be included in the benchmark against all other methods. This inconsistency needs to be addressed to validate the reliability of the results.

      RE: We thank the reviewer for the constructive comments. Since our performance comparison experiments involved numerous competitive methods whose training sets are disparate, it was difficult to compare or rank all these methods fairly using a single test set. Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we meticulously re-split our entire protein-protein binding site dataset to generate a new test set that avoids any overlap with the training sets of both GPSite and PeSTo and performed a separate evaluation, where GPSite achieves a higher AUPR than PeSTo (0.610 against 0.433). This is quite common in this field. For instance, in the study of PeSTo (Nat Commun 2023), the comparisons of PeSTo with MaSIF-site, SPPIDER, and PSIVER were conducted using one test set, while the comparison with ScanNet was performed on a separate test set.

      Based on the reviewer’s suggestion, we have now replaced this experiment with a direct comparison with PeSTo using the datasets from PeSTo, in order to enhance the completeness and convincingness of our results. The corresponding descriptions are now added in Appendix 1-note 2, and the results are added in Appendix 2-table 4. For convenience, we also attach the note and table here:

      “Since 340 out of 375 proteins in our protein-protein binding site test set share > 30% identity with the training sequences of PeSTo, we performed a separate comparison between GPSite and PeSTo using the training and test datasets from PeSTo. By re-training with simply the same hyperparameters, GPSite achieves better performance than PeSTo (AUPR of 0.824 against 0.797) as shown in Appendix 2-table 4. Furthermore, when using ESMFold-predicted structures as input, the performance of PeSTo decreases substantially (AUPR of 0.691), and the superiority of our method will be further reflected. As in 24, the performance of ScanNet is also included (AUPR of 0.720), which is also largely outperformed by GPSite.”

      Author response table 1.

      Performance comparison of GPSite with ScanNet and PeSTo on the protein-protein binding site test set from PeSTo 24

      Note: The performance of ScanNet and PeSTo are directly obtained from 24. PeSTo* denotes evaluation using the ESMFold-predicted structures as input. The metrics provided are the median AUPR, median AUC and median MCC. The best/second-best results are indicated by bold/underlined fonts.

      • Interface Definition Ambiguity. There is a lack of clarity in defining the interface for the binding site predictions. Different methods are trained using varying criteria (surfaces in MaSIF-site, distance thresholds in ScanNet). The authors do not adequately address how GPSite's definition aligns with or differs from these standards and how this issue was addressed. It could indicate that the comparison of those methods is unreliable and unfair.

      RE: We thank the reviewer for the comments. The precise definition of ligand-binding sites is elucidated in the “Benchmark datasets” section. Specifically, the datasets of DNA, RNA, peptide, ATP, HEM and metal ions used to train GPSite were collected from the widely acknowledged BioLiP database [PMID: 23087378]. In BioLiP, a binding residue is defined if the smallest atomic distance between the target residue and the ligand is <0.5 Å plus the sum of the van der Waals radii of the two nearest atoms. Meanwhile, most comparative methods regarding these ligands were also trained on data from BioLiP, thereby ensuring fair comparisons.
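
      The distance criterion quoted above can be written as a short check. This is an illustrative sketch, not BioLiP or GPSite code: the function name, the atom representation and the small radius table (approximate Bondi values) are assumptions.

```python
import math

# Approximate van der Waals radii in Å for a few elements (illustrative subset).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80, "P": 1.80}

def is_binding_residue(residue_atoms, ligand_atoms, margin=0.5):
    """Apply the criterion described above: the smallest residue-ligand atomic
    distance must be below `margin` plus the summed radii of that atom pair.
    Atoms are given as (element, (x, y, z)) tuples with coordinates in Å."""
    nearest = None
    for res_element, res_xyz in residue_atoms:
        for lig_element, lig_xyz in ligand_atoms:
            distance = math.dist(res_xyz, lig_xyz)
            if nearest is None or distance < nearest[0]:
                nearest = (distance, res_element, lig_element)
    if nearest is None:
        return False  # no atoms to compare
    distance, res_element, lig_element = nearest
    return distance < margin + VDW_RADII[res_element] + VDW_RADII[lig_element]

residue = [("N", (0.0, 0.0, 0.0)), ("C", (-1.5, 0.0, 0.0))]
close_ligand = [("O", (3.0, 0.0, 0.0))]   # 3.0 Å from the nitrogen
far_ligand = [("O", (4.0, 0.0, 0.0))]     # 4.0 Å from the nitrogen
```

      For an N-O pair the cutoff is 0.5 + 1.55 + 1.52 = 3.57 Å, so the residue counts as binding for the first ligand position but not the second.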

      However, since BioLiP does not include data on protein-protein binding sites, studies for protein-protein binding site prediction may adopt slightly distinct label definitions, as the reviewer suggested. Here, we employed the protein-protein binding site data from our previous study [PMID: 34498061], where a protein-binding residue was defined as a surface residue (relative solvent accessibility > 5%) that lost more than 1 Å² absolute solvent accessibility after protein-protein complex formation. This definition was initially introduced in PSIVER [PMID: 20529890] and widely applied in various studies (e.g., PMID: 31593229, PMID: 32840562). SPPIDER [PMID: 17152079] and MaSIF-site [PMID: 31819266] have also adopted similar surface-based definitions as PSIVER. On the other hand, ScanNet [PMID: 35637310] employed an atom distance threshold of 4 Å to define contacts while PeSTo [PMID: 37072397] used a threshold of 5 Å. However, it is noteworthy that current methods in this field including ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023) directly compared methods using different label definitions without any alignment in their benchmark studies, likely due to the subtle distinctions among these definitions. For instance, the study of PeSTo directly performed comparisons with ScanNet, MaSIF-site, SPPIDER, and PSIVER. Therefore, we followed these previous works, directly comparing GPSite with other protein-protein binding site predictors.
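
      The label definitions compared above can be placed side by side in a short sketch. The function names and example values are hypothetical, and `max_asa` (the reference accessibility used to compute relative solvent accessibility) is an assumed input; only the cutoffs (5%, 1 Å², 4 Å, 5 Å) come from the response.

```python
def is_binding_by_accessibility(asa_free, asa_complex, max_asa,
                                rsa_cutoff=0.05, loss_cutoff=1.0):
    """Surface-based label: a surface residue (relative accessibility above
    `rsa_cutoff`) that loses more than `loss_cutoff` Å² on complex formation."""
    relative_asa = asa_free / max_asa
    return relative_asa > rsa_cutoff and (asa_free - asa_complex) > loss_cutoff

def is_binding_by_distance(min_atom_distance, cutoff):
    """Distance-based label: the closest inter-chain atom pair lies within `cutoff` Å."""
    return min_atom_distance <= cutoff

# A hypothetical residue: exposed, loses 30 Å² on binding, and its closest
# partner atom is 4.5 Å away.
surface_label = is_binding_by_accessibility(asa_free=50.0, asa_complex=20.0, max_asa=200.0)
label_at_4 = is_binding_by_distance(4.5, cutoff=4.0)
label_at_5 = is_binding_by_distance(4.5, cutoff=5.0)
```

      The same residue is labelled as binding under the surface-based rule and the 5 Å cutoff but not under the 4 Å cutoff, which shows how labels can differ at the margins between benchmarks.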

      In the revised “Benchmark datasets” section, we have now provided more details for the binding site definitions in different datasets to avoid any potential ambiguity:

      “The benchmark datasets for evaluating binding site predictions of DNA, RNA, peptide, ATP, and HEM are constructed from BioLiP”; “A binding residue is defined if the smallest atomic distance between the target residue and the ligand is < 0.5 Å plus the sum of the van der Waals radii of the two nearest atoms”; “Besides, the benchmark dataset of protein-protein binding sites is directly from 26, which contains non-redundant transient heterodimeric protein complexes dated up to May 2021. Surface regions that become solvent inaccessible on complex formation are defined as the ground truth protein-binding sites. The benchmark datasets of metal ion (Zn2+, Ca2+, Mg2+ and Mn2+) binding sites are directly from 18, which contain non-redundant proteins dated up to December 2021 from BioLiP.”

      While GPSite demonstrates the potential to surpass state-of-the-art methods in protein binding site prediction, the evidence supporting these claims seems incomplete. The lack of methodological novelty and the unresolved questions in benchmark consistency and interface definition somewhat undermine the confidence in the results. Therefore, it's not entirely clear if the authors have fully achieved their aims as outlined.

      The work is useful for the field, especially in disease mechanism elucidation and novel drug design. The availability of genome-scale binding residue annotations GPSite offers is a significant advancement. However, the utility of this tool could be hampered by the aforementioned weaknesses unless they are adequately addressed.

      RE: We thank the reviewer for acknowledging the advancement and value of our work, as well as pointing out areas where improvements can be made. As discussed above, we have now carried out the corresponding revisions in the revised manuscript to enhance the completeness and clarity of our work.

      Reviewer #2 (Public Review):

      Summary:

      This work provides a new framework, "GPSite", to predict DNA, RNA, peptide, protein, ATP, HEM, and metal ion binding sites on proteins. This framework comes with a webserver and a database of annotations. The core of the model is a Geometric featurizer neural network that predicts the binding sites of a protein. One major contribution of the authors is the fact that they feed this neural network with predicted structures from ESMFold for training and prediction (instead of native structures as in similar works) and a high-quality protein Language Model representation. The other major contribution is that it provides the public with a new lightweight framework to predict protein-ligand interactions for a broad range of ligands.

      The authors have demonstrated the interest of their framework with mostly two techniques: ablation and benchmark.

      Strengths:

      • The performance of this framework as well as the provided dataset and web server make it useful to conduct studies.

      • The ablations of some core elements of the method, such as the protein Language Model part, or the input structure are very insightful and can help convince the reader that every part of the framework is necessary. This could also guide further developments in the field. As such, the presentation of this part of the work can hold a more critical place in this work.

      RE: We thank the reviewer for recognizing the contributions of our work and for noting that our experiments are thorough.

      Weaknesses:

      • Overall, we can acknowledge the important effort of the authors to compare their work to other similar frameworks. Yet, the lack of homogeneity of training methods and data from one work to the other makes the comparison slightly unconvincing, as the authors pointed out. Overall, the paper puts significant effort into convincing the reader that the method is beating the state of the art. Maybe, there are other aspects that could be more interesting to insist on (usability, interest in protein engineering, and theoretical works).

      RE: We sincerely appreciate the reviewer for the constructive and insightful comments. As to the concern of training data heterogeneity raised by the reviewer, it is noteworthy that current studies in this field, such as ScanNet (Nat Methods 2022) and PeSTo (Nat Commun 2023), directly compare methods trained on different datasets in their benchmark experiments. Therefore, we have adhered to the paradigm in these previous works. According to the detailed recommendations by the reviewer, we have now improved our manuscript by incorporating additional ablation studies regarding the effects of training procedure and language model representations, as well as case studies regarding the predicted structure’s quality and GPSite-based function annotations. We have also refined the Discussion section to focus more on the achievements of this work. A comprehensive point-by-point response to the reviewer’s recommendations is provided below.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Overall I think the work is slightly underserved by its presentation. Some improvements could be made to the paper to better highlight the significance of your contribution.

      RE: We thank the reviewer for recognizing the significance of our work!

      • Line 188: "As expected, the performance of these methods mostly decreases substantially utilizing predicted structures for testing because they were trained with high-quality native structures."

      This is a major ablation that was not performed in this case. You used predicted structures to train, while the others did not. A better way to assess the value of this approach would be to train a network with only native structures and compare its performance against the network trained with predicted structures, as you did later when assessing other aspects of your method such as single-task versus multi-task training.

      RE: We thank the reviewer for the valuable recommendation. We have now assessed the benefit of training with predicted instead of native structures, which brings an average AUPR increase of 4.2% as detailed in Appendix 1-note 5 and Appendix 2-table 9. For convenience, we also attach the note and table here:

      “We examined the performance under different training and evaluation settings as shown in Appendix 2-table 9. As expected, the model yields exceptional performance (average AUPR of 0.656) when trained and evaluated using native structures. However, if this model is fed with predicted structures of the test proteins, the performance substantially declines to an average AUPR of 0.573. This trend aligns with the observations for other structure-based methods as illustrated in Figure 2. More importantly, in the practical scenario where only predicted structures are available for the target proteins, training the model with predicted structures (i.e., GPSite) results in superior performance than training the model with native structures (average AUPR of 0.594 against 0.573), probably owing to the consistency between the training and testing data. For completeness, the results in Appendix 3-figure 2 are also included where GPSite is tested with native structures (average AUPR of 0.637).”

      Author response table 2.

      Performance comparison on the ten binding site test sets under different training and evaluation settings

      Note: The numbers in this table are AUPR values. “Pep” and “Pro” denote peptide and protein, respectively. “Avg” means the average AUPR values among the ten test sets. “native” and “predicted” denote applying native and predicted structures as input, respectively.

      • Line 263: "ProtTrans consistently obtains competitive or superior performance compared to the MSA profiles, particularly for the target proteins with few homologous sequences (Neff < 2)."

      This seems a bit far-fetched. We see clearly in the figure that the performances are far superior for Neff < 2, but the performances seem rather similar for higher Neff. Could the author evaluate numerically the significance of the improvement? MSA profiles outperform GPSite on 4 intervals and I don't know the distribution of the data.

      RE: We thank the reviewer for the valuable suggestion. We have now revised this sentence to avoid any potential ambiguity:

      “As evidenced in Figure 4B and Appendix 2-table 8, ProtTrans consistently obtains competitive or superior performance compared to the MSA profile. Notably, for the target proteins with few homologous sequences (Neff < 2), ProtTrans surpasses MSA profile significantly with an improvement of 3.9% on AUC (P-value = 4.3×10⁻⁸).”

      The detailed significance tests and data distribution are now added in Appendix 2-table 8 and attached below as Author response-table 3 for convenience:

      Author response table 3.

      Performance comparison between GPSite and the baseline model using MSA profile for proteins with different Neff values in the combined test set of the ten ligands

      Note: Significance tests are performed following the procedure in 12,25. If P-value < 0.05, the difference between the performance is considered statistically significant.

      • Line 285: "We first visualized the distributions of residues in this dataset using t-SNE, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite. "

      Wouldn't embedding from single-task be more relevant to show the interest of multi-task training here? Is the difference that big when comparing embeddings from single-task training to embeddings from multi-task training? Otherwise, I think the evidence from Figure 4e is sufficient, the interest of multitasking could be well-shown by single-task vs. multi-task AUPR and a few examples or predictions that are improved.

      RE: We thank the reviewer for the comment. In the second paragraph of the “The effects of protein features and model designs” section, we have compared the performance of multi-task and single-task learning. However, the visualization results in Figure 4D are related to the third paragraph, where we conducted a downstream exploration of the possibility to extend GPSite to other unseen ligands. This is based on the hypothesis that the shared network in GPSite may have captured certain common ligand-binding mechanisms during the preceding multi-task training process. We visualized the distributions of residues in an unseen carbohydrate-binding site dataset using t-SNE, where the residues are encoded by raw feature vectors (ProtTrans and DSSP), or latent embedding vectors from the shared network trained before. Although the shared network has not been specifically trained on the carbohydrate dataset, the latent representations from GPSite effectively improve the discriminability between the binding and non-binding residues as shown in Figure 4D. This finding indicates that the shared network trained on the initial set of ten molecule types has captured common binding mechanisms and may be applied to other unseen ligands.

      We have now added more descriptions in this paragraph to avoid potential ambiguity:

      “Residues that are conserved during evolution, exposed to solvent, or inside a pocket-shaped domain are inclined to participate in ligand binding. During the preceding multi-task training process, the shared network in GPSite should have learned to capture such common binding mechanisms. Here we show how GPSite can be easily extended to the binding site prediction for other unseen ligands by adopting the pre-trained shared network as a feature extractor. We considered a carbohydrate-binding site dataset from 54 which contains 100 proteins for training and 49 for testing. We first visualized the distributions of residues in this dataset using t-SNE 55, where the residues are encoded by raw feature vectors encompassing ProtTrans embeddings and DSSP structural properties, or latent embedding vectors from the shared network of GPSite trained on the ten molecule types previously.”

      • Line 291: "Employing these informative hidden embeddings as input features to train a simple MLP exhibits remarkable performance with an AUC of 0.881 (Figure 4E), higher than that of training a single-task version of GPSite from scratch (AUC of 0.853) or other state-of-the-art methods such as MTDsite and SPRINT-CBH."

      Is it necessary to introduce other methods here? The single-task vs multi-task seems enough for what you want to show?

      RE: We thank the reviewer for the comment. As discussed above, here we aim to show the potential of GPSite for the binding site prediction of an unseen ligand (i.e., carbohydrate) by adopting the pre-trained shared network as a feature extractor. Thus, we think it’s reasonable to also include the performance of other state-of-the-art methods on this carbohydrate benchmark dataset as baselines.

      • Line 321: "Specifically, a protein-level binding score can be generated for each ligand by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering that the binding interfaces of metal ions are usually smaller."

      Since binding sites are usually not localized on one single amino-acid, we can expect that most of the top k residues are localized around the same area of the protein both spatially and along the sequence. Is it something you observe and could consider in your method?

      RE: We thank the reviewer for the comment. We employed a straightforward method (top-k average) to convert GPSite’s residue-level annotations into protein-level annotations, where k was set empirically based on the distributions of the numbers of binding residues per sequence observed in the training set. We have not put much effort into optimizing this strategy since it mainly serves as a proof-of-concept experiment (Figure 5 A-C) to show the potential of GPSite in discriminating ligand-binding proteins. We have now revised this sentence to better explain how we selected k:

      “Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues. Empirically, we set k to 5 for metal ions and 10 for other ligands, considering the distributions of the numbers of binding residues per sequence observed in the training set.”
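
      The top-k averaging described above is simple enough to sketch directly. The following Python snippet is a minimal illustration under stated assumptions: the ligand keys in METAL_IONS are hypothetical labels chosen for this sketch, and averaging over all residues when a protein has fewer than k residues is our assumption rather than a documented behavior of GPSite.

```python
from typing import Sequence

# Hypothetical ligand keys for the four metal ions; k = 5 for these, 10 otherwise.
METAL_IONS = {"ZN", "CA", "MG", "MN"}


def protein_level_score(residue_scores: Sequence[float], ligand: str) -> float:
    """Average the top-k residue-level scores to get one score per protein."""
    if not residue_scores:
        raise ValueError("residue_scores must not be empty")
    k = 5 if ligand in METAL_IONS else 10
    # If the protein has fewer than k residues, all of them are averaged.
    top = sorted(residue_scores, reverse=True)[:k]
    return sum(top) / len(top)
```

      Because only the highest-scoring residues contribute, a protein with one small, confidently predicted site can still receive a high protein-level score, regardless of whether those residues form one patch or several.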

      As for the question raised by the reviewer, we can indeed expect that most of the top k predicted binding residues tend to cluster into several areas rather than necessarily a single one. For instance, certain macromolecules like DNA may interact with several protein surface patches due to their elongated structures (e.g., Author response-figure 1A). Another case may be a protein binding to multiple molecules of the same ligand type (e.g., Author response-figure 1B).

      Author response image 1.

      The structures of 4XQK (A) and 4KYW (B) in PDB.

      • Line 327: The accuracy of the GPSite protein-level binding scores is further validated by the ROC curves in Figure 5B, where GPSite achieves satisfactory AUC values for all ligands except protein (AUC of 0.608).

      Here may be a good place to compare yourself with others, do other frameworks experience the same problem? If so, AUC and AUPR are not relevant here, can you expose some recall scores for example?

      RE: We thank the reviewer for the valuable recommendation. We have conducted comprehensive method comparisons in the preceding “GPSite outperforms state-of-the-art methods” section, where GPSite surpasses all existing frameworks across various ligands. Here, the genome-wide analyses of Swiss-Prot in Figure 5 serve as a downstream demonstration of GPSite’s capacity for large-scale annotations. We didn’t compare with other methods since most of them are time-consuming or memory-consuming, and thus unable to process sequences of substantial quantity or length. For example, it takes about 8 min for the MSA-based method GraphBind to annotate a protein with 500 residues, while it just takes about 20 s for GPSite (see Appendix 3-figure 1 for detailed runtime comparison). It is also challenging for the atom-graph-based method PeSTo to process structures larger than 100 kDa (~1000 residues) on a 32 GB GPU as the authors suggested, while GPSite can easily process structures containing up to 2500 residues on a 16 GB GPU.

      Regarding the recall score mentioned by the reviewer, GPSite achieves a recall of 0.95 (threshold = 0.5) for identifying protein-binding proteins. This indicates that GPSite can accurately identify positive samples, but it also tends to misclassify negative samples as positive. In our original manuscript, we claimed that “This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete”. To better support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note here:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein has been validated through experiments in the year of 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein has been validated through experiments in the year of 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”
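
      The combination of high recall and a modest AUC discussed in this response can be made concrete with a small Python sketch that reports recall alongside the false-positive rate at a fixed threshold. The function name and the choice of ">=" at the threshold are assumptions for illustration; the response only states that a threshold of 0.5 was used.

```python
import math
from typing import Sequence, Tuple


def recall_and_fpr(
    scores: Sequence[float],
    labels: Sequence[int],   # 1 = annotated as protein-binding, 0 = not annotated
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Return (recall, false-positive rate) at the given score threshold."""
    tp = fn = fp = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold  # assumption: inclusive cutoff
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    recall = tp / (tp + fn) if (tp + fn) else math.nan
    fpr = fp / (fp + tn) if (fp + tn) else math.nan
    return recall, fpr
```

      If some of the "negative" proteins are in fact unannotated binders, as the two case studies suggest, part of the measured false-positive rate reflects missing labels rather than prediction errors.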

      • Line 381: 'Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. Given that the ESM Metagenomic Atlas 34 provides 772 million predicted protein structures along with pre-computed language model embeddings, self-supervised learning can be employed to train a GPSite model for predicting masked sequence and structure attributes, or maximizing the similarity between the learned representations of substructures from identical proteins while minimizing the similarity between those from different proteins using a contrastive loss function training from scratch. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization (EM) framework 58 can be adopted to handle the hierarchical graph structure inherent in proteins, which contains the top view of the residue graph and the bottom view of the atom graph inside a residue. Such an EM procedure enables training two separate graph neural networks for the two views while simultaneously allowing interaction and mutual enhancement between the two modules. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.'

      I think this does not belong here. It feels like half of your discussion is not talking about the achievements of this paper but future very specific directions. Focus on the take-home arguments (performances of the model, ability to predict a large range of tasks, interest in key components of your model, easy use) of the paper and possible future direction but without being so specific.

      RE: We thank the reviewer for the valuable suggestion. We have now simplified the discussions on the future directions notably:

      “Despite the noteworthy advancements achieved by GPSite, there remains scope for further improvements. GPSite may be improved by pre-training on the abundant predicted structures in ESM Metagenomic Atlas, and then fine-tuning on binding site datasets. Besides, the hidden embeddings from ESMFold may also serve as informative protein representations. Additional opportunities for upgrade exist within the network architecture. For example, a variational Expectation-Maximization framework can be adopted to handle the hierarchical atom-to-residue graph structure inherent in proteins. Meta-learning could also be explored in this multi-task scenario, which allows fast adaptation to unseen tasks with limited labels.”

      • Overall there is also a lack of displayed structure. You should try to select a few examples of binding sites that were identified correctly by your method and not by others, if possible get some insights on why. Also, some negative examples could be interesting so as to have a better idea of the interest.

      RE: We thank the reviewer for the valuable recommendation. We have performed a case study for the structure of the glucocorticoid receptor in Figure 3 D-H to illustrate a potential reason for the robustness of GPSite. Moreover, we have now added a case study in Appendix 1-note 3 and Appendix 3-figure 5 to explain why GPSite sometimes is not as accurate as the state-of-the-art structure-based method. For convenience, we also attach the note and figure here:

      “Here we present an example of an RNA-binding protein, i.e., the ribosome biogenesis protein ERB1 (PDB: 7R6Q, chain m), to illustrate the impact of predicted structure’s quality. As shown in Appendix 3-figure 5, ERB1 is an integral component of a large multimer structure comprising protein and RNA chains (i.e., the state E2 nucleolar 60S ribosome biogenesis intermediate). Likely due to the neglect of interactions from other protein chains, ESMFold fails to predict the correct conformation of the ERB1 chain (TM-score = 0.24). Using this incorrect predicted structure, GPSite achieves an AUPR of 0.580, lower than GraphBind input with the native structure (AUPR = 0.636). However, the performance of GraphBind substantially declines to an AUPR of 0.468 when employing the predicted structure as input. Moreover, if GPSite adopts the native structure for prediction, a notable performance boost can be obtained (AUPR = 0.681).”

      Author response image 2.

      The prediction results of GPSite and GraphBind for the ribosome biogenesis protein ERB1. (A) The state E2 nucleolar 60S ribosome biogenesis intermediate (PDB: 7R6Q). The ribosome biogenesis protein ERB1 (chain m) is highlighted in blue, while other protein chains are colored in gray. The RNA chains are shown in orange. (B) The RNA-binding sites on ERB1 (colored in red). (C) The ESMFold-predicted structure of ERB1 (TM-score = 0.24). The RNA-binding sites are also mapped onto this predicted structure (colored in red). (D-G) The prediction results of GPSite and GraphBind for the predicted and native ERB1 structures. The confidence of the predictions is represented with a gradient of color from blue for non-binding to red for binding.

      Minor comments:

      • Line 169: "Note that since our test sets may partly overlap with the training sets of these methods, the results reported here should be the upper limits for the existing methods."

      Yes, but they were potentially not trained on the most recent structures in that case. These methods could also see improved performance with an updated training set.

      RE: We thank the reviewer for the comment. We have now deleted this sentence.

      • Line 176: "Since 358 of the 375 proteins in our protein-binding site test set share > 30% identity with the training sequences of PeSTo, we re-split our protein-binding dataset to generate a test set of 65 proteins sharing < 30% identity with the training set of PeSTo for a fair evaluation."

      Too specific to be here in my opinion.

      RE: We thank the reviewer for the comment. We have now moved these details to Appendix 1-note 2. The description in the main text here is now more concise:

      “Given the substantial overlap between our protein-binding site test set and the training set of PeSTo, we conducted separate training and comparison using the datasets of PeSTo, where GPSite still demonstrates a remarkable improvement over PeSTo (Appendix 1-note 2).”

      • Figure 2. The authors should try to either increase Fig A's size or increase the font size. This could probably be done by compressing the size of Figure C into a single figure.

      RE: We thank the reviewer for the suggestion. We have now increased the font size in Figure 2A. Besides, the figures in the final version of the manuscript should be clearer, since we will be able to upload SVG files.

      • Have you tried using embeddings from more structure-aware pLM such as ESM Fold embeddings (fine-tuned) or ProstTrans (that may be more recent than this study)?

      RE: We thank the reviewer for the insightful comment. We have not yet explored the embeddings from structure-aware pLM, but we acknowledge its potential as a promising avenue for future investigation. We have now added this point in our Discussion section:

      “Besides, the hidden embeddings from ESMFold may also serve as informative protein representations.”

      Reviewer #3 (Public Review):

      Summary

      The authors of this work aim to address the challenge of accurately and efficiently identifying protein binding sites from sequences. They recognize the limitations of current methods, including reliance on multiple sequence alignments or experimental protein structures and the under-explored geometry of the structure, which limit performance and genome-scale applications. The authors have developed a multi-task network called GPSite that predicts binding residues for a range of biologically relevant molecules, including DNA, RNA, peptides, proteins, ATP, HEM, and metal ions, using a combination of sequence embeddings from protein language models and ESMFold-predicted structures. Their approach attempts to extract residual and relational geometric contexts in an end-to-end manner, surpassing current sequence-based and structure-based methods.

      Strengths

      • The GPSite model's ability to predict binding sites for a wide variety of molecules, including DNA, RNA, peptides, and various metal ions.

      • Based on the presented results, GPSite outperforms state-of-the-art methods in several benchmark datasets.

      • GPSite adopts predicted structures instead of native structures as input, enabling the model to be applied to a wider range of scenarios where native structures are rare.

      • The authors emphasize the low computational cost of GPSite, which enables rapid genome-scale binding residue annotations, indicating the model's potential for large-scale applications.

      RE: We thank the reviewer for recognizing the significance and value of our work!

      Weaknesses

      • One major advantage of GPSite, as claimed by the authors, is its efficiency. Although the manuscript mentioned that the inference takes about 5 hours for all datasets, it remains unclear how much improvement GPSite can offer compared with existing methods. A more detailed benchmark comparison of running time against other methods is recommended (including the running time of different components, since some methods like GPSite use predicted structures while some use native structures).

      RE: We thank the reviewer for the valuable suggestion. Empirically, it takes about 5-20 min for existing MSA-based methods to make predictions for a protein with 500 residues, while it only takes about 1 min for GPSite (including structure prediction). However, it is worth noting that some predictors in our benchmark study are solely available as webservers, and it is challenging to compare the runtime between a standalone program and a webserver due to the disparity in hardware configurations. Therefore, we have now included comprehensive runtime comparisons between the GPSite webserver and other top-performing servers in Appendix 3-figure 1 to illustrate the practicality and efficiency of our method. For convenience, we also attach the figure here as Author response-figure 3. The corresponding description is now added in the “GPSite outperforms state-of-the-art methods” section:

      “Moreover, GPSite is computationally efficient, achieving comparable or faster prediction speed compared to other top-performing methods (Appendix 3-figure 1).”

      Author response image 3.

      Runtime comparison of the GPSite webserver with other top-performing servers. Five protein chains (i.e., 8HN4_B, 8USJ_A, 8C1U_A, 8K3V_A and 8EXO_A) comprising 100, 300, 500, 700, and 900 residues, respectively, were selected for testing, and the average runtime is reported for each method. Note that a significant portion of GPSite’s runtime (75 s, indicated in orange) is allocated to structure prediction using ESMFold.

      • Since the model uses predicted protein structure, the authors have conducted some studies on the effect of the predicted structure's quality. However, only the 0.7 threshold was used. A more comprehensive analysis with several different thresholds is recommended.

      RE: We thank the reviewer for the comment. We assessed the effect of the predicted structure's quality by evaluating GPSite’s performance on high-quality (TM-score > 0.7) and low-quality (TM-score ≤ 0.7) predicted structures. We did not employ multiple thresholds (e.g., 0.3, 0.5, and 0.7), as the majority of proteins in the test sets were accurately predicted by ESMFold. Specifically, as shown in Figure 3B, Appendix 3-figure 3 and Appendix 2-table 5, the numbers of proteins with TM-score ≤ 0.7 are small in most datasets (e.g., 42 for DNA and 17 for ATP). Consequently, there is insufficient data available for analysis with lower thresholds, except for the RNA test set. Notably, Figure 3C presents a detailed inspection of the 104 proteins with TM-score < 0.5 in the RNA test set. Within this subset, GPSite consistently outperforms the state-of-the-art structure-based method GraphBind with predicted structures as input, regardless of the prediction quality of ESMFold. Only in cases where structures are predicted with extremely low quality (TM-score < 0.3) does GPSite fall behind GraphBind input with native structures. This result further demonstrates the robustness of GPSite. We have now added clearer explanations in the “GPSite is robust for low-quality predicted structures” section:

      “Figure 3B and Appendix 3-figure 3 show the distributions of TM-scores between native and predicted structures calculated by US-align in the ten benchmark datasets, where most proteins are accurately predicted with TM-score > 0.7 (see also Appendix 2-table 5)”; “Given the infrequency of low-quality predicted structures except for the RNA test set, we took a closer inspection of the 104 proteins with predicted structures of TM-score < 0.5 in the RNA test set.”
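
      The argument about thresholds rests on how many proteins fall into each TM-score range. A minimal Python sketch of that count is given below; the bin edges follow the values mentioned in the response (0.3, 0.5 and 0.7), while the function name and the treatment of values that fall exactly on an edge (assigned to the lower bin, consistent with "TM-score ≤ 0.7" being low quality) are assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence


def count_by_tm_score(
    tm_scores: Sequence[float],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),
) -> List[int]:
    """Count proteins per TM-score bin.

    With the default thresholds the bins are
    (-inf, 0.3], (0.3, 0.5], (0.5, 0.7], (0.7, +inf).
    """
    counts = [0] * (len(thresholds) + 1)
    for tm in tm_scores:
        # bisect_left puts a value equal to an edge into the lower bin.
        counts[bisect_left(thresholds, tm)] += 1
    return counts
```

      Running such a count per test set shows directly whether a lower threshold leaves enough proteins for a meaningful comparison, which the response reports to be the case only for the RNA test set.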

      • To demonstrate the robustness of GPSite, the authors performed a case study on human GR containing two zinc fingers, where the predicted structure is not perfect. The analysis could benefit from a more detailed explanation of why the model can still infer the binding site correctly even though the input structural information is slightly off.

      RE: We thank the reviewer for the comment. We have actually explained the potential reason for the robustness of GPSite in the second paragraph of the “GPSite is robust for low-quality predicted structures” section. In summary, although the whole structure of this protein is not perfectly predicted, the local structures of the binding domains of peptide, DNA and Zn2+ are actually predicted accurately as evidenced by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can still make reliable predictions. We have now revised this paragraph to explain these more clearly:

      “Figure 3D shows the structure of the human glucocorticoid receptor (GR), a transcription factor that binds DNA and assembles a coactivator peptide to regulate gene transcription (PDB: 7PRW, chain A). The DNA-binding domain of GR also consists of two C4-type zinc fingers to bind Zn2+ ions. Although the structure of this protein is not perfectly predicted (TM-score = 0.72), the local structures of the binding domains of peptide and DNA are actually predicted accurately as viewed by the superpositions of the native and predicted structures in Figure 3D and 3E. Therefore, GPSite can correctly predict all Zn2+ binding sites and precisely identify the binding sites of DNA and peptide with AUPR values of 0.949 and 0.924, respectively (Figure 3F, G and H).”

      • To analyze the relatively low AUC value for protein-protein interactions, the authors claimed that it is "due to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete", which is unjustified. It is highly recommended to support this claim by showing at least one example where GPSite's prediction is a valid binding site that is not present in the current Swiss-Prot database or via other approaches.

      RE: We thank the reviewer for the valuable recommendation. To support this claim, we have now added two examples in Appendix 1-note 7, where GPSite confidently predicted the presence of the “protein binding” function (GO:0005515). Notably, this function was absent in these two proteins in the Swiss-Prot database at the time of manuscript preparation (release: 2023-05-03), but has been included in the latest release of Swiss-Prot (release: 2023-11-08). For convenience, we also attach the note below:

      “As depicted in Figure 5A, GPSite assigns relatively high prediction scores to the proteins without “protein binding” function in the Swiss-Prot annotations, leading to a modest AUC value of 0.608 (Figure 5B). This may be ascribed to the fact that protein-protein interactions are ubiquitous in living organisms while the Swiss-Prot function annotations are incomplete. To support this hypothesis, we present two proteins as case studies, both sharing < 20% sequence identity with the protein-binding training set of GPSite. The first case is Aminodeoxychorismate synthase component 2 from Escherichia coli (UniProt ID: P00903). GPSite confidently predicted this protein as a protein-binding protein with a high prediction score of 0.936. Notably, this protein was not annotated with the “protein binding” function (GO:0005515) or any of its GO child terms in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P00903?format=txt&versions=171, release: 2023-05-03). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P00903?format=txt&versions=174, release: 2023-11-08) during manuscript revision, this protein is annotated with the “protein heterodimerization activity” function (GO:0046982), which is a child term of “protein binding”. In fact, the heterodimerization activity of this protein has been validated through experiments in the year of 1996 (PMID: 8679677), indicating the potential incompleteness of the Swiss-Prot annotations. The other case is Hydrogenase-2 operon protein HybE from Escherichia coli (UniProt ID: P0AAN1), which was also predicted as a protein-binding protein by GPSite (score = 0.909). Similarly, this protein was not annotated with the “protein binding” function in the Swiss-Prot database at the time of manuscript preparation (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=108). However, in the latest release of Swiss-Prot (https://rest.uniprot.org/unisave/P0AAN1?format=txt&versions=111), this protein is annotated with the “preprotein binding” function (GO:0070678), which is a child term of “protein binding”. In fact, the preprotein binding function of this protein has been validated through experiments in the year of 2003 (PMID: 12914940). These cases demonstrate the effectiveness of GPSite for completing the missing function annotations in Swiss-Prot.”

      • The authors reported that many GPSite-predicted binding sites are associated with known biological functions. Notably, for RNA-binding sites, there is a significantly higher proportion of translation-related binding sites. The analysis could benefit from further investigation into this observation, such as analyzing the percentage of such interactions in the training set. In addition, if there is sufficient data, it would also be interesting to see the cross-interaction-type performance of the proposed model, e.g., train the model on a dataset excluding specific binding sites and test its performance on that class of interactions.

      RE: We thank the reviewer for the suggestion. We would like to clarify that the analysis in Figure 5C was conducted at “protein-level” instead of “residue-level”. As described in the second paragraph of the “Large-scale binding site annotation for Swiss-Prot” section, a protein-level ligand-binding score was assigned to a protein by averaging the top k residue-level predicted binding scores. This protein-level score indicates the overall binding propensity of the protein to a specific ligand. We gathered the top 20,000 proteins with the highest protein-level binding scores for each ligand and found that their biological process annotations from Swiss-Prot were consistent with existing knowledge. We have now revised the corresponding sentence to explain these more clearly:

      “Exploiting the residue-level binding site annotations, we could readily extend GPSite to discriminate between binding and non-binding proteins of various ligands. Specifically, a protein-level binding score indicating the overall binding propensity to a specific ligand can be generated by averaging the top k predicted scores among all residues.”
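
      The protein-level score described in the revised sentence (averaging the top k residue-level scores) can be sketched as follows. The value of k, the function name, and the example scores are assumptions made for illustration; the response does not state the value of k used in GPSite.

```python
def protein_level_score(residue_scores, k=5):
    """Average the k highest residue-level binding scores of one protein.

    If the protein has fewer than k residues, all residues are averaged.
    The default k is a placeholder, not the value used by GPSite.
    """
    if not residue_scores:
        raise ValueError("residue_scores must not be empty")
    top_k = sorted(residue_scores, reverse=True)[:k]
    return sum(top_k) / len(top_k)


# Hypothetical per-residue scores for a single protein and ligand type.
example_scores = [0.10, 0.90, 0.80, 0.20, 0.70]
print(protein_level_score(example_scores, k=3))
```

      Ranking proteins by this score for a given ligand is what allows a fixed number of top-scoring proteins to be selected for the protein-level analysis.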

      As for the cross-interaction-type performance raised by the reviewer, we have now conducted cross-type evaluations to investigate the specificity of the ligand-specific MLPs and the inherent similarities among different ligands in Appendix 1-note 6 and Appendix 2-table 10. For convenience, we also attach the note and table here:

      “We conducted cross-type evaluations by applying different ligand-specific MLPs in GPSite for the test sets of different ligands. As shown in Appendix 2-table 10, for each ligand-binding site test set, the corresponding ligand-specific network consistently achieves the best performance. This indicates that the ligand-specific MLPs have specifically learned the binding patterns of particular molecules. We also noticed that the cross-type performance is reasonable for the ligands sharing similar properties. For instance, the DNA-specific MLP exhibits a reasonable AUPR when predicting RNA-binding sites, and vice versa. Similar trends are also observed between peptide and protein, as well as among metal ions as expected. Interestingly, the cross-type performance between ATP and HEM is also acceptable, potentially attributed to their comparable molecular weights (507.2 and 616.5, respectively).”

      Author response table 4.

      Cross-type performance by applying different ligand-specific MLPs in GPSite for the test sets of different ligands

      Note: “Pep” and “Pro” denote peptide and protein, respectively. The numbers in this table are AUPR values. The best/second-best result in each test set is indicated by bold/underlined font.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Hats off to the authors for taking time to decipher the seemingly subtle but important differences between the Gnai2/3 double mutant and Ptx mutant phenotypes. These results further illustrate the dynamic requirement of GNAI/O in hair bundle establishment. I have some minor suggestions for the authors to consider and it is up to the authors to decide whether to incorporate them:

      We decided to make the current (revised) version the version of record, and we explain why below. Please include these comments in the review+rebuttal material.

      (1) The abstract could be modified to reflect the revised interpretations of the results.

      Response: the abstract is high-level and the changes in interpretation in the revised manuscript do not modify the message there. Briefly, the abstract only states that Gnai2; Gnai3 double mutants recapitulate two defects previously only observed with pertussis toxin. There is no claim about the timing or dose of GNAI proteins involved.

      (2) The three rows of OHCs are like a different beast from each other. Mireille Montcouquiol's lab has demonstrated that there is a differential requirement for Gnai3 in hair bundle orientation among the three rows of OHCs. The results described in this manuscript support this notion as well.

      To clarify, Gnai3 inactivation does not affect OHC orientation. Only pertussis toxin, and in this work Gnai2; Gnai3 double mutants, do. The Montcouquiol lab showed different degrees of OHC1, OHC2 and OHC3 misorientation upon use of pertussis toxin in vitro using cochlear explants (Ezan et al 2013). We showed the same thing in vivo using transgenic models (Tarchini et al 2013; Tarchini et al 2016). The different OHC responses by row and corresponding citations are mentioned in several locations in the manuscript, including first on line 112 in the Introduction and in Fig. 1C in a graphical summary.

      (3) I wonder if "compensate" or "redundancy" may be a better term to use than "rescue" in the Discussion and figure.

      “Rescue” is used in the Discussion on lines 603 and 604. We think that “rescue” is appropriate to refer to the ability of GNAI2 to compensate for the loss of GNAI1 and GNAI3 in a mutant context. We would argue that these different wordings are largely interchangeable and do not change the message.


      Author Response

      The following is the authors’ response to the original reviews.

      We really appreciate the time the reviewers spent reading and commenting on the original manuscript. Although they were positive already, we decided to spend some time to address the main comments with new experiments as thoroughly as possible in a new manuscript version. We also heavily edited some sections accordingly: 1) We delayed pertussis toxin activation in hair cells with Atoh1-Cre to show that the resulting misorientation phenotype is delayed compared to FoxG1-Cre results, as also seen in Gnai2; Gnai3 double mutants. It follows that Gnai2; Gnai3 and pertussis mutants do share a similar misorientation profile, and that GNAI proteins are required to normally reverse OHC1-2 (from medial to lateral), but also to maintain the lateral orientation, at least transiently. 2) We experimentally verified that one of our GNAI antibodies can indeed detect GNAI1, and consequently that absence of signal in Gnai2; Gnai3 double mutants is evidence that GNAI1 is not involved in apical hair cell polarization. We believe these changes strengthen the manuscript and its conclusions.

      Reviewer #1 (Public Review):

      A subclass of inhibitory heterotrimeric guanine nucleotide-binding protein subunits, GNAI, has been implicated in sensory hair cell formation, namely the establishment of hair bundle (stereocilia) orientation and staircase formation. However, the former role of hair bundle orientation has only been demonstrated in mutants expressing pertussis toxin, which blocks all GNAI subunits, but not in mutants with a single knockout of any of the Gnai genes, suggesting that there is a redundancy among various GNAI proteins in this role. Using various conditional mutants, the authors concluded that GNAI3 is the primary GNAI protein required for hair bundle morphogenesis, whereas hair bundle orientation requires both GNAI2 and GNAI3.

      Strength

      Various compound mutants were generated to decipher the contribution of individual GNAI1, GNAI2, GNAI3 and GNAIO in the establishment of hair bundle orientation and morphogenesis. The study is thorough with detailed quantification of hair bundle orientation and morphogenesis, as well as auditory functions.

      Weakness

      While the hair bundle orientation phenotype in the Foxg1-cre; Gnai2-/-; Gnai3 lox/lox (double mutants) appears more severe than that observed in Ptx cKO mutants, it may be an oversimplification to attribute the differences to more GNAI function in the Ptx cKO mutants. The phenotypes of the double mutants and Ptx cKO mutants appear qualitatively different. For example, assuming the milder phenotype in the Ptx cKO is due to incomplete loss of GNAI function, one would expect the Ptx phenotype to be reproducible by some combination of compound mutants among various Gnai genes. Such information was not provided. Furthermore, of all the double mutant specimens analyzed for hair bundle orientation (Fig. 8), the hair bundle/kinocilium position started out normally in the lateral quadrant at E17.5 but failed to be maintained by P0. This does not appear to be the case for Ptx cKO, in which all affected hair cells showed inverted orientation by E17.5. It is not clear whether this is the end-stage of bundle orientation in Ptx cKO, and the kinocilium position started out normal, similar to the double mutants before the age of analysis at E17.5. Understanding these differences may reveal specific requirements of individual GNAI subunits or whether other factors are being affected in the Ptx mutants.

      This criticism was very useful and prompted new experiments as well as a change in data presentation and a fundamental rewrite regarding hair cell orientation. These changes are detailed below. Of note, however, please let us clarify that the original manuscript did show that the ptxA orientation phenotype is reproduced to some extent in Gnai2; Gnai3 double mutants (previously Fig. 8 and corresponding text line 505). We showed that OHC1-2 are also inverted in the double mutant, although at a later differentiation stage. We recognize that similarities in hair cell misorientation between ptxA and Gnai2; Gnai3 DKO were not explained and discussed well enough. This part of the manuscript has been re-worked extensively, and we hope that along with new results, comparisons between mutant models are easier to follow and understand. We notably fully adopted the idea that there are qualitative differences between ptxA and Gnai2; Gnai3 mutants, and not only a difference in the remaining “dose” of GNAI activity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comments related to clarification of the weakness:

      (1) In general, hair bundle orientation in the double mutants is established in the lateral quadrant of the cochlea before being inverted (Fig. 8). These results are intriguing because the lateral orientation is the correct position for these hair bundles normally and Gnai proteins are thought to be required to get the kinocilium to the lateral position. This process appears to proceed normally in the double mutants but the kinocilium reverted to the medial default position over time, which suggests that Gnai2 and Gnai3 are only required for the maintenance and not the establishment of the kinocilium in the lateral position. Is this phenotype qualitatively similar in the Ptx cKO?

      We addressed these issues with two types of modifications to the data:

      (1) We modified the eccentricity threshold used at E17.5 in Fig. 8 (orientation) to be more stringent, using 0.4 (instead of 0.25 previously) in both controls and mutants. This means that we now only graph the orientation of cells where eccentricity is more marked. The rationale is that at early stages, it is challenging to distinguish immature vs defective near-symmetrical cells. We kept a threshold of 0.25 at P0 when the hair cell apical surface is larger and better differentiated (Fig. 8C-D). Importantly, the dataset remains rigorously identical. This change usefully highlights that a large proportion of OHC1 is in fact inverted (oriented medially) at E17.5 in Gnai2; Gnai3 double mutants at the cochlear mid, as also seen in the ptxA model at the same stage and position (see new Fig. 8A). At the E17.5 base (Fig. 8B), a slightly more mature position, the outcome is unchanged (the majority of OHC1 are inverted using either a 0.25 or 0.4 threshold in double mutants and in ptxA).

      Interestingly, however, the orientation trend is unchanged for OHC2: OHC2 remain oriented largely laterally (i.e. normally) at the E17.5 mid and base in Gnai2; Gnai3 double mutants even with a raised eccentricity threshold, whereas by contrast OHC2 in ptxA are inverted at these stages and positions. In the double mutant, OHC2 only become inverted at the P0 base (Fig. 8D). This suggests that there are similarities (OHC1) but also differences (OHC2) between the two mouse models, and that double mutants show a delay in adopting an inverted orientation compared to ptxA. Of note, OHC2 have been shown to differentiate later than OHC1 (for example, Anniko 1983 PMID:6869851).

      (2) To directly test the idea that the misorientation phenotype (inverted OHC1-2) is comparable between the two models but delayed in Gnai2; Gnai3 mutants, we performed a new experiment and added new results in the manuscript. We delayed ptxA action by using Atoh1-Cre (postmitotic hair cells) instead of FoxG1-Cre (otic progenitors). Remarkably, this produced a pattern of OHC1-2 misorientation more similar to Gnai2; Gnai3 mutants: at the E17.5 base and P0 apex, OHC2 were still largely oriented laterally (normally) in Atoh1-Cre; ptxA as in Gnai2; Gnai3 mutants whereas at the P0 base a large proportion of OHC2 were inverted (Fig. 8 Supp 1B). OHC1 were inverted at all stages and positions in the Atoh1-Cre as in the FoxG1-Cre; ptxA model. For Atoh1-Cre; ptxA, we only illustrated OHC1 and OHC2 and did not add E17.5 mid or P0 mid results because other cell types and stages/positions did not provide additional insight. In addition, we are well aware that the full FoxG1-Cre; ptxA and Gnai2; Gnai3 results for 4 cell types (IHC, OHC1-3) and 5 stages/positions are already a lot of data for cell orientation.
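
      The stage-specific eccentricity thresholds described in point (1) (0.4 at E17.5, 0.25 at P0) can be sketched as a filter applied before orientation is graphed. This is a minimal sketch: the data layout, the function name, and the choice of an inclusive comparison are assumptions, and the example cells are hypothetical.

```python
# Thresholds stated in the response; keys are developmental stages.
ECCENTRICITY_THRESHOLD = {"E17.5": 0.4, "P0": 0.25}


def cells_to_graph(cells, stage):
    """Keep only cells eccentric enough for their orientation to be graphed.

    Each cell is a dict with an "eccentricity" value and an "angle" in
    degrees. Whether the cutoff is inclusive is an assumption of this sketch.
    """
    threshold = ECCENTRICITY_THRESHOLD[stage]
    return [cell for cell in cells if cell["eccentricity"] >= threshold]


# Hypothetical measurements, for illustration only.
example_cells = [
    {"eccentricity": 0.30, "angle": 170.0},
    {"eccentricity": 0.50, "angle": 10.0},
]
print(len(cells_to_graph(example_cells, "E17.5")))
print(len(cells_to_graph(example_cells, "P0")))
```

      The underlying dataset is unchanged by such a filter; only the subset of cells whose orientation is plotted differs between thresholds.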

      These results suggest that:

      (a) The normal reversal of OHC1-2 to adopt a lateral orientation needs to be maintained, at least transiently, and that maintenance also relies on GNAI/O (Results starting line 529. Discussion line 621).

      (b) ptxA is more severe than Gnai2; Gnai3 when it comes to OHC1-2 orientation (Figure 9, role b). Conversely, Gnai2; Gnai3 is obviously more severe when it comes to symmetry-breaking (Fig. 9, role a) and hair bundle morphogenesis (Fig. 9, c). It follows that the two early GNAI/O activities are qualitatively different and not just based on dose. This is essentially what this Reviewer correctly pointed out, and we have fully edited both Results and Discussion accordingly. We now speculate that the difference may lie in the identity of the necessary GNAI/O protein for each role. Any GNAI/O proteins acting as a switch downstream of the GPR156 receptor may relay orientation information (Fig. 9, role b), making ptxA a particularly effective disruption strategy since it downregulates all GNAI/O proteins. In contrast, symmetry-breaking may rely more specifically on GNAI2 and GNAI3, and ptxA is not expected to achieve a loss-of-function of GNAI2 and GNAI3 as extensive as a double targeted genetic inactivation of the corresponding genes. Please see new Results starting line 526 and Discussion starting line 603. We consequently abandoned the notion that increased doses of GNAI/O are required for each role, and we also clarify that symmetry-breaking (a) and orientation (b) occur at the same time (Fig. 9).

      (2) P0 may not be late enough a stage to assess phenotype maturity in the double mutants. For example, it is not clear from the basal P0 results whether the IHC will acquire an inverted phenotype or just misorientation in the lateral side.

      For context, the OHC1-2 misorientation pattern in the ptxA model at P0 does represent the end stage, as the same pattern is observed in adults (illustrated in Fig. 2A). In addition, OHC1-2 that express ptxA are inverted as soon as they break planar symmetry, and this was established at E16.5 in a previous publication where ptxA and Gpr156 misorientation patterns were compared and shown to be identical (Kindt et al., 2021 Supp. fig. 5C-D). However, we clearly failed to mention these important results in the original manuscript. We now cite Figure 2 for adult defects (line 522), and provide a citation for OHC1-2 inversion being observed from the earliest stage of hair cell differentiation (Kindt et al., 2021) (line 519).

      The vast majority of Gnai2; Gnai3 double mutants die before weaning but the single specimen we managed to collect at P21 also showed inverted OHC1-2 (representative example in Fig. 2A). Again, we previously failed to point out this important result. We now do so on lines 214 and 555. This is further evidence that OHC1-2 misorientation is in fact similar in the ptxA and Gnai2; Gnai3 models (but milder and delayed in the latter).

      When it comes to IHCs and OHC3s, however, the situation is less clear. These cell types are mildly misoriented in ptxA and Gpr156 mutants, but IHCs in particular appear severely misoriented in Gnai2; Gnai3 mutants based on the position of the basal body (Fig. 8). However, very dysmorphic hair bundles can pull on the basal body via the kinocilium and affect its position, which obscures hair cell orientation inferred from the basal body and subsequent interpretations. We do not dwell on IHC and OHC3 and their orientation in Gnai2; Gnai3 mutants in the revision since we do not observe similar orientation defects in a different mouse model and lack sufficient adult data.

      Suggestions to improve upon the manuscript for readers:

      (1) Line 294, indicate on the figure the staining in bare zone and tips of stereocilia on row 1.

      Pertains to Figure 4. In A, we now point out the bare zone and stereocilia tips with arrow and arrowheads, respectively (as in other figures).

      (2) Fig. 8 schematic diagram: placing the labels of the line and 90° side by side is misleading.

      We added black ticks for 0, 90, 180, 270 degree references. In contrast, the hair cell angle represented was switched to magenta.

      (3) Fig. 7 legend, redundancy towards the end of the paragraph.

      Thank you for catching this issue. A large portion of the legend was indeed accidentally repeated and is now deleted.

      (4) Line 490-493, Another plausible explanation is that other factors besides Gnai2 and Gnai3 are involved in breaking symmetry during bundle establishment.

      We now acknowledge that other proteins besides GNAI/O may be involved (Discussion line 614). That said, the notion that we do not achieve sufficient and/or early enough GNAI loss is supported for example by the Beer-Hammer 2018 study where no defects in symmetry-breaking or orientation were reported in their Gnai2 flox/flox; Gnai3 flox/flox model (Discussion new Line 637).

      (5) Line 518, the base were largely inverted (Figure 8B). Should Fig 8A be cited instead of 8B?

      Fig. 8B has graphs for the E17.5 cochlear base where OHC1-2 are inverted in both ptxA and Gnai2;3 DKO models. Fig. 8A has graphs of the E17.5 cochlear mid (less differentiated hair cells) where an inversion was not obvious previously, but is now clear although only partial in Gnai2; Gnai3 DKO (see above; raised eccentricity threshold). In the context of the previous text, this citation was thus correct. However, this section has been heavily modified to better compare Gnai2; Gnai3 DKO and ptxA and is hopefully less confusing in the revised version.

      Reviewer #2 (Public Review):

      Jarysta and colleagues set out to define how similar GNAI/O family members contribute to the shape and orientation of stereocilia bundles on auditory hair cells. Previous work demonstrated that loss of particular GNAI proteins, or inhibition of GNAIs by pertussis toxin, caused several defects in hair bundle morphogenesis, but open questions remained which the authors sought to address. Some of these questions include whether all phenotypes resulting from expression of pertussis toxin stemmed from GNAI inhibition; which GNAI family members are most critical for directing bundle development; whether GNAI proteins are needed for basal body movements that contribute to bundle patterning. These questions are important for understanding how tissue is patterned in response to planar cell polarity cues.

      To address questions related to the GNAI family in auditory hair cell development, the authors assembled an impressive and nearly comprehensive collection of mouse models. This approach allowed for each Gnai and Gnao gene to be knocked out individually or in combination with each other. Notably, a new floxed allele was generated for Gnai3 because loss of this gene in combination with Gnai2 deletion was known to be embryonic lethal. Besides these lines, a new knockin mouse was made to conditionally express untagged pertussis toxin following cre induction from a strong promoter. The breadth and complexity involved in generating and collecting these strains makes this study unique, and likely the authoritative last word on which GNAI proteins are needed for which aspect of auditory hair bundle development.

      Appropriate methods were employed by the authors to characterize auditory hair bundle morphology in each mouse line. Conclusions were carefully drawn from the data and largely based on excellent quantitative analysis. The main conclusions are that GNAI3 has the largest effect on hair bundle development. GNAI2 can compensate for GNAI3 loss in early development but incompletely in late development. The Gnai2 Gnai3 double mutant recapitulates nearly all the phenotypic effects associated with pertussis toxin expression and also reveals a role for GNAIs in early movement of the basal body. Although these results are not entirely unexpected based on earlier reports, the current results both uncover new functions and put putative functions on more solid ground.

      Based on this study, loss of GNAI1 and GNAO show a slight shortening of the tallest row of stereocilia but no other significant changes to bundle shape. Antibody staining shows no change in GNAI localization in the Gnai1 knockout, suggesting that little to no protein is found in hair cells. One caveat to this interpretation is that the antibody, while proposed to cross-react with GNAI1, is not clearly shown to immunolabel GNAI1. More than anything, this reservation mostly serves to illustrate how challenging it is to nail down every last detail. In turn, the comprehensive nature of the current study seems all the more impressive.

      (1) The original manuscript quantified stereocilia properties in Gnai1 and Gnai2 single mutants, and in Gnai1; Gnai2 double mutants using non-parametric tests (Mann-Whitney) for comparisons. This approach indeed suggested a subtle reduction in row 1 height in IHCs in all 3 mutants. We did not quantify stereocilia features in Gnao1 mutants but could not observe defects (new Fig. 2 Supp. 1E-F). In fact, we could not observe defects in Gnai1 and Gnai2 single mutants, and in Gnai1; Gnai2 double mutants either. For this reason we have been ambivalent about reporting defects for Gnai1 and Gnai2 single and Gnai1; Gnai2 double mutants.

      In the revision, we applied a nested (hierarchical) t-test to avoid pseudo-replication (Eisner 2021; PMID: 33464305; https://pubmed.ncbi.nlm.nih.gov/33464305/). In our data, the nested t-tests structure measurements by animal instead of having all stereocilia or other cell measurements treated as independent values. This more stringent approach no longer finds row 1 height reduction significant in single Gnai1 or Gnai2 mutants, or in Gnai1; Gnai2 double mutants. We modified the text accordingly in Results and Discussion. Nested t-tests were applied uniformly across the manuscript and, besides IHC measurements in Fig. 2, now also apply to bare zone surface area in Fig. 6 and eccentricity in Fig. 7. For these experiments in contrast, previous conclusions are not changed. We think that this more careful statistical treatment is a closer representation of the data in terms of the conclusions we can safely make.
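
      The idea of treating the animal, rather than each cell, as the unit of replication can be illustrated with a simplified analogue: measurements are averaged within each animal before the groups are compared. This sketch is not the nested t-test used in the revision, which models both levels of variation; the function names, the Welch statistic, and all values below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def per_animal_means(measurements):
    """Collapse {animal_id: [values]} to one mean value per animal."""
    return [mean(values) for values in measurements.values()]


def welch_t(a, b):
    """Welch's t statistic for two independent samples (n >= 2 in each)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical row 1 stereocilia heights (micrometers), grouped by animal.
control = {"c1": [4.0, 4.2, 4.1], "c2": [3.9, 4.0, 4.1], "c3": [4.2, 4.3, 4.1]}
mutant = {"m1": [3.8, 3.9, 4.0], "m2": [4.0, 4.1, 3.9], "m3": [3.7, 3.9, 3.8]}

# The comparison uses 3 values per group (animals), not 9 (cells).
t_stat = welch_t(per_animal_means(control), per_animal_means(mutant))
print(f"Welch t on per-animal means = {t_stat:.2f}")
```

      Pooling all cells as independent values would inflate the sample size and can make small differences appear significant, which is the pseudo-replication problem the nested analysis is meant to avoid.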

      (2) The reviewer's criticism about antibody specificity is accurate and fair, and is fully addressed in the revised manuscript. First, we provide a phylogeny cartoon as Figure 1A to compare the GNAI/O proteins and highlight how closely related they are in sequence. To validate the assumption that our approach would detect GNAI1 if it were present in hair cells, we took a new dual experimental approach in the revision. First, we electroporated Gnai1, Gnai2 and Gnai3 expression constructs in the E13.5 inner ear and tested whether the two GNAI antibodies used in the study can detect ectopic GNAI1 in Kölliker's organ. This revealed that “ptGNAI2” detects GNAI1 very well (in addition to GNAI2), but that “scbtGNAI3” does not detect GNAI1 efficiently (although it does detect GNAI3 very well). To verify in vivo that “ptGNAI2” can detect endogenous GNAI1, we immunolabeled the gallbladder epithelium in Gnai1 mutants and littermate controls using the “ptGNAI2” antibody. Based on IMPC consortium data* about the Gnai1 LacZ mouse strain, Gnai1 is specifically expressed in the adult gallbladder. We could verify that signals detected in the Gnai1 mutants were visually reduced in comparison to littermate controls. We have now added this validation step in Results (line 309) and the data in Fig. 4 Supp. 1A-B.

      *https://www.mousephenotype.org/data/genes/MGI:95771

      Reviewer #2 (Recommendations For The Authors):

      Minor comments that may marginally improve clarity.

      Abstract line 24: delete "nor polarized" because polarization cannot be assessed since the protein is undetectable.

      This is a fair point, now deleted.

      Consider revising: Lines 80-82; 188-202 (the order in which the mutants were presented was hard to follow for me); 239-240.

      Lines 80-82: Used to read as "Ptx recapitulates severe stereocilia stunting and immature-looking hair bundles observed when GPSM2 or both GNAI2 and GNAI3 are inactivated."

      Line 88: Was now changed to "Ptx provokes immature-looking hair bundles with severely stunted stereocilia, mimicking defects in Gpsm2 mutants and Gnai2; Gnai3 double mutants".

      Lines 188-202: This was the first paragraph describing adult stereocilia defects in the different Gnai/o mouse strains. We completely rewrote the entire section to reflect the order in which the strains appear in Figure 2, hopefully making the text easier to follow because it better matches panels in Fig. 2. We also made several other modifications to streamline comparisons and better introduce the orientation defects that are later detailed at neonate stages.

      Lines 239-240: Used to read "GNAI2 makes a clear contribution since stereocilia defects increase in severity when GNAI loss extends from GNAI3 to both GNAI2 and GNAI3".

      Line 247: Was now changed to "GNAI2 makes a clear contribution since Gnai3neo stereocilia defects dramatically increase in severity when GNAI2 is absent as well in Gnai2; Gnai3 double mutants."

      Line 164: hardwired is unclear. Conserved?

      We modified this sentence as follows: Line 171: "We reasoned that apical HC development is probably highly constrained and less likely to be influenced by genetic heterogeneity compared to susceptibility to disease, for example."

      Line 299: It is not clear why GNAI1 is a better target than GNAI3. This phrase is repeated in line 303, I suspect inadvertently. Is there evidence that this antibody detects GNAI1, perhaps in another tissue? Line 308: GNAI1 may also not be detected by this antibody.

      Please see point 2 above. We removed these hypothetical statements entirely and we instead now experimentally show that one of the two commercial antibodies used can readily detect GNAI1 (yet does not detect signal in hair cells when GNAI2 and GNAI3 are absent in Fig. 4F).

    1. Author Response

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      (1) Substantial revision of the claims and interpretation of the results is needed, especially in the setting of additional data showing enhanced erythrophagocytosis with decreased RBC lifespan.

      Thank you for your valuable feedback and suggestion for a substantial revision of the claims and interpretation of our results. We acknowledge the importance of considering additional data that shows enhanced erythrophagocytosis with decreased RBC lifespan. In response, we have revised our manuscript and incorporated additional experimental data to support and clarify our findings.

      (1) In our original manuscript, we reported a decrease in the number of splenic red pulp macrophages (RPMs) and phagocytic erythrocytes after hypobaric hypoxia (HH) exposure. This conclusion was primarily based on our observations of reduced phagocytosis in the spleen.

      (2) Additional experimental data on RBC labeling and erythrophagocytosis:

      • Experiment 1 (RBC labeling and HH exposure)

      We conducted an experiment where RBCs from mice were labeled with PKH67 and injected back into the mice. These mice were then exposed to normal normoxia (NN) or HH for 7 or 14 days. The subsequent assessment of RPMs in the spleen using flow cytometry and immunofluorescence detection revealed a significant decrease in both the population of splenic RPMs (F4/80hiCD11blo, new Figure 5A and C) and PKH67-positive macrophages after HH exposure (as depicted in new Figure 5A and C-E). This finding supports our original claim of reduced phagocytosis under HH conditions.

      Author response image 1.

      • Experiment 2 (erythrophagocytosis enhancement)

      To examine the effects of enhanced erythrophagocytosis, we injected Tuftsin after administering PKH67-labelled RBCs. Our observations showed a significant decrease in PKH67 fluorescence in the spleen, particularly after Tuftsin injection compared to the NN group. This result suggests a reduction in RBC lifespan when erythrophagocytosis is enhanced (illustrated in new Figure 7, A-B).

      Author response image 2.

      (3) Revised conclusions:

      • The additional data from these experiments support our original findings by providing a more comprehensive view of the impact of HH exposure on splenic erythrophagocytosis.

      • The decrease in phagocytic RPMs and phagocytic erythrocytes after HH exposure, along with the observed decrease in RBC lifespan following enhanced erythrophagocytosis, collectively suggest a more complex interplay between hypoxia, erythrophagocytosis, and RBC lifespan than initially interpreted.

      We think that these revisions and additional experimental data provide a more robust and detailed understanding of the effects of HH on splenic erythrophagocytosis and RBC lifespan. We hope that these changes adequately address the concerns raised and strengthen the conclusions drawn in our manuscript.

      (2) F4/80 high; CD11b low cells are true RPMs, unlike the cells which the authors are presenting, i.e. splenic monocytes / pre-RPMs. To discuss RPM function requires the presentation of these cells specifically rather than general cells in the proper area of the spleen.

      Thank you for your feedback requesting a substantial revision of our claims and interpretation, particularly considering additional data showing enhanced erythrophagocytosis with decreased RBC lifespan. In response, we have thoroughly revised our manuscript and included new experimental data that further elucidate the effects of HH on RPMs and erythrophagocytosis.

      (1) Re-evaluation of RPMs population after HH exposure:

      • Flow cytometry analysis (new Figure 3G, Figure 5A and B): We revisited the analysis of RPMs (F4/80hiCD11blo) in the spleen after 7 and 14 days of HH exposure. Our revised flow cytometry data consistently showed a significant decrease in the RPMs population post-HH exposure, reinforcing our initial findings.

      Author response image 3.

      Author response image 4.

      • In situ expression of RPMs (Figure S1, A-D):

      We further confirmed the decreased population of RPMs through in situ co-staining with F4/80 and CD11b, and F4/80 and CD68, in spleen tissues. These results clearly demonstrated a significant reduction in F4/80hiCD11blo (Figure S1, A and B) and F4/80hiCD68hi (Figure S1, C and D) cells following HH exposure.

      Author response image 5.

      (2) Single-cell sequencing analysis of splenic RPMs:

      • We conducted a single-cell sequencing analysis of spleen samples post 7 days of HH exposure (Figure S2, A-C). This analysis revealed a notable shift in the distribution of RPMs, predominantly associated with Cluster 0 under NN conditions, to a reduced presence in this cluster after HH exposure.

      • Pseudo-time series analysis indicated a transition pattern change in spleen RPMs, with a shift from Cluster 2 and Cluster 1 towards Cluster 0 under NN conditions, and a reverse transition following HH exposure (Figure S2, B and D). This finding implies a decrease in resident RPMs in the spleen under HH conditions.

      (3) Consolidated findings and revised interpretation:

      • The comprehensive analysis of flow cytometry, in situ staining, and single-cell sequencing data consistently indicates a significant reduction in the number of RPMs following HH exposure.

      • These findings, taken together, strongly support the revised conclusion that HH exposure leads to a decrease in RPMs in the spleen, which in turn may affect erythrophagocytosis and RBC lifespan.

      Author response image 6.

      In conclusion, our revised manuscript now includes additional experimental data and analyses, strengthening our claims and providing a more nuanced interpretation of the impact of HH on spleen RPMs and related erythrophagocytosis processes. We believe these revisions and additional data address your concerns and enhance the scientific validity of our study.

      (3) RBC retention in the spleen should be measured anyway quantitatively, eg, with proper flow cytometry, to determine whether it is increased or decreased.

      Thank you for your query regarding the quantitative measurement of RBC retention in the spleen, particularly in relation to HH exposure. We have utilized a combination of techniques, including flow cytometry and histological staining, to investigate this aspect comprehensively. Below is a summary of our findings and methodology.

      (1) Flow cytometry analysis of labeled RBCs:

      • Our study employed both NHS-biotin (new Figure 4, A-D) and PKH67 labeling (new Figure 4, E-H) to track RBCs in mice exposed to HH. Flow cytometry results from these experiments (new Figure 4, A-H) showed a decrease in the proportion of labeled RBCs over time, both in the blood and spleen. Notably, the reduction in fluorescently labeled RBCs in blood and spleen was significantly greater under NN exposure than under HH exposure. The observed decrease in labeled RBCs was initially counterintuitive, as we expected an increase in RBC retention due to reduced erythrophagocytosis. However, this decrease can be attributed to the significantly increased production of RBCs following HH exposure, which dilutes the proportion of labeled cells.

      • Specifically, for blood, the biotin-labeled RBCs decreased by 12.06% under NN exposure and by 7.82% under HH exposure, while the PKH67-labeled RBCs decreased by 9.70% under NN exposure and by 4.09% under HH exposure. For spleen, the biotin-labeled RBCs decreased by 3.13% under NN exposure and by 0.46% under HH exposure, while the PKH67-labeled RBCs decreased by 1.16% under NN exposure and by 0.92% under HH exposure. These findings suggest that HH exposure leads to a decrease in the clearance rate of RBCs.
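
      The comparison in the bullet above can be tabulated directly. The following is a minimal sketch that only restates the percentages reported in this response and subtracts them; the variable and function names are illustrative, and no statistical test is implied.

```python
# Reported decreases (%) in labeled RBCs, copied from the response text above.
# The dictionary layout and names are illustrative assumptions, not the authors' code.
reported_decrease = {
    ("blood", "biotin"): {"NN": 12.06, "HH": 7.82},
    ("blood", "PKH67"): {"NN": 9.70, "HH": 4.09},
    ("spleen", "biotin"): {"NN": 3.13, "HH": 0.46},
    ("spleen", "PKH67"): {"NN": 1.16, "HH": 0.92},
}


def nn_minus_hh(table):
    """Return, per tissue/label pair, how much larger the NN decrease is than the HH decrease."""
    return {key: round(values["NN"] - values["HH"], 2) for key, values in table.items()}


differences = nn_minus_hh(reported_decrease)

for (tissue, label), diff in differences.items():
    # A positive value means the labeled fraction fell more under NN than under HH.
    print(f"{tissue:6s} {label:6s} NN - HH = {diff:.2f}")
```

      In every tissue and label combination the difference is positive, which is the pattern the response describes as a smaller reduction under HH.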

      Author response image 7.

      (2) Detection of erythrophagocytosis in spleen:

      To assess erythrophagocytosis directly, we labeled RBCs with PKH67 and analyzed their uptake by splenic macrophages (F4/80hi) after HH exposure. Our findings (new Figure 5, D-E) indicated a decrease in PKH67-positive macrophages in the spleen, suggesting reduced erythrophagocytosis.

      Author response image 8.

      (3) Flow cytometry analysis of RBC retention:

      Our flow cytometry analysis revealed a decrease in PKH67-positive RBCs in both blood and spleen (Figure S4). We postulated that this was due to increased RBC production after HH exposure. However, this method might not accurately reflect RBC retention, as it measures the proportion of PKH67-labeled RBCs relative to the total number of RBCs, which increased after HH exposure.
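
      The limitation described above follows from how a proportion behaves. As an illustrative sketch (the symbols L, U and p are introduced here only for illustration), let L be the number of labeled RBCs in a sample, U the number of unlabeled RBCs, and p the labeled fraction measured by flow cytometry:

```latex
p = \frac{L}{L + U}, \qquad
\frac{\partial p}{\partial U} = -\frac{L}{(L + U)^{2}} < 0 \quad \text{for } L > 0 .
```

      With L held fixed, any increase in U lowers p, so a falling labeled fraction alone cannot distinguish clearance of labeled cells from dilution by newly produced unlabeled cells.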

      Author response image 9.

      (4) Histological and immunostaining analysis:

      Histological examination using HE staining and band3 immunostaining in situ (new Figure 6, A-D, and G-H) revealed a significant increase in RBC numbers in the spleen after HH exposure. This was further confirmed by detecting retained RBCs in splenic single cells using Wright-Giemsa composite stain (new Figure 6, E and F) and retained PKH67-labelled RBCs in spleen (new Figure 6, I and J).

      Author response image 10.

      (5) Interpreting the data:

      The comprehensive analysis suggests a complex interplay between increased RBC production and decreased erythrophagocytosis in the spleen following HH exposure. While flow cytometry indicated a decrease in the proportion of labeled RBCs, histological and immunostaining analyses demonstrated an actual increase in RBC retention in the spleen. These findings collectively suggest that while overall RBC production is upregulated following HH exposure, the spleen's capacity for erythrophagocytosis is concurrently diminished, leading to increased RBC retention.

      (6) Conclusion:

      Taken together, our results indicate a significant increase in RBC retention in the spleen post-HH exposure, likely due to reduced residual RPMs and erythrophagocytosis. This conclusion is supported by a combination of flow cytometry, histological staining, and immunostaining techniques, providing a comprehensive view of RBC dynamics under HH conditions. We think these findings offer a clear quantitative measure of RBC retention in the spleen, addressing the concerns raised in your question.

      (4) Numerous other methodological problems as listed below.

      We appreciate your question, which highlights the importance of using multiple analytical approaches to understand complex physiological processes. Please find below our point-by-point response to the methodological comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) Decreased BM and spleen monocytes due to increased liver monocyte migration is unclear. There is no evidence that this happens or why it would be a reasonable hypothesis, even in splenectomized mice.

      Thank you for highlighting the need for further clarification and justification of our hypothesized decrease in BM and spleen monocytes due to increased monocyte migration to the liver, particularly in the context of splenectomized mice. Indeed, our study has not explicitly verified an augmentation in mononuclear cell migration to the liver in splenectomized mice.

      Nonetheless, our investigations have revealed a notable increase in monocyte migration to the liver after HH exposure. Noteworthy is our discovery of a significant upregulation in colony stimulating factor-1 (CSF-1) expression in the liver, observed after both 7 and 14 days of HH exposure (data not included). This observation was substantiated through flow cytometry analysis (as depicted in Figure S4), which affirmed an enhanced migration of monocytes to the liver. Specifically, we noted a considerable increase in the population of transient macrophages, monocytes, and Kupffer cells in the liver following HH exposure.

      Author response image 11.

      Considering these findings, we hypothesize that hypoxic conditions may activate a compensatory mechanism that directs monocytes towards the liver, potentially linked to the liver’s integral role in the systemic immune response. In accordance with these insights, we intend to revise our manuscript to reflect the speculative nature of this hypothesis more accurately, and to delineate the strategies we propose for its further empirical investigation. This amendment ensures that our hypothesis is presented with full consideration of its speculative basis, supported by a coherent framework for future validation.

      (2) While F4/80+CD11b+ population is decreased, this is mainly driven by CD11b and F4/80+ alone population is significantly increased. This is counter to the hypothesis.

      Thank you for addressing the apparent discrepancy in our findings concerning the F4/80+CD11b+ population and the increase in the F4/80+ alone population, which seems to contradict our initial hypothesis. Your observation is indeed crucial for the integrity of our study, and we appreciate the opportunity to clarify this matter.

      (1) Clarification of flow cytometry results:

      • In response to the concerns raised, we revisited our flow cytometry experiments with a focus on more clearly distinguishing the cell populations. Our initial graph had some ambiguities in cell grouping, which might have led to misinterpretations.

      • The revised flow cytometry analysis, specifically aimed at identifying red pulp macrophages (RPMs) characterized as F4/80hiCD11blo in the spleen, demonstrated a significant decrease in the F4/80 population. This finding is now in alignment with our immunofluorescence results.

      Author response image 12.

      Author response image 13.

      (2) Revised data and interpretation:

      • The results presented in new Figure 3G and Figure 5 (A and B) consistently indicate a notable reduction in the RPMs population following HH exposure. This supports our revised understanding that HH exposure leads to a decrease in the specific macrophage subset (F4/80hiCD11blo) in the spleen.

      We’ve updated our manuscript to reflect these new findings and interpretations. The revised manuscript details the revised flow cytometry analysis and discusses the potential mechanisms behind the observed changes in macrophage populations.

      (3) HO-1 expression cannot be used as a surrogate to quantify number of macrophages as the expression per cell can decrease and give the same results. In addition, the localization of effect to the red pulp is not equivalent to an assertion that the conclusion applies to macrophages given the heterogeneity of this part of the organ and the spleen in general.

      Thank you for your insightful comments regarding the use of HO-1 expression as a surrogate marker for quantifying macrophage numbers, and for pointing out the complexity of attributing changes in HO-1 expression specifically to macrophages in the splenic red pulp. Your observations are indeed valid and warrant a detailed response.

      (1) Role of HO-1 in macrophage activity:

      • In our study, HO-1 expression was not utilized as a direct marker for quantifying macrophages. Instead, it was considered an indicator of macrophage activity, particularly in relation to erythrophagocytosis. HO-1, being upregulated in response to erythrophagocytosis, serves as an indirect marker of this process within splenic macrophages.

      • The rationale behind this approach was that increased HO-1 expression, induced by erythrophagocytosis in the spleen’s red pulp, could suggest an augmentation in the activity of splenic macrophages involved in this process.

      (2) Limitations of using HO-1 as an indicator:

      • We acknowledge your point that HO-1 expression per cell might decrease, potentially leading to misleading interpretations if used as a direct quantifier of macrophage numbers. The variability in HO-1 expression per cell indeed presents a limitation in using it as a sole indicator of macrophage quantity.

      • Furthermore, your observation about the heterogeneity of the spleen, particularly the red pulp, is crucial. The red pulp is a complex environment with various cell types, and asserting that changes in HO-1 expression are exclusive to macrophages could oversimplify this complexity.

      (3) Addressing the concerns:

      • To address these concerns, we propose to supplement our HO-1 expression data with additional specific markers for macrophages. This would help in correlating HO-1 expression more accurately with macrophage numbers and activity.

      • We also plan to conduct further studies to delineate the specific cell types in the red pulp contributing to HO-1 expression. This could involve techniques such as immunofluorescence or immunohistochemistry, which would allow us to localize HO-1 expression to specific cell populations within the splenic red pulp.

      We’ve revised our manuscript to clarify the role of HO-1 expression as an indirect marker of erythrophagocytosis and to acknowledge its limitations as a surrogate for quantifying macrophage numbers.

      (4) line 63-65 is inaccurate as red cell homeostasis reaches a new steady state in chronic hypoxia.

      Thank you for pointing out the inaccuracy in lines 63-65 of our manuscript regarding red cell homeostasis in chronic hypoxia. Your feedback is invaluable in ensuring the accuracy and scientific integrity of our work. We’ve revised lines 63-65 to reflect that red cell homeostasis reaches a new steady state in chronic hypoxia.

      (5) Eryptosis is not defined in the manuscript.

      Thank you for highlighting the omission of a definition for eryptosis in our manuscript. We acknowledge the significance of precisely defining such key terms, particularly when they play a crucial role in the context of our research findings. Eryptosis, a term referenced in our study, is a specialized form of programmed cell death unique to erythrocytes. Similar to apoptosis in other cell types, eryptosis is characterized by distinct physiological changes including cell shrinkage, membrane blebbing, and the externalization of phosphatidylserine on the erythrocyte surface. These features are indicative of the RBC lifecycle and its regulated destruction process.

      However, it is pertinent to note that our current study does not delve extensively into the mechanisms or implications of eryptosis. Our primary focus has been to elucidate the effects of HH exposure on the processes of splenic erythrophagocytosis and the resultant impact on the lifespan of RBCs. Given this focus, and to maintain the coherence and relevance of our manuscript, we have decided to exclude specific discussions of eryptosis from our revised manuscript. This decision aligns with our aim to provide a clear and concentrated exploration of the influence of HH exposure on RBC dynamics and splenic function.

      We appreciate your input, which has significantly contributed to enhancing the clarity and accuracy of our manuscript. The revision ensures that our research is presented with a focused scope, aligning closely with our experimental investigations and findings.

      (6) Physiologically, there is no evidence that there is any "free iron" in cells, making line 89 point inaccurate.

      Thank you for highlighting the concern regarding the reference to "free iron" in cells in line 89 of our manuscript. The term "free iron" in our manuscript was intended to refer to divalent iron (Fe2+), rather than unbound iron ions freely circulating within cells. We acknowledge that the term "free iron" might lead to misconceptions, as it implies the presence of unchelated iron, which is not physiologically common due to the potential for oxidative damage. To rectify this and provide clarity, we’ve revised line 89 of our manuscript to reflect our meaning more accurately. Instead of "free iron," we use "divalent iron (Fe2+)" to avoid any misunderstanding regarding the state of iron in cells. We also ensure that any implications drawn from the presence of Fe2+ in cells are consistent with current scientific literature and understanding.

      (7) Fig 1f no stats

      We appreciate your critical review and suggestions, which help in improving the accuracy and clarity of our research. We’ve revised the statistical diagram in new Figure 1F.

      (8) Splenectomy experiments demonstrate that erythrophagocytosis is almost completely replaced by functional macrophages in other tissues (likely Kupffer cells in the liver). There is only a minor defect and no data on whether it is in fact the liver or other organs that provide this replacement function, which makes the assertions in lines 345-349 significantly overstated.

      Thank you for your critical assessment of our interpretation of the splenectomy experiments, especially concerning the role of erythrophagocytosis by macrophages in other tissues, such as Kupffer cells in the liver. We appreciate your observation that our assertions may be overstated and acknowledge the need for more specific data to identify which organs compensate for the loss of splenic erythrophagocytosis.

      (1) Splenectomy experiment findings:

      • Our findings in Figure 2D do indicate that in the splenectomized group under NN conditions, erythrophagocytosis is substantially compensated for by functional macrophages in other tissues. This is an important observation that highlights the body's ability to adapt to the loss of splenic function.

      • However, under HH conditions, our data suggest that the spleen plays an important role in managing erythrocyte turnover, as indicated by the significant impact of splenectomy on erythrophagocytosis and subsequent erythrocyte dynamics.

      (2) Addressing the lack of specific organ identification:

      • We acknowledge that our study does not definitively identify which organs, such as the liver or others, take over the erythrophagocytosis function post-splenectomy. This is an important aspect that needs further investigation.

      • To address this, we also plan to perform additional experiments that could more accurately point out the specific tissues compensating for the loss of splenic erythrophagocytosis. This could involve tracking labeled erythrocytes or using specific markers to identify macrophages actively engaged in erythrophagocytosis in various organs.

      (3) Revising manuscript statements:

      Considering your feedback, we’ve revised the statements in lines 345-349 (lines 378-383 in revised manuscript) to enhance the scientific rigor and clarity of our research presentation.

      (9) M1 vs M2 macrophage experiments are irrelevant to the main thrust of the manuscript, there are no references to support the use of only CD16 and CD86 for these purposes, and no stats are provided. It is also unclear why bone marrow monocyte data is presented and how it is relevant to the rest of the manuscript.

      Thank you for your critical evaluation of the relevance and presentation of the M1 vs. M2 macrophage experiments in our manuscript. We appreciate your insights, especially regarding the use of specific markers and the lack of statistical analysis, as well as the relevance of bone marrow monocyte data to our study's main focus.

      (1) Removal of M1 and M2 macrophage data:

      Based on your feedback and our reassessment, we agree that the results pertaining to M1 and M2 macrophages did not align well with the main objectives of our manuscript. Consequently, we have decided to remove the related content on M1 and M2 macrophages from the revised manuscript. This decision was made to ensure that our manuscript remains focused and coherent, highlighting our primary findings without the distraction of unrelated or insufficiently supported data.

      The use of only CD16 and CD86 markers for M1 and M2 macrophage characterization, without appropriate statistical analysis, was indeed a methodological limitation. We recognize that a more comprehensive set of markers and rigorous statistical analysis would be necessary for a meaningful interpretation of M1/M2 macrophage polarization. Furthermore, the relevance of these experiments to the central theme of our manuscript was not adequately established. Our study primarily focuses on erythrophagocytosis and red pulp macrophage dynamics under hypobaric hypoxia, and the M1/M2 polarization aspect did not contribute significantly to this narrative.

      (2) Clarification on bone marrow monocyte data:

      Regarding the inclusion of bone marrow monocyte data, we acknowledge that its relevance to the main thrust of the manuscript was not clearly articulated. In the revised manuscript, we provide a clearer rationale for its inclusion and how it relates to our primary objectives.

      (3) Commitment to clarity and relevance:

      We are committed to ensuring that every component of our manuscript contributes meaningfully to our overall objectives and research questions. Your feedback has been instrumental in guiding us to streamline our focus and present our findings more effectively.

      We appreciate your valuable feedback, which has led to a more focused and relevant presentation of our research. These changes enhance the clarity and impact of our manuscript, ensuring that it accurately reflects our key research findings.

      (10) Biotinylated RBC clearance is enhanced, demonstrating that RBC erythrophagocytosis is in fact ENHANCED, not diminished, calling into question the founding hypothesis that the manuscript proposes.

      Thank you for your critical evaluation of our data on biotinylated RBC clearance, which suggests enhanced erythrophagocytosis under HH conditions. This observation indeed challenges our founding hypothesis that erythrophagocytosis is diminished in this setting. Below is a summary of our findings and methodology.

      (1) Interpretation of RBC labeling results:

      Both the previous results of NHS-biotin labeled RBCs (new Figure 4, A-D) and the current results of PKH67-labeled RBCs (new Figure 4, E-H) demonstrated a decrease in the number of labeled RBCs with increasing time after injection. The production of RBCs, including bone marrow and spleen production, was significantly increased following HH exposure, resulting in a consistent decrease in the proportion of labeled RBCs detected by flow cytometry in both the blood and spleen of mice compared to the NN group. However, the reduction in fluorescently labeled RBCs in blood and spleen was significantly weaker under HH exposure than under NN exposure. Specifically, for blood, the biotin-labeled RBCs decreased by 12.06% under NN exposure and by 7.82% under HH exposure, while the PKH67-labeled RBCs decreased by 9.70% under NN exposure and by 4.09% under HH exposure. For spleen, the biotin-labeled RBCs decreased by 3.13% under NN exposure and by 0.46% under HH exposure, while the PKH67-labeled RBCs decreased by 1.16% under NN exposure and by 0.92% under HH exposure.

      Author response image 14.

      (2) Increased RBC production under HH conditions:

      It's important to note that RBC production, including from bone marrow and spleen, was significantly increased following HH exposure. This increase in RBC production could contribute to the decreased proportion of labeled RBCs observed in flow cytometry analyses, as there are more unlabeled RBCs diluting the proportion of labeled cells in the blood and spleen.

      (3) Analysis of erythrophagocytosis in RPMs:

      Our analysis of PKH67-labeled RBC content within RPMs following HH exposure showed a significant reduction in the number of PKH67-positive RPMs in the spleen (new Figure 5). This finding suggests a decrease in erythrophagocytosis by RPMs under HH conditions.

      Author response image 15.

      (4) Reconciling the findings:

      The apparent contradiction between enhanced RBC clearance (suggested by the reduced proportion of labeled RBCs) and reduced erythrophagocytosis in RPMs (indicated by fewer PKH67-positive RPMs) may be explained by the increased overall production of RBCs under HH. This increased production could mask the actual erythrophagocytosis activity in terms of the proportion of labeled cells. Therefore, while the proportion of labeled RBCs decreases more significantly under HH conditions, this does not necessarily indicate an enhanced erythrophagocytosis rate, but rather an increased dilution effect due to higher RBC turnover.

      (5) Revised interpretation and manuscript changes:

      Given these factors, we have updated our manuscript to reflect this detailed interpretation and clarify the implications of the increased RBC production under HH conditions for our observations of labeled RBC clearance and erythrophagocytosis. We appreciate your insightful feedback, which has prompted a careful re-examination of our data and interpretations. We hope that these revisions provide a more accurate and comprehensive understanding of the effects of HH on erythrophagocytosis and RBC dynamics.

      (11) Legend in Fig 4c-4d looks incorrect and Fig 4e-4f is very non-specific since Wright stain does not provide evidence of what type of cells these are and making for a significant overstatement in the contribution of this data to "confirming" increased erythrophagocytosis in the spleen under HH exposure (line 395-396).

      Thank you for your insightful observations regarding the data presentation and figure legends in our manuscript, particularly in relation to Figure 4 (renamed as Figure 6 in the revised manuscript) and the use of Wright-Giemsa composite staining. We appreciate your constructive feedback and acknowledge the importance of presenting our data with utmost clarity and precision.

      (1) Amendments to Figure legends:

      We recognize the necessity of rectifying inaccuracies in the legends of the previously labeled Figure 4C and D. Corrections have been implemented to ensure the legends accurately describe the data presented. Additionally, we acknowledge the error concerning the description of Wright staining. The method employed in our study is Wright-Giemsa composite staining, which, unlike Wright staining, which stains only cytoplasm (RBC), is capable of staining both nuclei and cytoplasm.

      (2) Addressing the specificity of Wright-Giemsa Composite staining:

      Our approach involved quantifying RBC retention using Wright-Giemsa composite staining on single splenic cells post-perfusion at 7 and 14 days post HH exposure. We understand and appreciate your concerns regarding the nonspecific nature of Wright staining. Although Wright stain is a general hematologic stain and not explicitly specific for certain cell types, its application in our study aimed to provide preliminary insights. The spleen cells, devoid of nuclei and thus likely to be RBCs, were stained and observed post-perfusion, indicating RBC retention within the spleen.

      (3) Incorporating additional methods for RBC identification:

      To enhance the specificity of our findings, we integrated supplementary methods for RBC identification in the revised manuscript. We employed band3 immunostaining (in the new Figure 6, C-D and G-H) and PKH67 labeling (Figure 6, I-J) for a more targeted identification of RBCs. Band3, serving as a reliable marker for RBCs, augments the specificity of our immunostaining approach. Likewise, PKH67 labeling affords a direct and definitive means to assess RBC retention in the spleen following HH exposure.

      Author response image 16. same as 10

      (4) Revised interpretation and manuscript modifications:

      Based on these enhanced methodologies, we have refined our interpretation of the data and accordingly updated the manuscript. The revised narrative underscores that our conclusions regarding reduced erythrophagocytosis and RBC retention under HH conditions are corroborated by not only Wright-Giemsa composite staining but also by band3 immunostaining and PKH67 labeling, each contributing distinctively to our comprehensive understanding.

      We are committed to ensuring that our manuscript precisely reflects the contribution of each method to our findings and conclusions. Your thorough review has been invaluable in identifying and rectifying areas for improvement in our research report and interpretation.

      (12) Ferroptosis data in Fig 5 is not specific to macrophages and Fer-1 data confirms the expected effect of Fer-1 but there is no data that supports that Fer-1 reverses the destruction of these cells or restores their function in hypoxia. Finally, these experiments were performed in peritoneal macrophages which are functionally distinct from splenic RPM.

      Thank you for your critique of our presentation and interpretation of the ferroptosis data in Figure 5 (renamed as Figure 9 in the revised manuscript), as well as your observations regarding the specificity of the experiments to macrophages and the effects of Fer-1. We value your input and acknowledge the need to clarify these aspects in our manuscript.

      (1) Clarification on cell type used in experiments:

      • We appreciate your attention to the details of our experimental setup. The experiments presented in Figure 9 were indeed conducted on splenic macrophages, not peritoneal macrophages, as incorrectly mentioned in the original figure legend. This was an error in our manuscript, and we have revised the figure legend accordingly to accurately reflect the cell type used.

      (2) Specificity of ferroptosis data:

      • We recognize that the data presented in Figure 9 need to be more explicitly linked to the specific macrophage population being studied. In the revised manuscript, we ensure that the discussion around ferroptosis data is clearly situated within the framework of splenic macrophages.

      • We also provide additional methodological details in the 'Methods' section to reinforce the specificity of our experiments to splenic macrophages.

      (3) Effects of Fer-1 on macrophage function and survival:

      • Regarding the effect of Fer-1, we agree that while our data confirms the expected effect of Fer-1 in inhibiting ferroptosis, we have not provided direct evidence that Fer-1 reverses the destruction of macrophages or restores their function in hypoxia.

      • To address this, we propose additional experiments to specifically investigate the impact of Fer-1 on the survival and functional restoration of splenic macrophages under hypoxic conditions. This would involve assessing not only the inhibition of ferroptosis but also the recovery of macrophage functionality post-treatment.

      (4) Revised interpretation and manuscript changes:

      • We’ve revised the relevant sections of our manuscript to reflect these clarifications and proposed additional studies. This includes modifying the discussion of the ferroptosis data to more accurately represent the cell types involved and the limitations of our current findings regarding the effects of Fer-1.

      • The revised manuscript presents a more detailed interpretation of the ferroptosis data, clearly describing what our current experiments demonstrate and what remains to be investigated.

      We are grateful for your insightful feedback, which has highlighted important areas for improvement in our research presentation. We think that these revisions will enhance the clarity and scientific accuracy of our manuscript, ensuring that our findings and conclusions are well-supported and precisely communicated.

      Reviewer #2 (Recommendations For The Authors):

      The following questions and remarks should be considered by the authors:

      (1) The methods should clearly state whether the HH was discontinued during the 7 or 14 day exposure for cleaning, fresh water etc. Moreover, how was CO2 controlled? The procedure for splenectomy needs to be described in the methods.

      Thank you for your inquiry regarding the specifics of our experimental methods, particularly the management of HH exposure and the procedure for splenectomy. We appreciate your attention to detail and the importance of these aspects for the reproducibility and clarity of our research.

      (1) HH exposure conditions:

      In our experiments, mice were continuously exposed to HH for the entire duration of 7 or 14 days, without interruption for activities such as cleaning or providing fresh water. This uninterrupted exposure was crucial for maintaining consistent hypobaric conditions throughout the experiment. The hypobaric chamber was configured to ensure a ventilation rate of 25 air exchanges per minute. This high ventilation rate was effective in regulating the concentration of CO2 inside the chamber, thereby maintaining a stable environment for the mice.

      (2) The splenectomy was performed as follows:

      After anesthesia, the mice were placed in a supine position, and their limbs were fixed. The abdominal operation area was skinned, disinfected, and covered with a sterile towel. A median incision was made in the upper abdomen, followed by laparotomy to locate the spleen. The spleen was then carefully pulled out through the incision. The arterial and venous directions in the splenic pedicle were examined, and two vascular forceps were used to clamp all the tissue in the main cadre of blood vessels below the splenic portal. The splenic pedicle was cut between the forceps to remove the spleen. The end of the proximal hepatic artery was clamped with a vascular clamp, and double or through ligation was performed to secure the site. The abdominal cavity was then cleaned to ensure there was no bleeding at the ligation site, and the incision was closed. Post-operatively, the animals were housed individually. Generally, they were able to feed themselves after recovering from anesthesia and did not require special care.

      We hope this detailed description addresses your queries and provides a clear understanding of the experimental conditions and procedures used in our study. These methodological details are crucial for ensuring the accuracy and reproducibility of our research findings.

      (2) The lack of changes in MCH needs explanation? During stress erythropoiesis some limit in iron availability should cause MCH decrease particularly if the authors claim that macrophages for rapid iron recycling are decreased. Fig 1A is dispensable. Fig 1G NN control 14 days does not make sense since it is higher than 7 days of HH.

      Thank you for your inquiry regarding the lack of changes in Mean Corpuscular Hemoglobin (MCH) in our study, particularly in the context of stress erythropoiesis and decreased macrophage-mediated iron recycling. We appreciate the opportunity to provide further clarification on this aspect.

      (1) Explanation for stable MCH levels:

      • Our research identified a decrease in erythrophagocytosis and iron recycling in the spleen following HH exposure. Despite this, the MCH levels remained stable. This observation can be explained by considering the compensatory roles of other organs, particularly the liver and duodenum, in maintaining iron homeostasis.

      • Specifically, our investigations revealed an enhanced capacity of the liver to engulf RBCs and process iron under HH conditions. This increased hepatic erythrophagocytosis likely compensates for the reduced splenic activity, thereby stabilizing MCH levels.

      (2) Role of hepcidin and DMT1 expression:

      Additionally, hypoxia is known to influence iron metabolism through the downregulation of Hepcidin and upregulation of Divalent Metal Transporter 1 (DMT1) expression. These alterations lead to enhanced intestinal iron absorption and increased blood iron levels, further contributing to the maintenance of MCH levels despite reduced splenic iron recycling.

      (3) Revised Figure 1 and data presentation

      To address the confusion regarding the data presented in Figure 1G, we have made revisions in our manuscript. The original Figure 1G, which did not align with the expected trends, has been removed. In its place, we have included a statistical chart of Figure 1F in the new version of Figure 1G. This revision will provide a clearer and more accurate representation of our findings.

      (4) Manuscript updates and future research:

      • We have updated our manuscript to incorporate these explanations, ensuring that the rationale behind the stable MCH levels is clearly articulated. This includes a discussion on the role of the liver and duodenum in iron metabolism under hypoxic conditions.

      • Future research could explore in greater detail the mechanisms by which different organs contribute to iron homeostasis under stress conditions like HH, particularly focusing on the dynamic interplay between hepatic and splenic functions.

      We thank you for your insightful question, which has prompted a thorough re-examination of our findings and interpretations. We believe that these clarifications will enhance the overall understanding of our study and its implications in the context of iron metabolism and erythropoiesis under hypoxic conditions.

      (3) Fig 2 the difference between sham and splenectomy is really marginal and not convincing. Is there also a difference at 7 days? Why does the spleen size decrease between 7 and 14 days?

      Thank you for your observations regarding the marginal differences observed between sham and splenectomy groups in Figure 2, as well as your inquiries about spleen size dynamics over time. We appreciate this opportunity to clarify these aspects of our study.

      (1) Splenectomy vs. Sham group differences:

      • In our experiments, the difference between the sham and splenectomy groups under HH conditions, though subtle, was consistent with our hypothesis regarding the spleen's role in erythrophagocytosis and stress erythropoiesis. Under NN conditions, no significant difference was observed between these groups, which aligns with the expectation that the spleen's contribution is more pronounced under hypoxic stress.

      (2) Spleen size dynamics and peak stress erythropoiesis:

      • The observed splenic enlargement prior to 7 days can be attributed to a combination of factors, including the retention of RBCs and extramedullary hematopoiesis, which is known to be a response to hypoxic stress.

      • Prior research has elucidated that splenic stress-induced erythropoiesis, triggered by hypoxic conditions, typically attains its zenith within a timeframe of 3 to 7 days. This observation aligns with our Toluidine Blue (TO) staining results, which indicated that the apex of this response occurs at the 7-day mark (as depicted in Figure 1, F-G). Here, the culmination of this peak is characteristically succeeded by a diminution in extramedullary hematopoiesis, a phenomenon that could elucidate the observed contraction in spleen size, particularly in the interval between 7 and 14 days.

      • This pattern of splenic response under prolonged hypoxic stress is corroborated by studies such as those conducted by Wang et al. (2021), Harada et al. (2015), and Cenariu et al. (2021). These references collectively underscore that the spleen undergoes significant dynamism in reaction to sustained hypoxia. This dynamism is initially manifested as an enlargement of the spleen, attributable to escalated erythropoiesis and erythrophagocytosis. Subsequently, as these processes approach normalization, a regression in spleen size ensues.

      We’ve revised our manuscript to include a more detailed explanation of these splenic dynamics under HH conditions, referencing the relevant literature to provide a comprehensive context for our findings. We will also consider performing additional analysis or providing further data on spleen size changes at 7 days to support our observations and ensure a thorough understanding of the splenic response to hypoxic stress over time.

      (4) Fig 3 B the clusters should be explained in detail. If the decrease in macrophages in Fig 3K/L is responsible for the effect, why does splenectomy not have a much stronger effect? How do the authors know which cells died in the calcein stained population in Fig 3D?

      Thank you for your insightful questions regarding the details of our data presentation in Figure 3, particularly about the identification of cell clusters and the implications of macrophage reduction. We appreciate the opportunity to address these aspects and clarify our findings.

      (1) Explanation of cell clusters in Figure 3B:

      • In the revised manuscript, we have included detailed notes for each cell population represented in Figure 3B (Figure 3D in revised manuscript). These notes provide a clearer understanding of the cell types present in each cluster, enhancing the interpretability of our single-cell sequencing data.

      • This detailed annotation will help readers to better understand the composition of the splenic cell populations under study and how they are affected by hypoxic conditions.

      (2) Impact of splenectomy vs. macrophage reduction:

      • The interplay between the reduction in macrophage populations, as evidenced by our single-cell sequencing data, and the ramifications of splenectomy presents a multifaceted scenario. Notably, the observed decline in macrophage numbers following HH exposure does not straightforwardly equate to a comparable alteration in overall splenic function, as might be anticipated with splenectomy.

      • In the context of splenectomy under HH conditions, a significant escalation in the RBC count was observed, surpassing that in non-splenectomized mice exposed to HH. This finding underscores the spleen's critical role in modulating RBC dynamics under HH. It also indirectly suggests that the diminished phagocytic capacity of the spleen following HH exposure contributes to an augmented RBC count, albeit to a lesser extent than in the splenectomy group. This difference is attributed to the fact that, while the number of RPMs in the spleen post-HH is reduced, they are still present, unlike in the case of splenectomy, where they are entirely absent.

      • Splenectomy entails the complete removal of the spleen, thus eliminating a broad spectrum of functions beyond erythrophagocytosis and iron recycling mediated by macrophages. The nuanced changes observed in our study may be reflective of the spleen's diverse functionalities and the organism's adaptive compensatory mechanisms in response to the loss of this organ.

      (3) Calcein stained population in Figure 3D:

      • Regarding the identification of cell death in the calcein-stained population in Figure 3D (Figure 3A in revised manuscript), we acknowledge that the specific cell types undergoing death could not be distinctly determined from this analysis alone.

      • The calcein staining method allows for the visualization of live (calcein-positive) and dead (calcein-negative) cells, but it does not provide specific information about the cell types. The decrease in macrophage population was inferred from the single-cell sequencing data, which offered a more precise identification of cell types.

      (4) Revised manuscript and data presentation:

      • Considering your feedback, we have revised our manuscript to provide a more comprehensive explanation of the data presented in Figure 3, including the nature of the cell clusters and the interpretation of the calcein staining results.

      • We have also updated the manuscript to reflect the removal of Figure 3K/L results and to provide a more focused discussion on the relevant findings.

      We are grateful for your detailed review, which has helped us to refine our data presentation and interpretation. These clarifications and revisions will enhance the clarity and scientific rigor of our manuscript, ensuring that our conclusions are well-supported and accurately conveyed.

      (5) Is the reduced phagocytic capacity in Fig 4B significant? Erythrophagocytosis is compromised due to the considerable spontaneous loss of labelled erythrocytes; could other assays help? (potentially by a modified Chromium release assay?). Is it necessary to stimulate phagocytosis to see a significant effect?

      Thank you for your inquiry regarding the significance of the reduced phagocytic capacity observed in Figure 4B, and the potential for employing alternative assays to elucidate erythrophagocytosis dynamics under HH conditions.

      (1) Significance of reduced phagocytic capacity:

      The observed reduction in the amplitude of fluorescently labeled RBCs in both the blood and spleen under HH conditions suggests a decrease in erythrophagocytosis. This is indicative of a diminished phagocytic capacity, particularly when contrasted with NN conditions.

      (2) Investigation of erythrophagocytosis dynamics:

      To delve deeper into erythrophagocytosis under HH, we employed Tuftsin to enhance this process. Following the injection of PKH67-labeled RBCs and subsequent HH exposure, we noted a significant decrease in PKH67 fluorescence in the spleen, particularly marked after the administration of Tuftsin. This finding implies that stimulated erythrophagocytosis can influence RBC lifespan.

      (3) Erythrophagocytosis under normal and hypoxic conditions:

      Under normal conditions, the reduction in phagocytic activity is less apparent without stimulation. However, under HH conditions, our findings demonstrate a clear weakening of the phagocytic effect. While we established that promoting phagocytosis under NN conditions affects RBC lifespan, the impact of enhanced phagocytosis under HH on RBC numbers was not explicitly investigated.

      (4) Potential for alternative assays:

      Considering the considerable spontaneous loss of labeled erythrocytes, alternative assays such as a modified Chromium release assay could provide further insights. Such assays might offer a more nuanced understanding of erythrophagocytosis efficiency and the stability of labeled RBCs under different conditions.

      (5) Future research directions:

      The implications of these results suggest that future studies should focus on comparing the effects of stimulated phagocytosis under both NN and HH conditions. This would offer a clearer picture of the impact of hypoxia on the phagocytic capacity of macrophages and the subsequent effects on RBC turnover.

      In summary, our findings indicate a diminished erythrophagocytic capacity, with enhanced phagocytosis affecting RBC lifespan. Further investigation, potentially using alternative assays, would be beneficial to comprehensively understand the dynamics of erythrophagocytosis in different physiological states.

      (6) Can the observed ferroptosis be influenced by bi- and not trivalent iron chelators?

      Thank you for your question regarding the potential influence of bi- and trivalent iron chelators on ferroptosis under hypoxic conditions. We appreciate the opportunity to discuss the implications of our findings in this context.

      (1) Analysis of iron chelators on ferroptosis:

      In our study, we did not specifically analyze the effects of bi- and trivalent iron chelators on ferroptosis under hypoxia. However, our observations with Deferoxamine (DFO), a well-known iron chelator, provide some insights into how iron chelation may influence ferroptosis in splenic macrophages under hypoxic conditions.

      (2) Effect of DFO on oxidative stress markers:

      Our findings showed that under 1% O2, there was an increase in Malondialdehyde (MDA) content, a marker of lipid peroxidation, and a decrease in Glutathione (GSH) content, indicative of oxidative stress. These changes are consistent with the induction of ferroptosis, which is characterized by increased lipid peroxidation and depletion of antioxidants. Treatment with Ferrostatin-1 (Fer-1) and DFO effectively reversed these alterations. This suggests that DFO, like Fer-1, can mitigate ferroptosis in splenic macrophages under hypoxia, primarily by impacting MDA and GSH levels.

      Author response image 17.

      (3) Potential role of iron chelators in ferroptosis:

      The effectiveness of DFO in reducing markers of ferroptosis indicates that iron availability plays a crucial role in the ferroptotic process under hypoxic conditions. It is plausible that both bi- and trivalent iron chelators could influence ferroptosis, given their ability to modulate iron availability within cells. Since ferroptosis is an iron-dependent form of cell death, chelating iron, irrespective of its valence state, could potentially disrupt the process by limiting the iron necessary for the generation of reactive oxygen species and lipid peroxidation.

      (4) Additional research and manuscript updates:

      Our study highlights the need for further research to explore the differential effects of various iron chelators on ferroptosis, particularly under hypoxic conditions. Such studies could provide a more comprehensive understanding of the role of iron in ferroptosis and the potential therapeutic applications of iron chelators. We have updated our manuscript to include these findings and discuss the potential implications of iron chelation in the context of ferroptosis under hypoxic conditions. This will provide a broader perspective on our research and its significance in understanding the mechanisms of ferroptosis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study provides insights into the IDA peptide with dual functions in development and immunity. The approach used is solid and helps to define the role of IDA in a two-step process, cell separation followed by activation of innate defenses. The main limitation of the study is the lack of direct evidence linking signaling by IDA and its HAE receptors to immunity. As such the work remains descriptive but it will nevertheless be of interest to a wide range of plant cell biologists.

      We thank the reviewers for thoroughly reading our manuscript. We have used their comments and suggestions to improve the manuscript. Below is a response to the reviewers' comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper titled 'A dual function of the IDA peptide in regulating cell separation and modulating plant immunity at the molecular level' by Olsson Lalun et al., 2023 aims to understand how IDA-HAE/HSL2 signalling modulates immunity, a pathway that has previously been implicated in development. This is a timely question to address as conflicting reports exist within the field. IDL6/7 have previously been shown to negatively regulate immune signalling, disease resistance and stress responses in leaf tissue; however, IDA has been shown to positively regulate immunity through the shedding of infected tissues. Moreover, recently the related receptor NUT/HSL3 has been shown to positively regulate immune signalling and disease resistance. This work has the potential to bring clarity to this field; however, the manuscript requires some additional work to address these questions. This is especially the case as it contradicts some previous work with IDL peptides which are perceived by the same receptor complexes.

      Can IDA induce pathogen resistance? Does the infiltration of IDA into leaf tissue enhance or reduce pathogen growth? Previously it has been shown that IDL6 makes plants more susceptible. Is this also true for IDA? Currently cytoplasmic calcium influx and apoplastic ROS are overinterpreted as immune responses; these can also be induced by many developmental cues, e.g. CLE40-induced calcium transients. Whilst gene expression is more specific, it is also true that treatment with synthetic peptides, which are recognised by LRR-RKs, can induce immune gene expression, especially in the short term, even when that is not their in vivo function, e.g. doi.org/10.15252/embj.2019103894.

      We thank the reviewer for the concerns raised and agree that further experiments, including pathogen assays, would strengthen the link between IDA signaling and immunity, and we plan such experiments in future work. We have, however, modified the discussion to include the possible role of IDA-induced Ca2+ and ROS during development. We have recently published a preprint (accepted for publication in JXB) (Galindo-Trigo et al., 2023, https://doi.org/10.1101/2023.09.12.557497) strengthening the link between IDA and defense by identifying WRKY transcription factors that regulate IDA expression through a Y1H assay.

      This paper shows that receptors other than hae/hsl2 are genetically required to induce defense gene expression. It would have been interesting to see what phenotype would be associated with higher order mutants of closely related haesa/haesa-like receptors. Indeed, recently HSL1 has been shown to function as a receptor for IDA/IDL peptides. Could the triple mutant suppress all responses? Could the different receptors have distinct outputs? For example, for FRK1 gene expression the hae hsl2 mutant has an enhanced response. Could defence gene expression be primarily mediated by HSL1 with subfunctionalisation within this clade?

      We agree that it would be interesting to also include HSL1 in our studies. However, the focus of this study has been on HAE and HSL2 and we wanted to explore their role in IDA-induced defense responses. Including HSL1 in these studies will require generating multiple transgenic lines and repeating most of the experiments; we will consider these experiments in a follow-up study together with pathogen assays (which would also address the main concern raised in the comment above). We have, however, modified the text to include the known function of HSL1 and discuss the possibility of subfunctionalisation of this receptor clade.

      One striking finding of the study is the strong additive interaction between IDA and flg22 treatment on gene expression. Do the authors also see this for co-treatment of different peptides with flg22, or is this a unique function of IDA? Is this receptor dependent (HAE/HSL1/HSL2)?

      This is a good question. Since our study focuses on the IDA signaling pathway, we preferentially tested whether the additive effect observed between flg22 and mIDA was also observed when mIDA was combined with another peptide involved in defense. The endogenous peptide PIP1 has previously been shown to amplify flg22 signaling (Hou et al 2014, doi:10.1371/journal.ppat.1004331). In that study it is shown that co-treatment with flg22 and PIP1 gives increased resistance to Pseudomonas PstDC3000 compared to when plants are treated with each peptide separately. In the same study, the authors also show reduced flg22-induced transcriptional activity of two defense-related genes, WRKY33 and PR, in the receptor like kinase7 (rlk7) mutant (the receptor perceiving PIP1). To investigate whether PIP1 would give the same additive effect with mIDA as that observed between flg22 and mIDA, we co-treated seedlings with PIP1 and mIDA. We observed no enhanced transcriptional activity of FRK1, MYB51 and PEP3 in tissue from plants treated with both PIP1 and mIDA peptides compared to single exposure. These results are presented in supplementary figure 11. In conclusion, we do not think mIDA acts as a general amplifier of all immune elicitors in plants.

      It is interesting how tissue specific calcium responses are in response to IDA and flg22, suggesting the cellular distribution of their cognate receptors. However, one striking observation, also made by the authors, is that the promoter expression seems to be broader than the calcium response, indicating that additional factors are required for the observed calcium response. Could diffusion of the peptide be a contributing factor, or are only some cells competent to induce a calcium response?

      It is interesting that the authors look for floral abscission phenotypes in cngc and rbohd/f mutants to conclude for genetic requirement of these in floral abscission. Do the authors have a hypothesis for why they failed to see a phenotype for the rbohd/f mutant as was published previously? Do you think there might be additional players redundantly mediating these processes?

      It is a possibility that diffusion of the peptide plays a role in the observed response. In a biological context we would assume that the local production of the peptides plays an important role in the cellular responses. In our experimental setup, we add the peptide externally and we can therefore assume that the overlaying cells get in contact with the peptide before cells in the inner tissues, and this could be affecting the response recorded. However, our results show that there are differences between flg22- and mIDA-induced responses even when the application of the peptides is performed in the same manner, indicating that the difference in the response is not primarily due to the diffusion rate of the peptides but is likely due to different factors being present in different cells. To acquire a better picture of the distribution of receptor expression in the root tissue and to investigate in which cells the receptors have an overlapping expression pattern, we have included results in figure 6 showing plant lines co-expressing transcriptional reporters of FLS2 and HAE or HSL2.

      Can you observe callose deposition in the cotyledons of the 35S::HAE line? Are the receptors expressed in native cotyledons? This is the only phenotype tested in the cotyledons.

      We thank the reviewer for this valuable comment. We have now conducted a callose deposition assay on the 35S:HAE line. Indeed, we observe callose depositions when cotyledons from a 35S:HAE line are treated with mIDA. We have included these results in figure 4 and have adjusted the text regarding the callose assay accordingly. In addition, we have analyzed the promoter activity of pHAE in cotyledons and we observe weak promoter activity. These results are included as supplementary figure 1d.

      Are flg22-induced calcium responses affected in hae hsl2?

      The experiment suggested by the reviewer is an important control to ensure that the hae hsl2-Aeq line can respond to a Ca2+-inducing peptide signaling through a different receptor than HAE or HSL2. One would expect to see a Ca2+ response in this line to the flg22 peptide. We performed this experiment and, surprisingly, we could not detect a flg22-induced Ca2+ signal in the hae hsl2 mutant. As it is unlikely that the Ca2+ response triggered by flg22 is dependent on HAE and HSL2, we have to assume that the lack of response is due to a malfunction of the Aeq sensor in this line. As a control to measure the amount of Aeq present in the cells, we treat the Aeq seedlings with 2 M CaCl2 and measure the luminescence constantly for 180 seconds (Ranf et al., 2012, DOI: 10.1093/mp/ssr064). The CaCl2 treatment disrupts the cells and releases the Aeq sensor into the solution, where it will react with Ca2+ and release the total possible response in the sample (Lmax) in the form of a luminescent peak. When treating the hae hsl2-Aeq line with CaCl2 we observe a luminescent peak, indicating the presence of the sensor; however, the response is reduced compared to WT seedlings expressing Aeq. Given the sensitivity of FLS2 to flg22, one would still expect to see a Ca2+ peak in the hae hsl2-Aeq line even if the amount of sensor is reduced. Given that this is not the case, we have to assume that localization or conformation of the sensor is somehow affected in this line or that there is another biological explanation that we cannot explain at the moment.

      We have therefore opted on omitting the results using the hae hsl2 Aeq lines from the manuscript and are in the process of mutating HAE and HSL2 by CRISPR-Cas9 in the Aeq background to verify that the mIDA triggered Ca2+ response is dependent on HAE and HSL2.

      Reviewer #2 (Public Review):

      Lalun and co-authors investigate the signalling outputs triggered by the perception of IDA, a plant peptide regulating organ abscission. The authors observed that IDA perception leads to a transient influx of Ca2+, to the production of reactive oxygen species in the apoplast, and to an increased accumulation of transcripts which are also responsive to an immunogenic epitope of bacterial flagellin, flg22. The authors show that IDA is transcriptionally upregulated in response to several biotic and abiotic stimuli. Finally, based on the similarities in the molecular responses triggered by IDA and elicitors (such as flg22), the authors propose that IDA has a dual function in modulating abscission and immunity. The manuscript is rather descriptive and provides little information regarding IDA signalling per se. A potential functional link between IDA signalling and immune signalling remains speculative.

      We thank the reviewer for the concerns raised and agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity and plan for such experiments in future work.

      Reviewer #3 (Public Review):

      Previously, it has been shown the essential role of IDA peptide and HAESA receptor families in driving various cell separation processes such as abscission of flowers as a natural developmental process, of leaves as a defense mechanism when plants are under pathogenic attack or at the lateral root emergence and root tip cell sloughing. In this work, Olsson et al. show for the first time the possible role of IDA peptide in triggering plant innate immunity after the cell separation process occurred. Such an event has been previously proposed to take place in order to seal open remaining tissue after cell separation to avoid creating an entry point for opportunistic pathogens.

      The elegant experiments in this work demonstrate that IDA peptide triggers defense-associated marker genes together with immune-specific responses including release of ROS and intracellular Ca2+. Thus, the work highlights an intriguing direct link between endogenous cell wall remodeling and plant immunity. Moreover, the upregulation of IDA in response to abiotic and especially biotic stimuli provides a valuable indication of the potential involvement of HAE/IDA signalling in processes other than plant development.

      We are pleased that the reviewer finds our findings linking IDA to defense interesting and would like to thank the reviewer for this positive feedback.

      Strengths:

      The various methods and different approaches chosen by the authors consolidate the additional new role for a hormone-peptide such as IDA. The involvement of IDA in triggering the complex process of immunity represents a further step in understanding what happens after cell separation occurs. The Ca2+ and ROS imaging and measurements together with using the haehsl2 and haehsl2 p35S::HAE-YFP genotypes provide a robust quantification of the activation of defense responses. While Ca2+ and ROS can be detected after applying the IDA treatment after the occurrence of cell separation, it is adequately shown that the enzymes responsible for ROS production, RBOHD and RBOHF, are not implicated in floral abscission.

      Furthermore, IDA production is triggered by biotic and abiotic factors such as flg22, a bacterial elicitor, fungi, mannitol or salt, while the mature IDA is activating the production of FRK1, MYB51 and PEP3, genes known for being part of plant defense process.

      Thank you.

      Weaknesses:

      Even though a clear involvement of IDA in activating the immune system after cell separation is shown, the use of the p35S:HAE-YFP line represents a weak point in the scientific demonstration. In this line the HAE receptor is driven by a constitutive promoter, capable of loading the plant with HAE protein without discriminating between tissues. Since it is known that the IDA family consists of several members distributed in various tissues, it is very difficult to fully differentiate the effects of HAE present ubiquitously.

      We agree with this statement. Nevertheless, it is important to note that the responses we have observed are not detectable in WT plants that do not (over)express the HAE receptors, suggesting that the ROS and callose deposition are induced by the addition of mIDA peptide and not by the potential presence of the endogenous IDL peptides.

      The co-localization of HAE/HSL2 and FLS2 receptors is a valuable point to address since in the present work, the marker lines presented do not get activated in the same cell types of the root tissues which renders the idea of nanodomains co-localization (as hypothetically written in the discussion) rather unlikely.

      Thank you for raising an important aspect of our study. It is true that not all cells in the root which have promoter activity for FLS2 also exhibit promoter activity for either HAE or HSL2. However, we have observed that certain cells in the roots show promoter activity for both receptors. In the revised version of the manuscript, we have included plants expressing transcriptional reporters for both FLS2 and HAE or HSL2 using different fluorescent proteins. We have investigated overlapping promoter activity at sites of lateral roots, in the tip of the primary root and in the abscission zone. Our results show overlapping expression of the transcriptional reporters in certain cells, indicating that FLS2 and HAE or HSL2 are likely to be found in some of the same cells during plant development. We also observe cells where only one or none of the promoters is active.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Supplementary Figure 3: re-label the y axis; use 200 rather than 200,00 for clarity.

      This has been addressed.

      Supplementary Figure 2: It would be good to include the age of the seedlings used to study calcium influx in the legend.

      This has been addressed.

      Supplementary Figure 1: rephrase 'IDA induces ROS production in Arabidopsis'.

      This has been addressed.

      The use of chelating agents to establish the need for calcium from the extracellular space is a clear experiment supporting the calcium response phenotype specific to IDA treatment in seedlings. A peptide lacking the last asparagine (N) may fail to elicit a calcium response simply because it is shorter or has different chemical properties. Therefore, a scrambled sequence would have been a better control.

      We thank the reviewer for the suggestion of using a scrambled peptide as a negative control; however, we find it unlikely that mIDA∆N69 could induce any activity based on previous work. Results from the crystal structure of mIDA bound to the HAE receptor and ligand-receptor interaction studies (DOI: 10.7554/eLife.15075) show that the last asparagine in the mIDA peptide is essential for detectable binding to the HAE receptor and that a peptide lacking this amino acid does not have any activity. We will, however, also include a scrambled version of the peptide as an additional control in future experiments.

      Reviewer #2 (Recommendations For The Authors):

      Please find below specific comments:

      (1) Most of the molecular outputs triggered by IDA can be considered common molecular marks of plant peptide signalling; they do not represent strong evidence of a potential function of IDA in modulating immunity. For instance, perception of CIF peptides, which control the establishment of the Casparian strips, regulates the production of reactive oxygen species and the transcription of genes associated with immune responses (Fujita et al., The EMBO Journal 2020). It should also be considered that FRK1, whose function remains unknown, may be involved in both immunity and abscission and that the upregulation of FRK1 upon IDA treatment is not indicative of active modulation of immune signalling by IDA.

      This is a fair point raised by the reviewer and we now address in the manuscript that ROS and Ca2+ are hallmarks of both plant development and defense. The function of FRK1 is not known; however, it is unlikely that the upregulation of FRK1 in response to mIDA plays a role in the developmental progression of abscission, as it is not temporally regulated during the abscission process, thus making it an unlikely candidate in the regulation of cell separation (Cai & Lashbrook, 2008, https://doi.org/10.1104/pp.107.110908). We do, however, agree that further experiments including pathogen assays would strengthen the link between IDA signaling and immunity and plan for such experiments in future work.

      (2) It remains unknown whether IDA modulates immunity. For instance, does IDA perception promote resistance to bacteria (bacterial proliferation, disease symptoms)? Is IDA genetically required for plant disease resistance? Is the IDA signalling pathway genetically required for transcriptional changes induced by flg22, such as the increase in FRK1 transcripts? In addition, the authors propose that the function of IDA in modulating immune signalling prevents bacterial infection in tissue exposed to stress(es). Does loss of function of IDA or of its corresponding receptors lead to changes in the ability of bacteria to colonise plant roots upon stress(es)?

      Please see the comment above regarding pathogen assays.

      (3) Several aspects of the work appear to correspond to preliminary investigation. For instance, the authors analyse loss of function mutant for genes encoding for Ca2+ permeable channels (CNGCs) which are transcriptionally active during the onset of abscission (Sup. Figure 5). None of the single mutants present an abscission defect. These observations provide no information regarding the identity of the channel(s) involved in IDA-induced calcium influx.

      We agree with the reviewer that we have not been able to identify the channels responsible for the IDA-induced calcium influx. Given the redundancy among many of the members of this multigenic family, a future approach to identify proteins responsible for the IDA-triggered calcium response could be to create multiple KO mutants by CRISPR-Cas9.

      (4) Using H2DCF-DA, the authors observed a decrease in ROS accumulation in the abscission zone of rbohd/rbohf double KO line (Sup Figure 5c) but describe in the text that ROS production in this zone does not depend on RBOHD and RBOHF (L220). Please clarify.

      This has now been clarified in the text.

      (5) The authors describe that the rbohd/rbohf double KO presents a lower petal break-strength, which they describe as an indication of premature cell wall loosening, and that petals of rbohd/rbohf abscised one position earlier than in WT. Yet, the authors postulate that IDA-induced ROS production does not regulate abscission but may regulate additional responses. Instead the data seem to indicate that ROS production by RBOHD and RBOHF regulates the timing of abscission. In addition, it would have been interesting to test whether the IDA signalling pathway regulates ROS production in the abscission zone.

      The rbohd rbohf double mutants show several phenotypes associated with developmental stress; the mild phenotype observed with regard to premature abscission (by one position) could be caused by the general phenotype of the double mutant rather than being related to ROS production. Indeed, it has been suggested that the lignified brace in the AZ, dependent on ROS production by the aforementioned RBOHs, is necessary for the correct concentration of cell modifying enzymes (Lee et al., 2018, https://doi.org/10.1016/j.cell.2018.03.060). The precocious abscission in this double mutant clearly shows this not to be the case. We have tried to do a ROS burst assay on AZ tissue/flowers with the mIDA peptide but have not been successful with this approach. A ROS sensor expressed in AZ tissue would be a valuable tool to address whether IDA signalling regulates ROS production in AZs.

      (6) In Sup. Figure5a, it would be of interest to have a direct comparison of the transcript accumulation of the presented CNGCs and RBOHDs with other of these multigenic families.

      The CNGC and RBOH gene expression profiles shown in the figure are those of the family members expressed during the developmental progression of floral abscission in stamen AZs. Since there is no difference in the temporal expression of the other family members (and most are either not expressed or very weakly expressed in this tissue), it is not possible to do this comparison (Cai & Lashbrook, 2008, https://doi.org/10.1104/pp.107.110908).

      (7) L251-253, since IDAdeltaN69 cannot be perceived by its receptors, the absence of induction of pIDA::GUS by IDAdeltaN69 compared to flg22 cannot be seen as a sign of specificity in the peptide-induced increase in IDA promoter activity.

      We have rephrased this in the text.

      (8) Please provide quantitative and statistical analysis of the calcium measurement presented in sup figure 3.

      This has been addressed.

      (9) L339-341; This sentence is unclear to me, please rephrase.

      We have rephrased this in the text.

      Reviewer #3 (Recommendations For The Authors):

      (1) In order to assess the role of CNGCs in abscission process, it would be more interesting to see the effect on the Ca2+ pattern and ROS signaling after application of mIDA on cngc and rbohf rbohd mutants.

      We agree with this statement, and studies of mIDA-induced ROS and Ca2+ in these mutants will provide valuable information on the regulation of the response. We are in the process of making the lines needed to perform these experiments. However, since this requires crossing genetically encoded sensors into each mutant and generating higher-order mutants, it is a long process.

      (2) With regard to the ROS production (Sup Fig. 1), the application of mIDA can trigger ROS in p35S::HAE:YFP lines, but not in the wild-type plant, which is according to the text "most likely due to the absence of HAE expression" in leaves. The experiment on callose deposition is performed in wild-type cotyledons where no callose deposition could be observed after mIDA treatment (Fig. 4a,b). The conclusion from text is that IDA "is not involved in promoting deposition of callose as a long-term defence response". It appears more likely that neither ROS nor callose can be observed in wild-type plants due to the lack of HAE expression. Therefore, the callose experiment should include the p35S::HAE:YFP lines. The experiment as it is does not allow to draw any conclusion on HAE/IDA involvement in callose formation.

      We fully agree with this comment, thank you for pointing this out. We have now performed the callose experiment with the 35S:HAE lines. Please see our answer to reviewer #1.

      (3) Between Sup Fig. 3 and Sup Fig. 5 two different systems were used to assess the floral stage. An adjustment of the floral stages would make it easier to convey the levels of HAE/HSL2 expression and hence, potentially, their relation to the onset of cell-wall degradation.

      We have now used the same system to assess floral stages throughout the whole manuscript.

      (4) For the Fig. 1 and 2, it will be helpful to mention the genotype used for imaging/quantification of Ca2+.

      This has been addressed.

      (5) Some of the abbreviations are not introduced as full-text at their first time use in the text, such as: mIDA (Line 68), Ef-Tu (line 85), NADPH (line 77).

      The abbreviations have now been introduced.

      (6) In the legend of Fig. 5 (lines 897 and 898)- in the figure description, the box plots are identified as light gray and dark gray, while in the panel a of the figure the box plots are colored in red and blue.

      Thank you for pointing this out, this has now been corrected.

      (7) In figure 1 and 2. the authors write that the number of replicates is 10 (n=10) but data represents a single analysis. Please provide the quantitative ROI analysis, demonstrating that the observed example is representative. This is particularly important since the authors claim very specific changes in pattern of Ca signaling between mIDA and FLG22 treatments (Line 148).

      (8) Figure 4: please use alternative scaling on the Y axis instead of breaks.

      This has now been fixed.

      (9) Figure 5: it is not clear what n=4 refers to when the authors state three independent replicates. In figure 6 they state 4 technical reps and 3 biological reps. Please ensure this is similar across all descriptions.

      We have now ensured the correct information in all descriptions.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on Legionella pneumophila effector proteins that target host vesicle trafficking GTPases during infection and more specifically modulate ubiquitination of the host GTPase Rab10. The evidence supporting the claims of the authors is solid, although it remains unclear how modification of the GTPase Rab10 with ubiquitin supports Legionella virulence and the impact of ubiquitination during LCV formation. The work will be of interest to colleagues studying animal pathogens as well as cell biologists in general.

      We greatly appreciate the positive and valuable feedback from the editors and the reviewers. Following their suggestions, we have added new experimental data and discussed the implications of our findings for Legionella virulence in terms of the biogenesis of its replication niche. Please find our point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Kubori and colleagues characterized the manipulation of the host cell GTPase Rab10 by several Legionella effector proteins, specifically members of the SidE and SidC family. They show that Rab10 undergoes both conventional ubiquitination and noncanonical phosphoribose-ubiquitination, and that this posttranslational modification contributes to the retention of Rab10 around Legionella vacuoles.

      Strengths

      Legionella is an emerging pathogen of increasing importance, and dissecting its virulence mechanisms allows us to better prevent and treat infections with this organism. How Legionella and related pathogens exploit the function of host cell vesicle transport GTPases of the Rab family is a topic of great interest to the microbial pathogenesis field. This manuscript investigates the molecular processes underlying Rab10 GTPase manipulation by several Legionella effector proteins, most notably members of the SidE and SidC families. The finding that MavC conjugates ubiquitin to SdcB to regulate its function is novel, and sheds further light into the complex network of ubiquitin-related effectors from Lp. The manuscript is well written, and the experiments were performed carefully and examined meticulously.

      Weaknesses

      Unfortunately, in its current form this manuscript offers only little additional insight into the role of effector-mediated ubiquitination during Lp infection beyond what has already been published. The enzymatic activities of the SidC and SidE family members were already known prior to this study, as was the importance of Rab10 for optimal Lp virulence. Likewise, it had previously been shown that SidE and SidC family members ubiquitinate various host Rab GTPases, like Rab33 and Rab1. The main contribution of this study is to show that Rab10 is also a substrate of the SidE and SidC family of effectors. What remains unclear is if Rab10 is indeed the main biological target of SdcB (not just 'a' target), and how exactly Rab10 modification with ubiquitin benefits Lp infection.

      Reviewer #1 (Recommendations for The Authors):

      Major points of concern

      (1) The authors show that SdcB increases Rab10 levels on LCVs at later times of infection and conclude that this is its main biological role. An alternative explanation may be that Rab10 is not 'the main' target of SdcB but merely 'a' target, which may explain why the effect of SdcB on Rab10 accumulation on LCV is only detectable after several hours of infection. An unbiased omics-based approach to identify the actual host target(s) of SdcB may be needed to confirm that Rab10 modification by SdcB is biologically relevant.

      We fully agree with your comment that SdcB should have multiple targets, considering the abundance of ubiquitin observed on the LCVs when SdcB was expressed (Figure 3). However, the effect of SdcB on Rab10 accumulation at the later time point (7 h) (current Figure 4e) was well supported by the new data showing that the SdcB-mediated ubiquitin conjugation to Rab10 was highly detected at this time point (new Figure 4c). We have attempted a comprehensive search for interaction partners of the ANK domain of SdcB. This analysis is planned to be included in our ongoing study. We therefore decided not to add the data to this manuscript.

      (2) The authors show that Rab10 within cell lysate is ubiquitinated and conclude that ubiquitination of Rab10 is directly responsible for its retention on the LCV. What is the underlying molecular mechanism for this retention? Are GAP proteins prevented from binding and deactivating Rab10? This may be worth testing.

      It is an attractive hypothesis that a Rab10GAP is involved in the regulation of Rab10 localization on the LCV. However, as far as we know, GAP proteins for Rab10 have not been identified yet. This will be an important issue to address once a Rab10GAP is identified.

      (3) Related to this, an alternative explanation would be that Rab10 retention is an indirect effect where inactivators of Rab10, such as host cell GAP proteins, are the main target of SidE/C family members and sent for degradation (see point #1). Can the authors show that Rab10 on the LCV is indeed ubiquitinated?

      The possible involvement of a putative Rab10GAP is currently untestable as no such protein is known. To address whether Rab10 located on the LCV is ubiquitinated or not, we conducted critical experiments using active Rab10 (QL) and inactive Rab10 (TN) (new Figure 4a, new Figure 4-figure supplement 1). As revealed for Rab1 (Murata et al., Nature Cell Biol. 2006; Ingmundson et al., Nature 2007), Rab10 is expected to be recruited to the LCV as a GDP-bound inactive form and converted to a GTP-bound active form on the LCV. The new results clearly demonstrated that GTP-locked Rab10QL is preferentially ubiquitinated upon infection, strongly supporting the model that Rab10 is ubiquitinated “on the LCV” by the SidE and SidC family ligases.

      (4) Also, on what residue(s) is Rab10 ubiquitinated? Jeng et al. (Cell Host Microbe, 2019, 26(4): 551-563) suggested that K102, K136, and K154 of Rab10 are modified during Lp infection. How does substituting those residues affect the residency of Rab10 on LCVs? Addressing these questions may ultimately help to uncover if the growth defect of a sidE gene cluster deletion strain is due to its inability to ubiquitinate and retain Rab10 on the LCV.

      Thank you for the suggestion. We conducted mutagenesis of the three Lys residues of Rab10 and subjected the derivative to ubiquitination analysis (new Figure 1-figure supplement 1). Substitution of the Lys residues with Ala did not abrogate ubiquitination upon Lp infection. This result indicates that ubiquitination sites are present at other residue(s), including the PR-ubiquitination site(s), raising the possibility that disruption of sidE genes is detrimental to intracellular growth of L. pneumophila because of a failure of Rab10 retention.

      (5) The authors proposed that "the SidE family primarily contributes towards ubiquitination of Rab10". In this case, what is the significance of SdcB-mediated ubiquitination of Rab10 during Lp infection?

      We found that the major contribution of SdcB is retention of Rab10 until the late stage of infection. This claim was supported by our new data (new Figure 4c) as mentioned above (response to comment #1).

      (6) The contribution of SdcB to ubiquitination of Rab10 relative to SidC and SdcA is unclear. SidC is shown to be unaffected by MavC. In this case, SidC can ubiquitinate Rab10 regardless of the regulatory mechanism of SdcB by MavC. This is not further being examined or discussed in the manuscript.

      The effect of intrinsic MavC is apparent at the later stage (9 h) of infection (Figure 7c) when SdcB gains its activity (see above). We therefore do not think that any effect of MavC on the SidC/SdcA activities, which are effective in the early stage, would impact Rab10 localization. However, without specific experiments addressing this issue, possible MavC effects on SidC/SdcA are beyond the scope of this manuscript.

      (7) When is Rab10 required during Lp infection? The authors showed that Rab10 levels at LCV are rather stable from 1hr to 7hr post infection. If MavC regulates the activity of SdcB, when does this occur?

      While the Rab10 levels on the LCV (~40%) are stable during 1-7 h post infection (Figure 2b), they decreased to ~20% at 9 h after infection (Figure 7c) (the description was added in lines 304-306). Rab10 seems to be required for optimal LCV biogenesis over the early to late stages, but may not be required at the maturation stage (9 h). We validated the effect of MavC on Rab10 localization at this time point (Figure 7c). These observations allowed us to build the scheme described in Figure 7d. We revised the illustration in new Figure 7d according to the helpful suggestions from both reviewers.

      (8) Previous analyses by MS showed that ubiquitination of Rab10 in Lp-infected cells decreases over time (from 1 hpi to 8 hpi - Cell Host Microbe, 2019, 26(4): 551-563). How does this align with the findings made here that Rab10 levels on the LCV and likely its ubiquitination levels increase over time?

      We carefully compared Rab10 ubiquitination at 1 h and 7 h after infection (new Figure 1-figure supplement 1b). This analysis showed that the level of its ubiquitination decreased over time, in agreement with the previous report. Nevertheless, Rab10 was still significantly ubiquitinated at 7 h, which we believe causes the sustained retention of Rab10 on the LCV at this time point. We added this observation in lines 146-148.

      (9) Polyubiquitination of Rab10 was not detected in cells ectopically producing SdcB and SdeA lacking its DUB domain (Figure 7 - figure supplement 2). Does SdcB actually ubiquitinate Rab10 (see also point #5)? Along the same line, it is curious to find that the ubiquitination pattern of Rab10 is not different for LpΔsidC/ΔsdcA compared to LpΔsidC/ΔsdcA/ΔsdcB (Figure 1C). The actual contribution of SdcB to ubiquitinating Rab10 compared to SidC/SdcA thus needs to be clarified.

      Thank you for raising this important point. We currently hypothesize that SidC/SdcA/SdcB-mediated ubiquitin conjugation can occur only in the presence of PR-ubiquitin on Rab10 (either directly on the PR-ubiquitin or on other residue(s) of Rab10). Failure to detect polyubiquitination under the transfection condition (Figure 7-figure supplement 2) suggests that this specific ubiquitin conjugation occurs only under restricted conditions, i.e. only “on the LCV”. We added this description to the discussion section (lines 334-335). The lack of difference between the ΔsidCΔsdcA and ΔsidCΔsdcAΔsdcB strains (Figure 1C, 1 h infection) can be explained by the result that SdcB gains activity at the later stages (see above).

      Minor comments

      In Figure 4b and 7b, the authors show a quantification of "Rab10-positive LCVs/SdcB-positive LCVs". Why this distinction? It raises the question of what the percentage of Rab10-positive/SdcB-negative LCVs might be.

      We chose this method of quantification because we wanted specifically to see the effect of SdcB on Rab10 localization. To distinguish between SdcB-positive and SdcB-negative LCVs, we would need to rely on the blue DAPI signal to visualize internal bacteria, which we considered technically difficult in this specific analysis.

      The band of FLAG-tagged SdcB was not detected by immunoblot using anti-FLAG antibody (Figure 5). The authors hypothesized that "disappearance of the SdcB band can be caused by auto-ubiquitination, as SdcB has an ability to catalyze auto-ubiquitination with a diverse repertoire of E2 enzymes". This can be easily confirmed by using MG-132 to inhibit proteasomal degradation of polyubiquitinated substrates.

      We conducted the experiment using MG-132 as suggested and found that proteasomal degradation is not the cause of the disappearance of the band (new Figure 5-figure supplement 2, added description in lines 228-233). SdcB is actually not degraded. Instead, its polyubiquitination causes its apparent loss by distributing the SdcB bands in the gel.

      In Figure 5F, the authors mentioned that "HA-UbAA did not conjugate to SdcB", whereas "shifted band detected by FLAG probing plausibly represents conjugation of cellular intrinsic Ub". The same argument was made in Figure 6B. These claims should be confirmed by immunoblot using anti-Ub antibody.

      Thank you. We added the data using anti-Ub antibody (P4D1) (Figure 6f, new third panel).

      Figure 7A: In cells producing MavC, SdcB is clearly present on the LCV. However, in Figure 5A, SdcB was not detected by immunoblot in cells ectopically expressing MavC-C74A. What is the interpretation of these results?

      SdcB was not degraded in the cells; rather, its apparent molecular weight shifted owing to polyubiquitination (see above). The detection of SdcB in the IF images (Figure 7a) supports this claim.

      Reviewer #2 (Public Review):

      This manuscript explores the interplay between Legionella Dot/Icm effectors that modulate ubiquitination of the host GTPase Rab10. Rab10 undergoes phosphoribosyl-ubiquitination (PR-Ub) by the SidE family of effectors which is required for its recruitment to the Legionella containing vacuole (LCV). Through a series of elegant experiments using effector gene knockouts, co-transfection studies and careful biochemistry, Kubori et al further demonstrate that:

      (1) The SidC family member SdcB contributes to the polyubiquitination (poly-Ub) of Rab10 and its retention at the LCV membrane.

      (2) The transglutaminase effector, MavC acts as an inhibitor of SdcB by crosslinking ubiquitin at Gln41 to lysine residues in SdcB.

      Some further comments and questions are provided below.

      (1) From the data in Figure 1, it appears that the PR-Ub of Rab10 precedes and in fact is a prerequisite for poly-Ub of Rab10. The authors imply this, but there is no explicit statement. Is this the case?

      Yes, we think that it is the case. We revised the description in the text accordingly (lines 326-327).

      (2) The complex interplay of Legionella effectors and their meta-effectors targeting a single host protein (as shown previously for Rab1) suggests the timing and duration of Rab10 activity on the LCV is tightly regulated. How does the association of Rab10 with the LCV early during infection and then its loss from the LCV at later time points impact LCV biogenesis or stability? This could be clearer in the manuscript and the summary figure does not illustrate this aspect.

      Thank you for pointing out this important issue. Association of Rab10 with the LCV is thought to be beneficial for L. pneumophila, as Rab10 has been identified as a factor that supports bacterial growth in cells (Jeng et al., 2019). We speculate that its loss from the LCV at the later stage of infection would also be beneficial, since the LCV may need to move on to the maturation stage, in which a different membrane-fusion process may proceed. As this is too speculative, we made only a minor modification to the discussion section (lines 356-358). We also modified the summary figure (revised Figure 7d) to illustrate the time course.

      (3) How do the activities of the SidE and SidC effectors influence the amount of active Rab10 on the LCV (not just its localisation and ubiquitination)?

      We agree that it is an important point. We tested the active Rab10 (QL) and inactive Rab10 (TN) for their ubiquitination and LCV-localization profiles (new Figure 4ab, new Figure 4-figure supplement 1 and 2). These analyses led us to the unexpected finding that the active form of Rab10 is the preferential target of the effector-mediated manipulation. See also our response to Reviewer 1’s comment #3. Thank you very much for your insightful suggestion.

      (4) What is the fate of PR-Ub and then poly-Ub Rab10? How does poly-Ub of Rab10 result in its persistence at the LCV membrane rather than its degradation by the proteasome?

      We have not revealed the molecular mechanism in this study. We believe that it is an important question to be solved in the future. We added a sentence to this effect in the discussion section (lines 376-378).

      (5) Mutation of Lys518, the amino acid in SdcB identified by mass spec as modified by MavC, did not abrogate SdcB Ub-crosslinking, which leaves open the question of how MavC does inhibit SdcB. Is there any evidence of MavC mediated modification to the active site of SdcB?

      The active site of SdcB (C57) is required for the modification (Figure 5b), but it is not likely to be the target residue, as the MavC transglutaminase activity restricts the target residues to Lys. It would be expected that multiple Lys residues on SdcB can be modified by MavC to disturb the catalytic activity.

      (6) I found it difficult to understand the role of the ubiquitin glycine residues and the transglutaminase activity of MavC on the inhibition of SdcB function. Is structural modelling using Alphafold for example helpful to explain this?

      We conducted an AlphaFold analysis of SdcB-Ub. Unfortunately, when the glycine residues of Ub were placed in the catalytic pocket of SdcB, Q41 of Ub did not fit the expected position on SdcB (K518). The ternary complex (MavC-Ub-SdcB) would probably change the overall conformation of its components. A crystal structure analysis or more detailed molecular modeling would be required to resolve the issue.

      (7) Are the Lys mutants of SdcB still active in poly-Ub of Rab10?

      We performed the experiment and found that the K518R K891R mutant of SdcB still has E3 ligase activity at a level similar to that of the wild type upon infection (new Figure 6-figure supplement 2) (lines 283-284). The level was actually slightly higher than that of the wild type. This result may suggest that blocking the modification sites can rescue SdcB from MavC-mediated downregulation.

      Reviewer #2 (Recommendations For The Authors):

      see above

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study applies voltage clamp fluorometry to provide new information about the function of serotonin-gated ion channels 5-HT3AR. The authors convincingly investigate structural changes inside and outside the orthosteric site elicited by agonists, partial agonists, and antagonists, helping to annotate existing cryo-EM structures. This work confirms that the activation of 5-HT3 receptors is similar to other members of this well-studied receptor superfamily. The work will be of interest to scientists working on channel biophysics but also drug development targeting ligand-gated ion channels.

      Public Reviews:

      All reviewers agreed that these results are solid and interesting. However, reviewers also raised several concerns about the interpretation of the data and some other aspects related to data analysis and discussion that should be addressed by the authors. Essential revisions should include:

      (1) Please try to explicitly distinguish between a closed pore and a resting or desensitized state of the pore, to help in clarity.

      (2) Add quantification of VCF data (e.g. sensor current kinetics, as suggested by reviewer #2) or better clarify/discuss the VCF quantitative aspects that are taken into account to reach some conclusions (reviewer #3).

      (3) Review and add relevant foundational work relevant to this study that is not adequately cited.

      (4) Revise the text according to all recommendations raised by the reviewers and listed in the individual reviews below.

      We have revised the text to address all four points. See the answers to referees’ recommendations.

      Reviewer #1 (Public Review):

      Summary:

      This study brings new information about the function of serotonin-gated ion channels 5-HT3AR, by describing the conformational changes that occur during ligand binding. These results can be potentially extrapolated to other members of the Cys-loop ligand-gated ion channels. By combining fluorescence microscopy with electrophysiological recordings, the authors investigate structural changes inside and outside the orthosteric site elicited by agonists, partial agonists, and antagonists. The results are convincing and correlate well with the observations from cryo-EM structures. The work will be of important significance and broad interest to scientists working on channel biophysics but also drug development targeting ligand-gated ion channels.

      Strengths:

      The authors present an elegant and well-designed study to investigate the conformational changes on 5-HT3AR where they combine electrophysiological and fluorometry recordings. They determined four positions suitable to act as sensors for the conformational changes of the receptor: two inside and two outside the agonist binding site. They make a strong point showing how antagonists produce conformational changes inside the orthosteric site similarly as agonists do but they failed to spread to the lower part of the ECD, in agreement with previous studies and Cryo-EM structures. They also show how some loss-of-function mutant receptors elicit conformational changes (changes in fluorescence) after partial agonist binding but failed to produce measurable ionic currents, pointing to intermediate states that are stabilized in these conditions. The four fluorescence sensors developed in this study may be good tools for further studies on characterizing drugs targeting the 5-HT3R.

      Weaknesses:

      Although the major conclusions of the manuscript seem well justified, some of the comparison with the structural data may be vague. The claim that monitoring these silent conformational changes can offer insights into the allosteric mechanisms contributing to signal transduction is not unique to this study and has been previously demonstrated by using similar techniques with other ion channels.

      The referee emphasizes that “some of the comparison with the structural data may be vague”. To better illustrate the structural reorganizations that are seen in the cryo-EM structures and used for VCF data interpretation, we added a new supplementary figure 3. It shows a superimposition of Apo, setron-bound and 5-HT-bound structures, with reorganization of loop C and the Cys-loop consistent with VCF data.

      Reviewer #2 (Public Review):

      Summary:

      This study focuses on the 5-HT3 serotonin receptor, a pentameric ligand-gated ion channel important in chemical neurotransmission. There are many cryo-EM structures of this receptor with diverse ligands bound; however, assignment of functional states to the structures remains incomplete. The team applies voltage-clamp fluorometry to measure, at once, both changes in ion channel activity and changes in fluorescence. Four cysteine mutants were selected for fluorophore labeling, two near the neurotransmitter site, one in the ECD vestibule, and one at the ECD-TMD junction. Agonists, partial agonists, and antagonists were all found to yield similar changes in fluorescence, a proxy for conformational change, near the neurotransmitter site. The strength of the agonist correlated to a degree with propagation of this fluorescence change beyond the local site of neurotransmitter binding. Antagonists failed to elicit a change in fluorescence in the vestibular and the ECD-TMD junction sites. The VCF results further turned up evidence supporting intermediate (likely pre-active) states.

      Strengths:

      The experiments appear rigorous, the problem the team tackles is timely and important, the writing and the figures are for the most part very clear. We sorely need approaches orthogonal to structural biology to annotate conformational states and observe conformational transitions in real membranes- this approach, and this study, get right to the heart of what is missing.

      Weaknesses:

      The weaknesses in the study itself are overall minor, I only suggest improvements geared toward clarity. What we are still missing is application of an approach like this to annotate the conformation of the part of the receptor buried in the membrane; there is important debate about which structure represents which state, and that is not addressed in the current study.

      Reviewer #3 (Public Review):

      Summary:

      The authors have examined the 5-HT3 receptor using voltage clamp fluorometry, which enables them to detect structural changes at the same time as the state of receptor activation. These are ensemble measurements, but they enable a picture of the action of different agonists and antagonists to be built up.

      Strengths:

      The combination of rigorously tested fluorescence reporters with oocyte electrophysiology is a solid development for this receptor class.

      Weaknesses:

      The interpretation of the data is solid but relevant foundational work is ignored. Although the data represent a new way of examining the 5-HT3 receptor, nothing that is found is original in the context of the superfamily. Quantitative information is discussed but not presented.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions that may help to improve the manuscript:

      • Page 6, point 2), typo: "L131W is positioned more profound in each ECD, its side chain (...)"

      “profound” has been corrected to “profoundly”

      • Fig 1C: Why not compare 5-HT responses for the four sensors studied? If the reason is the low currents elicited by 5-HT on I160C/Y207W sensor, could you comment on this effect that is not observed for the other full agonist tested (mCPBG)?

      The point of this figure (Fig 1G) is to show currents that desensitize, in order to follow the evolution of the fluorescence signal during desensitization. That is why, for the I160C/Y207W sensor, where 5-HT becomes a partial agonist, we judged it more appropriate to use mCPBG, which acts as a more potent agonist, to elicit currents with a clear desensitization component. We have added a sentence in the legend of the figure to explain this choice more clearly.

      • Page 9, paragraph 2: "However, concentration-response curves on V106C/L131W show a small yet visible decorrelation of fluorescence and current (...)" Statistical analysis on EC50c and EC50f will help to see this decorrelation.

      Statistical analysis (unpaired t test) has been added to figure 3 panel A.

      • Page 10, paragraph 1: the authors describe how "different antagonists promote different degrees of local conformational changes". Does it have any relation to the efficacy or potency of these antagonists? Is there any interpretation for this result?

      Since setrons are competitive antagonists, the concept of efficacy of these molecules is unclear. Concerning potency, no correlation between affinity and fluorescence variation is observed. For instance, ondansetron and alosetron bind with similar nanomolar affinity to the 5-HT3R (Thompson & Lummis Curr Pharm Des. 2006;12(28):3615-30) but elicit different fluorescence variations on both S204C and I160C/Y207W sensors.

      • Fig. 1 panel A, graph to far right: axis label is cut ("current (uA)/..."). Colors of graph A - right are not clearly distinguishable e.g. cyan from green.

      The fluorescent green color that describes the mutant has been changed into limon color which is more clearly distinguishable from cyan.

      • Why is R219C/F142W not selected in the study? Are the signals comparable to the chosen R219C/Y140W?

      We have chosen not to select R219C/F142W because the current elicited by this construct was lower than the current elicited by the construct R219C/Y140W. Moreover, the residue F142 belongs to the FPF motif from Cys-loop that is essential for gating (Polovinkin et al, 2018, Nature).

      • Fig. 1 legend typo: "mutated in tryptophan”

      “in” has been changed to “into”

      • Fig. 2: yellow color (graphs in panel B) is very hard to read.

      Yellow color has been darkened to yellow/brown to allow easy reading.

      • Fig. 4 is too descriptive and undermines the information of the study. It could be improved e.g. by representing specific structures or partial structures involved. As an additional minor comment, some colors in the figure are hard to differentiate, e.g. magenta and purple.

      We have added relevant specific structures involved, namely loop C, the Cys-loop and pre-M1 loop to clarify. The intensity of magenta and purple has been increased to help differentiate the two sensor positions.

      • Fig S1C: it is confusing to see the same color pattern for the single mutants without the W. I would recommend to label each trace to make it clearer.

      Labelling of the traces corresponding to the single mutants has been added.

      • Fig S2: Indicating the statistical significance in the graph for the mutants with different desensitization properties compared to the WT receptor will help its interpretation.

      The statistical significance of the difference in the desensitization properties has been added to Figure S2.

      Reviewer #2 (Recommendations For The Authors):

      Overall comments for the authors:

      Selection of cysteine mutants and engineered Trp sites is clear and logical. VCF approach with controls for comparing the functionality of WT vs. mutants, and labeled with unlabeled receptor, is well explained and satisfying. The finding that desensitization involves little change in ECD conformation makes sense. It is somewhat surprising, at least superficially, to find that competitive antagonists promote changes in fluorescence in the same 'direction' and amplitude as strong agonists, however, this is indeed consistent with the structural biology, and with findings from other groups testing different labeling sites. Importantly, the team finds that antagonist-binding changes in deltaF do not spread beyond the region near the neurotransmitter site. The finding that most labeling sites in the ECD, in particular those not in/near the neurotransmitter site, fail to report measurable fluorescence changes, is noteworthy. It contrasts with findings in GlyR, as noted by the authors, and supports a mechanism where most of each subunit's ECD behaves as a rigid body.

      Specific questions/comments:

      I am confused about the sensor current kinetics. Results section 2) states that all sensors share the same current desensitization kinetics, while Results section 5) states that the ECD-TMD site and the vestibule site sensors exhibit faster desensitization. SF1C, right-most panel of R219C suggests the mutation and/or labeling here dramatically changes apparent activation and deactivation rates measured by TEVC. Both activation and deactivation upon washout appear faster in this one example. Data for desensitization are not shown here but are shown in aggregate in earlier panels. It is a bit surprising that activation and deactivation would both change but no effect on desensitization. Indeed, it looks like, in Fig. 1G, that desensitization rate is not consistent across all constructs. Can you please confirm/clarify?

      TEVC and VCF recordings in this study show significant variability in both the apparent desensitization and deactivation kinetics. For desensitization, this is illustrated by the TEVC experiments in figure S2, where the remaining currents after 45 seconds of 5-HT perfusion and the rate constants of desensitization are measured on different oocytes from different batches. Therefore, the differences in desensitization kinetics shown in Fig. 1G are not significant, the aim of the figure being solely to illustrate that no variation of fluorescence is observed during the desensitization phase. A sentence has been added to the legend of Fig. 1G to clarify this point. We also revised the first paragraph of result section 5, clearly stating that the slight tendency toward faster desensitization of the V106C/L131W and R219C/Y140W sensors is not significant.

      An alternative to the conclusion-like title of Results section 2) is that the ECD (and its labels) does not undergo notable conformational changes between activated and desensitized states.

      This is a good point and we have added a sentence at the end of results section 2 to present this idea.

      I find the discussion paragraph on partial agonist mechanisms, starting with "However," to be particularly important but at times hard to follow. Please try to revise for clarity. I am particularly excited to understand how we can understand/improve assignments of cryo-EM structures using the VCF (or other) approaches. As examples of where I struggled, near the top of p. 11, related to the partial agonist discussion, there is an assumption about the pore being either activated, or resting. Is it not also possible that partial agonists could stabilize a desensitized state of the pore? Strictly speaking, the labeling sites and current measurements do not distinguish between pre-active resting and desensitized channel conformations/states. However, the cryo-EM structures can likely help fill in the missing information there- with all the normal caveats. Please try to explicitly distinguish between a closed pore and a resting or desensitized state of the pore, to help in clarity.

      We have revised the section, and hope it is clearer now. We notably state more explicitly the argument for annotating partial agonist-bound closed structures as pre-active, mainly from kinetic consideration of VCF experiments. We also mention and cite a paper by the Chakrapani group published on 4 January 2024 (Felt et al, Nature Communications), where they present the structures of the m5HT3AR bound to partial agonists, with a set of conformations fully consistent with our VCF data.

      This statement likely needs references: "...indirect experiments of substituted cysteine accessibility method (SCAM) and VCF experiments suggested that desensitization involves weak reorganizations of the upper part of the channel that holds the activation gate, arguing for the former hypothesis."

      Reference Polovinkin et al, Nature, 2018, has been added.

      I respectfully suggest toning down this language a little bit: "VCF allowed to characterize at an unprecedented resolution the mechanisms of action of allosteric effectors and allosteric mutations, to identify new intermediate conformations and to propose a structure-based functional annotation of known high-resolution structures." This VCF stands strongly without unclear claims about unprecedented resolution. What impresses me most are the findings distinguishing how agonists/partial agonists/antagonists share a conserved action in one area and not in another, the observations consistent with intermediate states, and the efforts to integrate these simultaneous current and conformation measurements with the intimidating array of EM structures.

      We thank the referee for the positive comments. We have removed “unprecedented resolution” and revised the sentences.

      It is beyond the scope of the current study, but I am curious what the authors think the hurdles will be to tracking conformation of the pore domain- an area where non-cryo-EM based conformational measurements are sorely needed to help annotate the EM structures.

      We fully agree with the referee that structures of the TMD are very divergent between the various conditions depending on the membrane surrogate. We are at the moment working on this region by VCF, incorporating the fluorescent unnatural amino acid ANAP.

      Minor:

      (1) P. 5, m5-HT3R: Please clarify that this refers to the mouse receptor, if that is correct.

      OK, “mouse” has been added.

      (2) Fig. 1D, I suggest moving the 180-degree arrow to the right so it is below but between the two exterior and vestibular views.

      Ok, it has been done.

      (3) Please add a standard 2D chemical structure of MTS-TAMRA, and TAMRA attached to a cysteine, to Fig 1.

      A standard chemical structure has been added for the two isomers of MTS-TAMRA.

      (4) Please label subpanels in Fig. 1G with the identity of the label site.

      The subpanels have been labelled.

      Reviewer #3 (Recommendations For The Authors):

      This is solid work but I mainly have suggestions about placing it in context.

      (1) Abstract "Data show that strong agonists promote a concerted motion of all sensors during activation, "

      The concept of sensors here is the fluorescent labels? I did not find this meaningful until I read the significance statement.

      We have specified “fluorescently-labelled” before sensors in the abstract.

      (2) p4 "each subunit in the 5-HT3A pentamer...." this description would be identical for any pentameric LGIC so the authors should beware of a misleading specificity. This goes for other phrases in this paragraph. However, the summary of the 5HT specific results is very good.

      About the description of the structure, we added “The 5-HT3AR displays a typical pLGIC structure, where….”.

      (3) This paper is very nicely put together and generally explains itself well. The work is rigorous and comprehensive. But the meaning of quenching (by local Trp) seems straightforward, but it is not made explicit in the paper. Why doesn't simple labelling (single Cys) at this site work? And can we have a more direct demonstration of the advantage of including the Trp (not in the supplementary figure?) All this information is condensed into the first part of figure 1 (the graph in Figure 1A). Figure 1 could be split and the principle of the introduced quenching could be more clearly shown

      We have detailed in a few more sentences the principle of the TrIQ approach. In addition, to be more explicit, the significant differences in fluorescence between sensors with and without tryptophan have been added to the screening panel of Figure 1, and a sentence has been added to the legend of this figure.

      (4) p10 "VCF measurements are also remarkably coherent with the atomic structures showing an open pore (so called F, State 2 and 5-HT asymmetric states), "

      This statement is intriguing. What do these names or concepts represent? Are they all the same thing? Where do the names come from? What is meant here? Three different concepts, all consistent? Or three names for the same concept?

      We have tried to clarify the statement by making reference to the PDB of the structures.

      (5) "Fluorescence and VCF studies identified similar intermediate conformations for nAChRs, ⍺1-GlyRs and the bacterial homolog GLIC(21,32-35). "

      Whilst this is true, the motivation for such ideas came from earlier work identifying intermediates from electrophysiology alone (such as the flip state (Burzomato et al 2004), the priming state (Mukhtasimova 2009) and the conformational wave in ACh channels (Grosman et al 2000)). It would be appropriate to mention some of this earlier work.

      We have incorporated and described these references in the discussion. Of note, we fully quoted these references in our previous papers on the subject (Menny 2017, Lefebvre 2021, Shi 2023), but the referee is right in asking to quote them again.

      (6) "A key finding of the study is the identification of pre-active intermediates that are favored upon binding of partial agonists and/or in the presence of loss-of-function mutations. "

      Even more fundamental, the idea of a two-state equilibrium for neurotransmitter receptors was discarded in 1957 according to the action of partial agonists.

      DEL CASTILLO J, KATZ B (1957) Interaction at end-plate receptors between different choline derivatives. Proc R Soc Lond B Biol Sci

      So to discover this "intermediate" - that is, bound but minimal activity - in the present context seems a bit much. It is a big positive of this paper that the results are congruent with our expectations, but I cannot see value in posing the results as an extension of the 2-state equilibrium (for which there are anyway other objections).

      As for intermediates being favoured by loss of function mutations, this concept is already well established in glycine receptors (Plested et al 2007, Lape et al 2012) and doubtless in other cases too.

      I do get the point that the authors want to establish a basis in 5-HT3 receptors, but these previous works suggest the results are somewhat expected. This should be commented on.

      We also agree. We have replaced “key finding” with “key observation”, quoted most of the references proposed, and explicitly concluded that “The present work thus extends this idea to the 5HT3AR, together with providing structural blueprints for cryo-EM structure annotation”.

      (7) "In addition, VCF data allow a quantitative estimate of the complex allosteric action of partial agonists, that do not exclusively stabilize the active state and document the detailed phenotypes of various allosteric mutations."

      Where is this provided? If the authors are not motivated to do this, I have some doubts that others will step in. If it is not worth doing, it's probably not worth mentioning either.

      Language has been toned down to “In addition, VCF data give insights in the action of partial agonists, that do not exclusively stabilize the active state and document the phenotypes of various allosteric mutations."

      (8) Figure 1G please mark which construct is which.

      This has been added into Figure 1G

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their insightful comments and recommendations. We have extensively revised the manuscript in response to the valuable feedback. We believe the result is a more rigorous and thoughtful analysis of the data. Furthermore, our interpretation and discussion of the findings is more focused and highlights the importance of the circuit and its role in the response to stress. Thank you for helping to improve the presented science.

      Key changes made in response to the reviewers’ comments include:

      • Revision of statistical analyses for nearly all figures, with the addition of a new table of summary statistics to include F and/or t values alongside p-values.

      • Addition of statistical analyses for all fiber photometry data.

      • Examination of data for possible sex-dependent effects.

      • Clarification of breeding strategies and genotype differences, with added details to methods to improve clarity.

      • Addressing concerns about the specificity of virus injections and the spread, with additional details added to methods.

      • Modification of terminology related to goal-directed behavior based on reviewer feedback, including removal of the term from the manuscript.

      • Clarification and additional data on the use of photostimulation and its effects, including efforts to inactivate neurons for further insight, despite technical challenges.

      • Correction of grammatical errors throughout the manuscript.

      Reviewer 1:

      Despite the manuscript being generally well-written and easy to follow, there are several grammatical errors throughout that need to be addressed.

      Thank you for highlighting this issue. Grammatical errors have been fixed in the revised version of the manuscript.

      Only p values are given in the text to support statistical differences. This is not sufficient. F and/or t values should be given as well.

      In response to this critique and similar comments from Reviewer 2, we re-evaluated our approach to statistical analyses and extensively revised the analyses for nearly all figures. We also added a new table of summary statistics (Supplemental Table 1) containing the type of analysis, statistic, comparison, multiple comparisons, and p value(s). For Figures 4C-E, 5C, 6C-E, 7H-I, and 8H we analyzed the data using two-way repeated measures (RM) ANOVA, which examined the main effect of time (either number of sessions or stimulation period) within the same animal, the main effect of genotype (Cre+ vs Cre-), and whether there was an interaction. For Supplemental Figure 7A we also conducted a two-way RM ANOVA in Cre+ mice, with time as one factor and activity state (number of port activations in the active vs inactive nose port) as the other. For Figures 5D-E we conducted a two-way mixed model ANOVA that accounted and corrected for missing data. In figures that compared only two groups of data (Figures 5F-L, 6F, 8C-D, 8I, and Supp 6F-G) we used two-tailed t-tests for the analysis. If our question and/or hypothesis required us to conduct multiple comparisons between or within treatments, we conducted Bonferroni’s multiple comparisons test for post hoc analysis (we note which groups we compared in Supplemental Table 1). For figures that did or did not show a change in calcium activity (Figure 3G, 3I-K, 7B, 7D-E, 8E-F), we compared waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). Supplemental Table 1 notes the time windows we used for comparison and whether the comparisons were significant at the 95%, 99%, and 99.9% thresholds.

      None of the comparisons that were significant in the prior analyses fell below the thresholds for significance. Of those found to be not significantly different, only one change was noted. In Figure 6E there was now a significant baseline difference between Cre+ and Cre- mice, with Cre- mice taking longer to first engage the port compared to Cre+ mice (p=0.045). Although the more rigorous approach to the statistical analyses did not change our interpretations, we feel it enhanced the paper and thank the reviewer for pushing this improvement.

      Moreover, the fibre photometry data does not appear to have any statistical analyses reported - only confidence intervals represented in the figures without any mention of whether the elevations in activity observed are significantly different from the baseline.
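
      As a minimal illustration of the Bonferroni post hoc correction mentioned above, the sketch below multiplies each raw p value by the number of comparisons and caps the result at 1. The function name and the example p values are hypothetical and are not taken from the manuscript or from Supplemental Table 1; the statistics software actually used by the authors is not stated here.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: scale each raw p value by the number of tests.

    Returns a list of (adjusted_p, reject_null) tuples, one per input value.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(p * m, 1.0)  # adjusted p values are capped at 1
        results.append((adjusted, adjusted < alpha))
    return results


# Hypothetical raw p values from three post hoc comparisons.
raw = [0.004, 0.020, 0.045]
for p, (adj, reject) in zip(raw, bonferroni(raw)):
    print(f"raw p={p:.3f}  adjusted p={adj:.3f}  significant={reject}")
```

      With three comparisons, a raw p value must fall below 0.05/3 to remain significant after correction, which is why only the first hypothetical comparison survives.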

      This is particularly important where there is ambiguity, such as in Figure 3K, where the spontaneous activity of the animal appears to correlate with a spike in activity but the text mentions that there is no such difference. Without statistics, this is difficult to judge.

      Thank you for highlighting this critical point and providing an opportunity to strengthen our manuscript. We added statistical analyses of all fiber photometry data using a recently described approach based on waveform confidence intervals (Jean-Richard-Dit-Bressel, Clifford, McNally, 2020). In the statistical summary (Supplemental Table 1) we note the time window that we used for comparison in each analysis and whether the comparisons were significant at the 95%, 99%, and 99.9% thresholds. Thank you for highlighting this and helping make the manuscript stronger.

      With respect to Figure 3K, we are not certain we understood the spike in activity the reviewer referred to. Figure 3J and K include both velocity data (gold) and Ca2+ dependent signal (blue). We used episodes of velocity that were comparable to the avoidance response during the ambush test and found no significant differences in the Ca2+ signal when gating around changes in velocity in the absence of a stressor (Supplemental Table 1). This is in contrast to the significant change in Ca2+ signal following a mock predator ambush (Figure 3J). We interpret these data together to indicate that locomotion does not correlate with an increase in calcium activity in SuMVGLUT2+::POA neurons, but that coping with a stressor does. This conclusion is further examined in Supplemental Figure 5, including examining cross-correlation to test for a temporally offset relationship between velocity and Ca2+ signal in SuMVGLUT2+::POA neurons.
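
      The general idea behind a waveform confidence-interval comparison can be sketched as follows. This is a simplified percentile bootstrap under stated assumptions, not the authors' pipeline or the exact procedure of the cited method: the baseline is assumed to be 0, no consecutive-sample threshold is applied, and the function names and example traces are hypothetical.

```python
import random


def bootstrap_waveform_ci(trials, n_boot=2000, ci=0.95, seed=0):
    """Percentile-bootstrap CI of the mean waveform at each time bin.

    trials: list of equal-length lists, one peri-event trace per trial.
    Returns (lower, upper), each a list with one value per time bin.
    """
    rng = random.Random(seed)
    n_trials = len(trials)
    n_time = len(trials[0])
    boot_means = [[] for _ in range(n_time)]
    for _ in range(n_boot):
        # Resample whole trials with replacement.
        sample = [trials[rng.randrange(n_trials)] for _ in range(n_trials)]
        for t in range(n_time):
            boot_means[t].append(sum(tr[t] for tr in sample) / n_trials)
    tail = (1 - ci) / 2
    lower, upper = [], []
    for t in range(n_time):
        vals = sorted(boot_means[t])
        lower.append(vals[int(tail * (n_boot - 1))])
        upper.append(vals[int((1 - tail) * (n_boot - 1))])
    return lower, upper


def significant_bins(lower, upper, baseline=0.0):
    """A bin counts as significant when its CI excludes the baseline."""
    return [lo > baseline or hi < baseline for lo, hi in zip(lower, upper)]


# Hypothetical data: 8 trials, 2 time bins (bin 0 near baseline, bin 1 elevated).
example_trials = [
    [-0.4, 0.8], [-0.3, 0.9], [-0.2, 1.0], [-0.1, 1.1],
    [0.1, 1.2], [0.2, 1.3], [0.3, 1.4], [0.4, 1.5],
]
lower, upper = bootstrap_waveform_ci(example_trials)
print(significant_bins(lower, upper))
```

      Running the same function with ci=0.99 or ci=0.999 gives the stricter thresholds of the kind reported in Supplemental Table 1.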

      The use of photostimulation only is unfortunate, it would have been really nice to see some inactivation of these neurons as well. This is because of the well-documented issues with being able to determine whether photostimulation is occurring in a physiological manner, and therefore makes certain data difficult to interpret. For instance, with regards to the 'active coping' behaviours - is this really the correct characterisation of what's going on? I wonder if the mice simply had developed immobile responding as a coping strategy but when they experience stimulation of these neurons that they find aversive, immobility is not sufficient to deal with the summative effects of the aversion from the swimming task as well as from the neuronal activation? An inactivation study would be more convincing.

      We agree with the reviewer’s point: experiments demonstrating the necessity of SuMVGLUT2+::POA neurons would have added to the story here. We carried out multiple experiments aimed at addressing questions about the necessity of SuMVGLUT2+::POA neurons in stress coping behaviors, specifically the forced swim assay. Efforts included employing chemogenetic, optogenetic, and tetanus toxin-based methods. We observed no effects on locomotor activity or stress coping. These experiments are both technically difficult and challenging to interpret. Interpretation of negative results, as we obtained, is particularly difficult because of potential technical confounds. Selective targeting of SuMVGLUT2+::POA neurons for inhibition requires three viral injections and two recombination steps, increasing variability and reducing the number of neurons impacted. Alternatively, photoinhibition targeting SuMVGLUT2+::POA cells can be done using Retro-AAV injected into POA and a fiber implant over SuM. We tried both approaches. The data obtained were difficult to interpret because questions arose about adequate coverage of the SuMVGLUT2+::POA population by virally expressed constructs and/or light spread. The challenge of adequate coverage to effectively prevent output from the targeted population is further confounded by challenges inherent in neural inhibition, specifically determining whether the inhibition created at the cellular level is adequate to block output in the context of excitatory inputs or whether neurons must first be engaged in a particular manner for inhibition to be effective. Baseline neural activity, release probability, and post-synaptic effects could all be relevant, which photo-inhibition will potentially not resolve. So, while the trend is to always show “necessary and sufficient” effects, we’ve tried nearly everything, and we simply cannot conclude much from our mixed results.
There are also well-established problems with existing photo-inhibition methods, which are often ignored even as people use and tout these methods. We have a lot of expertise in photo-inhibition optogenetics, and indeed have used it with some success and developed new methods, yet in this particular case we are unable to draw conclusions related to inhibition. People have experienced similar challenges in locus coeruleus neurons, which have very low basal activity: inhibition with chemogenetics is very hard, as is inhibition with optogenetic pump-based approaches, because the neurons fire robust rebound APs. We have spent almost 2.5 years trying to get this to work in this circuit because reviewers have been insistent on this result for the paper to be conclusive. Unfortunately, it simply isn’t possible in our view until we know more about the cell types involved. This is all in spite of experience using the approach in many other publications.

      We also employed less selective approaches, such as injecting AAV-DIO-tetanus toxin light chain (Tettox) constructs directly into the SuM of VGLUT2-Cre mice, but found off-target effects impacting animal wellbeing and impeding behavioral testing due to viral spread to surrounding areas.

      While we are disappointed at being unable to directly address questions about the necessity of SuMVGLUT2+::POA neurons in active coping with experimental data, we were unable to obtain results allowing for clear interpretation across numerous other domains the reviewers requested. We also feel strongly that until we have a clear picture of the molecular cell type architecture in the SuM, and Cre-drivers to target subsets of neurons, this question will be difficult to resolve for any group. We are working now on RNAseq and related spatial transcriptomics efforts in the SuM and examining additional behavioral paradigms to resolve these issues, so stay tuned for future publications.

      Accordingly, we avoid making statements relating to necessity in the manuscript, in spite of having several lines of physiological data showing strong, robust correlations with behavior related to the SuMVGLUT2+::POA circuit.

      Nose poke is only nominally instrumental as it cannot be shown to have a unique relationship with the outcome that is independent of the stimuli-outcome relationships (in the same way that a lever press can, for example). Moreover, there is nothing here to show that the behaviours are goal-directed.

      Thank you for highlighting this point. We removed the goal-directed terminology from the manuscript. Since the mice perform highly selective (active vs inactive) port activation robustly across multiple days of training, the behavior likely transitions to habitual behavior. We only tested the valuation of stimulus termination on the final day of training, with a time-limited progressive ratio test. With respect to lever press versus active port activation, we are unclear how using a lever in this context would offer a different interpretation. Lever pressing may be more sensitive to changes in valuation when compared to nose poke port activation (Atalayer and Rowland 2008); however, in this study the focus of the operant behavior is separating innate behaviors from learned action–outcome instrumental behaviors for threat response (LeDoux and Daw 2018). The robust, highly selective activation of the active port illustrated in Figure 6 fits as an action–outcome instrumental behavior wherein mice learn to engage the active but not the inactive port to terminate photostimulation. The first activation of the port occurs through exploration of the arena but, as demonstrated by the number of active port activations and the decline in the time to the first active port engagement, mice expressing ChR2eYFP learn to engage the port to terminate the stimulation. To aid in illustrating this point we have added Supplemental Figure 7 showing active and inactive port activations for both Cre+ and Cre- mice. This adds clarity to the high rate of selective port activation driven by stimulation of SuMVGLUT2+::POA neurons compared to controls. Eliminating the goal-directed terminology and providing additional data narrows and supports one of the key points of the operant experiment.

      With regards to Figure 1: This is a nice figure, but I wonder if some quantification of the pathways and their density might be helpful, perhaps by measuring the intensity of fluorescence in image J (as these are processes, not cell bodies that can be counted)? Mind you, they all look pretty dense so perhaps this is not necessary! However, because the authors are looking at projections in so-called 'stress-engaged regions', the amygdala seems conspicuous by its absence. Did the authors look in the amygdala and find no projections? If so it seems that this would be worth noting.

      This is an interesting question but has proven to be a very technically challenging one. We consulted with several leaders in the field who routinely use complementary viral tracing methods. We were unable to devise a method to provide a satisfactorily meaningful quantitative (as opposed to qualitative) approach to compare SuMVGLUT2+::POA to SuMVGLUT2+ projections. A few limitations are present that hinder a meaningful quantitative approach. One limitation was the need for different viral strategies to label the two populations. Labeling SuMVGLUT2+::POA neurons requires using VGLUT2-Flp mice with two injections into the POA and one into SuM. Two recombinase steps were required, reducing efficiency of overlap. This combination of viral injections, particularly the injections of RetroAAVs in the POA, can induce significant quantitative variability due to tropism, efficacy, and variability of retro-viral methods, and viral infection generally. These issues are often totally ignored in similar studies across the “neural circuit” landscape, but that doesn’t make them less relevant here.

      Although people do this in the field, and show quantification, we actually believe that it can be a quite misleading read-out of functionally relevant circuitry, given that neurotransmitter release ultimately is amplified by receptors post-synaptically, and many examples of robust behavioral effects have been observed with low fiber tracing by complementary methods (McCall, Siuda et al. 2017). In contrast, the broader SuMVGLUT2+ population was labeled using a single injection into the SuM. This means there was likely more efficient expression of the fluorophore. Additionally, in areas that contain terminals and passing fibers, understanding and interpreting fluorescent signal is challenging. Together, these factors limit a meaningful quantitative comparison and make an interpretation difficult. In this context, we focused on a conservative qualitative presentation to demonstrate two central points: 1) SuMVGLUT2+::POA neurons are a subset of SuMVGLUT2+ neurons that project to specific areas that exclude dentate gyrus, and 2) they arborize extensively to multiple areas which have been linked to threat responses. We agree that there is much to be learned about how different populations in SuM connect to targets in different regions of the brain and will continue to examine this question with different techniques. A meaningful quantitative study comparing projections is technically complex and, we feel, beyond our ability for this study.

      Also, for the reasons above we do not believe that quantification provides exceptional clarity with respect to the putative function of the circuit, glutamate released, or other cotransmitters given known amplification at the post-synaptic side of the circuit.

      With regard to the amygdala, other studies on SuM projections have found efferent projections to amygdala (Ottersen, 1980; Vertes, 1992). In our study we were unable to definitively determine projections from SuMVGLUT2+::POA neurons to amygdala, which if present are not particularly dense. For this reason we were conservative and do not comment on this particular structure.

      I would suggest removing the term goal-directed from the manuscript and just focusing on the active vs. passive distinction.

      We removed the use of goal-directed. Thank you for helping us clarify our terminology.

      The effect observed in Figure 7I is interesting, and I'm wondering if a rebound effect is the most likely explanation for this. Did the authors inhibit the VGAT neurons in this region at any other times and observe a similar rebound? If such a rebound was not observed it would suggest that it is something specific about this task that is producing the behaviour. I would like it if the authors could comment on this.

      We agree that the results showing the change in coping strategy (passive to active) in forced swim after but not during stimulation of SuMVGAT+ neurons are quite interesting (Figure 7I). This experiment activated SuMVGAT+ neurons during a section of the forced swim assay and mice showed a robust shift to mobility after the stimulation of SuMVGAT+ neurons stopped. We did not carry out inhibition of SuMVGAT+ neurons in this manuscript. As the reviewer suggested, strong inhibition of local SuM neurons, including SuMVGLUT2+::POA neurons, could lead to rebound activity that may shift coping behaviors in confusing ways. We agree this is an interesting idea but do not have data to support the hypothesis further at this time.

      Reviewer 2

      (1) These are very difficult, small brain regions to hit, and it is commendable to take on the circuit under investigation here. However, there is no evidence throughout the manuscript that the authors are reliably hitting the targets and the spread is comparable across experiments, groups, etc., decreasing the significance of the current findings. There are no hit/virus spread maps presented for any data, and the representative images are cropped to avoid showing the brain regions lateral and dorsal to the target regions. In images where you can see the adjacent regions, there appears expression of cell bodies (such as Supp 6B), suggesting a lack of SuM specificity to the injections.

      We agree with the reviewer that the areas studied are small and technically challenging to hit. This was one of the driving motivations for using multiple tools in tandem to restrict the area targeted for stimulation. Approaches included using retrograde AAVs to express ChR2eYFP in SuMVGLUT2+::POA neurons, thereby restricting expression to VGLUT2+ neurons that project to the POA. Targeting was further limited by placement of the optic fiber over cell bodies in SuM. Thus, only neurons that are VGLUT2+, project to the POA, and were close enough to the fiber were activated by photostimulation. Regrettably, we were not able to compile images from mice where the fiber was misplaced, leading to loss of behavioral effects. We would have liked to provide that here to address this comment. Unfortunately, generating heat maps for injections is not possible for anatomic studies that use unlabeled recombinase as part of an intersectional approach. Also, accurately determining the injection site of a retroAAV can be difficult because neurons remote from the injection site and their processes are labeled.

      Experiments described in Supplemental Figure 6B on VGAT neurons in SuM were designed and interpreted to support the point that SuMVGLUT2+::POA neurons are a distinct population that does not overlap with GABAergic neurons. For this point it is important that we targeted SuM, but highly confined targeting is not needed to support the central interpretation of the data. We do see labeling in SuM in VGAT-Cre mice, but photostimulation of SuMVGAT+ neurons does not generate the behavioral changes seen with activation of SuMVGLUT2+::POA neurons. As the reviewer points out, SuM is a small target and viral injection is likely to spread beyond the anatomic boundaries to other VGAT+ neurons in the region, which are not the focus here. The activation would be restricted by the spread of light from the fiber over SuM (estimated to be about a 200 µm sphere in all directions). We did not further examine projections or localization of VGAT+ neurons in this study but focused on the differential behavioral effects of SuMVGLUT2+::POA neurons.

      (2) In addition, the whole brain tracing is very valuable, but there is very little quantification of the tracing. As the tracing is the first several figures and supp figure and the basis for the interpretation of the behavior results, it is important to understand things including how robust the POA projection is compared to the collateral regions, etc. Just a rep image for each of the first two figures is insufficient, especially given the above issue raised. The combination of validation of the restricted expression of viruses, rep images, and quantified tracing would add rigor that made the behavioral effects have more significance.

      For example, in Fig 2, how can one be sure that the nature of the difference between the nonspecific anterograde glutamate neuron tracing and the Sum-POA glutamate neuron tracing is real when there is no quantification or validation of the hits and expression, nor any quantification showing the effects replicate across mice? It could be due to many factors, such as the spread up the tract of the injection in the nonspecific experiment resulting in the labeling of additional regions, etc.

      Relatedly, in Supp 4, why isn’t C normalized to DAPI, which they show, or area? Similar for G what is the mcherry coverage/expression, and why isn’t Fos normalized to that?

      Thank you for highlighting the importance and value of anatomy. Two points based on the anatomic studies are central to our interpretation of the experimental data. First, SuMVGLUT2+::POA neurons are a distinct population within the SuM. We show this by demonstrating they are not GABAergic and that they do not project to dentate gyrus. Projections from SuM to dentate gyrus have been described in multiple studies (Boulland et al., 2009; Haglund et al., 1987; Hashimotodani et al., 2018; Vertes, 1992) and we demonstrate them here for SuMVGLUT2+ cells. Using an intersectional approach in VGLUT2-Flp mice we show SuMVGLUT2+::POA neurons do not project to dentate gyrus. We show cell bodies of SuMVGLUT2+::POA neurons located in SuM across multiple figures including clear brain images. Thus, SuMVGLUT2+::POA neurons are SuM neurons that are not GABAergic and that send projections to a distinct subset of targets, most notably excluding dentate gyrus. Second, SuMVGLUT2+::POA neurons arborize, sending projections to multiple regions. We show this using a combinatorial genetic and viral approach to restrict expression of eYFP to only neurons that are in SuM (based on viral injection), project to the POA (based on retrograde AAV injection in POA), and are VGLUT2+ (VGLUT2-Flp mice). Thus, any eYFP labeled projection comes from SuMVGLUT2+::POA neurons. We further confirmed projections using retroAAV injection into areas identified using anterograde approaches (Supplemental Figure 2). As discussed above in replies to Reviewer 1, we feel limitations are present that preclude meaningful quantitative analysis. We thus opted for a conservative interpretation as outlined.

      Prior studies have shown efferent projections from SuM to many areas, and projections to dentate gyrus have received substantial attention (Boulland et al., 2009; Haglund, Swanson, and Kohler, 1984; Hashimotodani et al., 2018; Soussi et al., 2010; Vertes, 1992; Pan and McNaughton, 2004). We saw many of the same projections from SuMVGLUT2+ neurons. We found no projections from SuMVGLUT2+::POA neurons to dentate gyrus (Figure 2). Our description of SuM projection to dentate gyrus is not new, but finding a population of neurons in SuM that does not project to dentate gyrus but does project to other regions in hippocampus is new. This finding cannot be explained by spread of the virus in the tract or non-selective labeling.

      (3) The authors state that they use male and female mice, but they do not describe the n’s for each experiment or address sex as a biological variable in the design here. As there are baseline sex differences in locomotion, stress responses, etc., these could easily factor into behavioral effects observed here.

      Sex-specific effects are possible; however, the studies presented here were not designed or powered to directly examine them. A point about experimental design that helps mitigate the impact of strong sex-dependent effects is that the paradigms we used often examined baseline (pre-stimulation) behavior, how behavior changed during stimulation, and how behavior returned (or not) to baseline after stimulation. Thus, we test changes in individual behaviors. Although we had limited statistical power, we conducted analyses to examine the effects of sex as a variable in the experiments and found no differences between males and females.

      (4) In a similar vein as the above, the authors appear to use mice of different genotypes (however the exact genotypes and breeding strategy are not described) for their circuit manipulation studies without first validating that baseline behavioral expression, habituation, stress responses are not different. Therefore, it is unclear how to interpret the behavioral effects of circuit manipulation. For example in 7H, what would the VGLUT2-Cre mouse with control virus look like over time? Time is a confound for these behaviors, as mice often habituate to the task, and this varies from genotype to genotype. In Fig 8H, it looks like there may be some baseline differences between genotypes- what is normal food consumption like in these mice compared to each other? Do Cre+ mice just locomote and/or eat less? This issue exists across the figures and is related to issues of statistics, potential genotype differences, and other experimental design issues as described, as well as the question about the possibility of a general locomotor difference (vs only stress-induced). In addition, the authors use a control virus for the control groups in VGAT-Cre manipulation studies but do not explain the reasoning for the difference in approach.

      Thank you for highlighting the need for greater clarity about the breeding strategies used and for these related questions. We address the breeding strategy and then move to address the additional concerns raised. We have added details to the methods section to address this point. For VGLUT2-Cre mice we use littermate controls from a Cre/WT x WT/WT cross. The VGLUT2-Cre line (RRID:IMSR_JAX:028863) (Vong L, et al. 2011) used here has been used in many other reports. We are not aware of any reports indicating a phenotype associated with the addition of the IRES-Cre to the Slc17a6 locus and there is no expected impact on expression of VGLUT2. Also, we see in many of the experiments here that the baseline (Figures 4, 5, and 7) behaviors are not different between the Cre+ and Cre- mice. For VGAT-Cre mice we used a different breeding strategy that allowed us to achieve greater control of the composition of litters and more efficient cohorts. A Cre/Cre x WT/WT cross yielded all Cre/WT litters. The AAV injected, ChR2eYFP or eYFP, allowed us to balance the cohort.

      Regarding Figure 7H, which shows time immobile on the second day of a swim test, data from the Cre- mice demonstrate the natural course of progression during the second day of the test. The control mice in the VGAT-Cre cohort (Figure 7I) have a similar trend. The change in behavior during the stimulation period in the Cre+ mice is caused by the activation of SuMVGLUT2+::POA neurons. The behavioral shift largely, but not completely, returns to baseline when the photostimulation stops. We have no reason to believe a VGLUT2-Cre+ mouse injected with control AAV to express eYFP would be different from a WT littermate injected with AAV expressing ChR2eYFP in a Cre-dependent manner.

      Turning to concerns related to 8H, which shows data from fasted mice quantifying time spent interacting with a food pellet immediately after presentation of a chow pellet, we found no significant difference between the control and Cre+ mice. We are unaware of any evidence indicating that the two groups should have a different baseline, since the Cre insertion is not expected to alter gene expression, and we are unaware of reports of a phenotype relating to feeding and the presence of the transgene in this mouse line. Even if there were a small baseline shift, this would not explain the large abrupt shift induced by the photostimulation. As noted above, we saw shifts in behavior abruptly induced by the initiation of photostimulation when compared to baseline in multiple experiments. This shift would not be explained by a hypothetical difference in the baseline behaviors of littermates.

      (5) The statistics used throughout are inappropriate. The authors use serial Mann-Whitney U tests without a description of data distributions within and across groups. Further, they do not use any overall F tests even though most of the data are presented with more than two bars on the same graph. Stats should be employed according to how the data are presented together on a graph. For example, stats for pre-stim, stim, and post-stim behavior X between Cre+ and Cre- groups should employ something like a two-way repeated measures ANOVA, with post-hoc comparisons following up on those effects and interactions. There are many instances in which one group changes over time or there could be overall main effects of genotype. Not only is serially using Mann-Whitney tests within the same panel misleading and statistically inaccurate, but it cherry-picks the comparisons to be made to avoid more complex results. It is difficult to comprehend the effects of the manipulations presented without more careful consideration of the appropriate options for statistical analysis.

      We thank the reviewer for pointing this out and suggesting alternative analyses; we agree with the assessment on this topic. Therefore, we have extensively revised the statistical approach to our data using the suggested approach. Reviewer 1 also made a similar comment, and we would like to point to our reply to Reviewer 1’s second point regarding what we changed and added in the new statistical analyses. Further, we have added a full table detailing the statistical values for each figure to the paper.

      Conceptual:

      (6) What does the signal look like at the terminals in the POA? Any suggestion from the data that the projection to the POA is important?

      This is an interesting question that we will pursue in future investigations into the roles of the POA. We used the projection to the POA from SuM to identify a subpopulation in SuM and we were surprised to find the extensive arborization of these neurons to many areas associated with threat responses. We focused on the cell bodies as “hubs” with many “spokes”. Extensive studies are needed to understand the roles of individual projections and their targets. There is also the hypothetical technical challenge of manipulating one projection without activating retrograde propagation of action potentials to the soma. At the current time we have no specific insights into the roles of the isolated projection to POA. Interpretation of experiments activating only one “spoke” of the hub would be challenging. Simple terminal stimulation experiments are challenged by the need to separate POA projections from activation of passing fibers targeting more anterior structures of the accumbens and septum.

      (7) Is this distinguishing active coping behavior without a locomotor phenotype? For example, Fig. 5I and other figure panels show a distance effect of stimulation (but see issues raised about the genotype of comparison groups). In addition, locomotor behavior is not included for many behaviors, so it is hard to completely buy the interpretation presented.

      We agree with the reviewer and thank them for highlighting this fundamental challenge in studies examining active coping behaviors in rodents, which require movement. Additionally, actively responding to threatening stressors would include increased locomotor activity. Separation of movement alone from active coping can be challenging. Because of these concerns we undertook experiments using diverse behavioral paradigms to examine the elicited behaviors and the recruitment of SuMVGLUT2+::POA neurons to stressors. We conducted experiments to directly examine behaviors evoked by photoactivation of SuMVGLUT2+::POA neurons. In these experiments we observed a diversity of behaviors including increased locomotion and jumping but also treading/digging (Figure 4). These are behaviors elicited in mice by threatening and noxious stimuli. An increase in running or jumping alone could signify a specific locomotor effect, but this is not what was observed. Based on these behaviors, we expected to find evidence of increased movement in open field (Figure 5G-I) and light dark choice (Figure 5J-L) assays. For many of the assays, reporting distance traveled is not practical. An important set of experiments that argues against a generic increase in locomotion is the operant behavior experiments, which require the animal to engage in a learned behavior while receiving photostimulation of SuMVGLUT2+::POA neurons (Figure 6). This is particularly true for testing using a progressive ratio when the time of ongoing photostimulation is longer, yet animals actively and selectively engage the active port (Figure 6G-H). Further, we saw a shift in behavioral strategy induced by photoactivation in the forced swim test (Figure 7H). Thus, activation of SuMVGLUT2+::POA neurons elicited a range of behaviors that included swimming, jumping, treading, and learned responses, not just increased movement.
Together these data strongly argue that SuMVGLUT2+::POA neurons do not only promote increased locomotor behavior. We interpret these data together with the data from fiber photometry studies to show that SuMVGLUT2+::POA neurons are recruited during acute stressors, contribute to the aversive affective component of stress, and promote active behaviors without constraining the behavioral pattern.

      Regarding genotype, we address this in comments above as well but believe that clarifying the use of littermates, the extensive use of the VGLUT2-Cre line by multiple groups, and an experimental design allowing for comparison of baseline, stimulation-evoked, and post-stimulation behaviors within and across genotypes mitigate possible concerns relating to the genotype.

      (8) What is the role of GABA neurons in the SuM and how does this relate to their function and interaction with glutamate neurons? In Supp 8, GABA neuron activation also modulates locomotion and in Fig 7 there is an effect on immobility, so this seems pretty important for the overall interpretation and should probably be mentioned in the abstract.

      Thank you for noting these interesting findings. We added text to the abstract to highlight these findings. Possible roles of GABAergic neurons in SuM extend beyond the scope of the current study, particularly since SuM neurons have been shown to release both GABA and glutamate (Li Y, Bao H, Luo Y, et al. 2020; Root DH, Zhang S, Barker DJ, et al. 2018). GABAergic neurons regulate dentate gyrus (Ajibola MI, Wu JW, Abdulmajeed WI, Lien CC 2021), REM sleep (Billwiller F, Renouard L, Clement O, Fort P, Luppi PH 2017), and novelty processing (Chen S, He L, Huang AJY, Boehringer R, et al. 2020). The population of exclusively GABAergic vs dual neurotransmitter neurons in SuM requires further dissection to be understood. How they may relate to SuMVGLUT2+::POA neurons requires further investigation.

      Questions about figure presentation:

      (9) In Fig 3, why are heat maps shown as a single animal for the first couple and a group average for the others?

      Thank you for highlighting this point for further clarification. We modified the labels in the figure to help make clear which panels are from one animal across multiple trials and which are from multiple animals. In the ambush assay each animal had only one trial, to avoid habituation to the mock predator. Accordingly, we do not have multiple trials for each animal in this test. In contrast, the dunk assay (10 trials/animal) and the shock assay (5 trials/animal) had multiple trials for each animal. When there are multiple trials per animal, we present data from a representative animal as well as the aggregate data.

      Why is the temporal resolution for J and K different even though the time scale shown is the same?

      Thank you for noticing this error carried forward from a prior draft of the figure so we could correct it. We replaced the image in 3J with a more correctly scaled heatmap.

      What is the evidence that these signal changes are not due to movement per se?

      Thank you for the question. There are two points of evidence. First, all the 465 nm excitation (Ca2+ dependent) data were collected in interleaved fashion with 415 nm (isosbestic) excitation data. The isosbestic signal is derived from GCaMP emission but is independent of Ca2+ binding (Martianova E, Aronson S, Proulx CD. 2019). This approach, time-division multiplexing, can correct the calcium-dependent signal for changes that are most often due to mechanical artifacts. The second piece of evidence is experimental. Using multiple cohorts of mice, we examined whether the change in Ca2+ signal was correlated with movement. We used the threshold of velocity of movement seen following the ambush. We found no correlation between high velocity movements and Ca2+ signal (Figure 3K), including in cross-correlational analysis (Supplemental Figure 5). Based on these points together we conclude the change in the Ca2+ signal in SuMVGLUT2+::POA neurons is not due to movement-induced mechanical changes, and we find no correlation to movement unless a stressor is present, i.e. mock predator ambush or forced swim. Further, the stressors evoke very different locomotor responses: fleeing, jumping, or swimming.

      (10) In Fig 4, the authors carefully code various behaviors in mice. While they pick a few and show them as bars, they do not show the distribution of behaviors in Cre- vs Cre+ mice before manipulation (to show they have similar behaviors) or how these behaviors shift categories in each group with stimulation. Which behaviors in each group are shifting to others across the stim and post-stim periods compared to pre-stim?

      This is an important point. We selected behaviors to highlight in Figure 4C-E because these behaviors are exhibited in response to stress (De Boer & Koolhaas, 2003; van Erp et al., 1994). For the highlighted behaviors (jumping, treading/digging, grooming), we show baseline (pre photostimulation), stimulation, and post stimulation for Cre+ and Cre- mice with the values for each animal plotted. We show all nine behaviors as a heat map in Figure 4B. The panels show changes that may occur as a function of time and show changes induced by photostimulation.

      The heatmaps demonstrate that photostimulation of SUMVGLUT2+::POA neurons causes a suppression of walking, grooming, and immobile behaviors with an increase in jumping, digging/treading, and rapid locomotion. After stimulation stops, there is an increase in grooming and time immobile. The control mice show a range of behaviors with no shifts noted with the onset or termination of photostimulation.

      Of note, issues of statistics, genotype, and SABV are important here. For example, the hint that treading/digging may have a slightly different pre-stim basal expression, it seems important to first evaluate strain and sex differences before interpreting these data.

      We examined the effects of sex as a biological variable in the experiments reported in the manuscript and found no differences between males and females in any of the experiments where we had enough animals of each sex (minimum of 5 mice) for meaningful comparisons. We did this by comparing means and SEM of males and females within each group (e.g. Cre+ males vs Cre+ females, Cre- males vs Cre- females) and then conducted a t-test to see if there was a difference. For figures that show time as a variable (e.g. Figure 6C-E), we compared males and females with time x sex as main factors (including multiple comparisons if needed). We found no significant main effects or interactions between males and females. Because of this, and to maximize statistical power, we decided to keep males and females together in all the analyses presented in the manuscript. It is worth noting also that the core of the experimental design employed is a change in behavior caused by photostimulation. The mice are also the same strain, with the only difference being the modification to add an IRES and sequence for Cre behind the coding sequence of the Slc17A6 (VGLUT2) gene.

      (11) Why do the authors use 10 Hz stimulation primarily? is this a physiologically relevant stim frequency? They show that they get effects with 1 Hz, which can be quite different in terms of plasticity compared to 10 Hz.

      Thank you for raising this important question. Because tests like open field and forced swim are subject to habituation and cannot be run multiple times per animal, a single test frequency was needed for consistency across multiple experiments. The frequency of 10 Hz was selected because it falls within the range of reported firing rates for SuM neurons (Farrel et al., 2021; Pedersen et al., 2017) and based on the robust but submaximal effects seen in the real-time place preference assays. Identification of the native firing rates during stress response would be ideal, but gathering this data for the identified population remains a daunting task.

      (12) In Fig 5A-F, it is unclear whether locomotion differences are playing a role. Entrances (which are low for both groups) are shown but distance traveled or velocity are not.

      In B, there is no color in the lower left panel. where are these mice spending their time? How is the entirety of the upper left panel brighter than the lower left? If the heat map is based on time distribution during the session, there should be more color in between blue and red in the lower left when you start to lose the red hot spots in the upper left, for example. That is, the mice have to be somewhere in apparatus. If the heat map is based on distance, it would seem the Cre- mice move less during the stim.

      We appreciate the opportunity to address this question, and the attention to detail the reviewer applied to our paper. In the real time place preference test (RTPP), stimulation was only provided while the animal was on the stimulation side. Mice quickly leave the stimulation side of the arena, as seen in the supplemental video, particularly at the higher frequencies. Thus, the time stimulation is applied is quite low. The mice often retreat to a corner after entering the stimulation side during trials using higher frequency stimulation. Changing locomotor activity alone could drive changes in the number of entrances, but we did not find this. In regard to the heat map, the color scale is dynamically set for each of the paired examples that are pulled from a single trial. To maximize the visibility between the paired examples, the color scale does not transfer between the trials. As a result, in the example for 10 Hz the mouse spent a larger amount of time in the area corresponding to the lower right corner of the image and the maximum value of the color scale is assigned to that region. As seen in the supplemental video, mice often retreated to the corner of the non-stimulation side after entering the stimulation side. The control animal did not spend a concentrated amount of time in any one region, thus there is a lack of warmer colors. In contrast, in the baseline condition both Cre+ and Cre- mice spent time in areas distributed on both sides of the arena, as expected. As a result, the maximum value in the heat map is lower and more areas are coded in warmer colors, allowing for easier visual comparison between the pair. Using the scale for the 10 Hz pair across all pairs leads to mostly dark images. We considered ways to optimize visualization across and within pairs and focused on the within pair comparison for visualization.

      (13) By starting with 1 hz, are the experimenters inducing LTD in the circuit? what would happen if you stop stimming after the first epoch? Would the behavioral effect continue? What does the heat map for the 1 hz stim look like?

      Relatedly, it is a lot of consistent stimulation over time and you likely would get glutamate depletion without a break in the stim for that long.

      Thank you for the opportunity to add clarity around this point regarding the trials in RTPP testing. Importantly, the trials were not carried out in order of increasing frequency of stimulation, as plotted. Rather, the order of trials was, to the extent possible with the number of mice, counterbalanced across the five conditions. Thus, possible contribution of effects of one trial on the next were minimized by altering the order of the trials.

      We have added a heat map for the 1 Hz condition to figure 5B.

      For experiments on RTPP the average stimulation time at 10Hz was less than 10 seconds per event. As a result, the data are unlikely to be affected by possible depletion of synaptic glutamate. For experiments using sustained stimulation (open field or light dark choice assays) we have no clear data to address if this might be a factor where 10Hz stimulation was applied for the entire trial.

      (14) In Fig 6, the authors show that the Cre- mice just don't do the task, so it is unclear what the utility of the rest of the figure is (such as the PR part). Relatedly, the pause is dependent on the activation, so isn't C just the same as D? In G and H, why is a subset of Cre+ mice shown?

      Why not all mice, including Cre- mice?

      Thank you for the opportunity to improve the clarity of this section. A central aspect of the experiments in Figure 6 is the aversiveness of SUMVGLUT2+::POA neuron photostimulation, as shown in Figure 5B-F. The aversion to photostimulation drives task performance in the negative reinforcer paradigm. The mice perform a task (active port activation) to terminate the negative reinforcer (photostimulation of SuMVGLUT2+::POA neurons). Accordingly, control mice are not expected to perform the task because SuMVGLUT2+::POA neurons are not activated and, thus the mice are not motivated to perform the task.

      A central point we aim to convey in this figure is that while SuMVGLUT2+::POA neurons are being stimulated, mice perform the operant task. They selectively activated the active port (Supplemental Figure 7). As expected, control mice activate the active port at a low level in the process of exploring the arena. This diminishes on subsequent trials as mice habituate to the arena (Figure 6D). The data in Figures 6 C and D are related but can be divergent. Each pause in stimulation requires one port activation on an FR1 test, but the number of port activations can exceed the number of pauses, which are 10 seconds long, if the animal continues to activate the port. Comparing data in Figures 6 C and D reveals that mice generally activated the port two to three times for each pause earned, with a trend towards greater efficiency on day 4 with more rewards and fewer activations.
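
      The relationship between port activations and earned pauses can be illustrated with a short sketch. It assumes, hypothetically, that activations made during an ongoing 10-second pause are recorded but neither earn nor extend a pause; the function name `count_pauses` and the timestamps are illustrative and are not taken from the data set.

```python
def count_pauses(activation_times, pause_s=10.0):
    """Count pauses earned on an FR1 schedule from port-activation timestamps (seconds).

    Assumption: an activation earns a pause only if no pause is in progress.
    """
    earned = 0
    pause_end = float("-inf")
    for t in sorted(activation_times):
        if t >= pause_end:
            earned += 1              # this activation terminates stimulation
            pause_end = t + pause_s  # stimulation resumes after the pause
    return earned

# Illustrative timestamps: six activations, some falling inside a pause.
activations = [0.0, 2.0, 5.0, 12.0, 13.0, 30.0]
pauses = count_pauses(activations)
activations_per_pause = len(activations) / pauses
```

      Under this assumption the activation count is always at least the pause count, which is why the two panels are related but need not move together.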

      The purpose of the progressive ratio test is to examine if photostimulation of SuMVGLUT2+::POA neurons continues to drive behavior as the effort required to terminate the negative stimulus increases. As seen in Figures 6 G and H, the stimulation of SuMVGLUT2+::POA neurons remains highly motivating. In the 20-minute trial we did not find a break point even as the number of port activations required to pause the stimulation exceeded 50. We do not show the Cre- mice in Figure 6G and H because they did not perform the task, as seen in Figure 6F. For technical reasons in early trials, we have fully time-stamped data for rewards and port activations from a subset of the Cre+ mice. Of note, this subset contains both the highest and lowest performing mice from the entire data set.

      Taken together, we interpret the results of the operant behavioral testing as demonstrating that SuMVGLUT2+::POA neuron activation is aversive, can drive performance of an operant task (as opposed to fixed escape behaviors), and is highly motivating.

      (15) In Fig 7, what does the GCaMP signal look like if aligned to the onset of immobility? It looks like since the hindpaw swimming is short and seems to precede immobility, and the increase in the signal is ramping up at the onset of hindpaw swimming, it may be that the calcium signal is aligned with the onset of immobility.

      What does it look like for swimming onset?

      In I, what is the temporal resolution for the decrease in immobility? Does it start prior to the termination of the stim, or does it require some elapsed time after the termination, etc?

      Thank you for the opportunity to address these points and improve the clarity of our interpretation of the data. Regarding aligning the Ca2+ signal from fiber photometry recordings to swimming onset and offset, it is important to note that the swimming bouts are not the same length. As a result, in the time prior to alignment to offset of behaviors, animals will have been swimming for different lengths of time. In Figure 7 C, we use the behavioral heat map to convey the behavioral average. Below we show the Ca2+ dependent signal aligned at the offset of hindpaw swim for an individual mouse (A) and for the total cohort (B). This alignment shows that the Ca2+ dependent signal declines corresponding to the termination of hindpaw swimming. Because these bouts last less than the total window shown, the data are largely included in Figure 7 C and D, which is aligned to onset. Due to the nuance of the difference in the alignment and the partial redundancy, we elected to include the requested alignment to swimming offset in the reply rather than in the primary figure.
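
      Aligning a continuous trace to behavior offsets, as discussed above, can be sketched as below. This is an illustrative sketch only: the helper `align_to_events`, the sampling rate, and the toy bouts are assumptions and do not reproduce the authors' recordings.

```python
import numpy as np

def align_to_events(signal, fs, event_times, pre_s, post_s):
    """Cut fixed windows around each event time (seconds) from a 1-D trace sampled at fs Hz."""
    signal = np.asarray(signal, dtype=float)
    n_pre = int(round(pre_s * fs))
    n_post = int(round(post_s * fs))
    rows = []
    for t in event_times:
        idx = int(round(t * fs))
        if idx - n_pre < 0 or idx + n_post > signal.size:
            continue  # skip events whose window runs off the recording
        rows.append(signal[idx - n_pre: idx + n_post])
    return np.array(rows)

# Toy trace: the signal is elevated during three bouts of different lengths.
fs = 10                                   # samples per second (assumed)
trace = np.zeros(600)                     # 60 s recording
bouts = [(5, 12), (20, 24), (40, 50)]     # (onset, offset) in seconds
for onset, offset in bouts:
    trace[onset * fs: offset * fs] = 1.0
offsets = [offset for _, offset in bouts]
aligned = align_to_events(trace, fs, offsets, pre_s=3, post_s=3)
mean_trace = aligned.mean(axis=0)         # average across bouts, offset at the midpoint
```

      Because the bouts differ in length, only the samples close to the alignment point are comparable across bouts, which is the caveat raised in the response.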

      Author response image 1.

      Turning to the question regarding swimming onset, the animals started swimming immediately when placed in the water and maintained swimming and climbing behaviors until shifting behaviors as illustrated in Figure 7A and B. During this time the Ca2+-dependent signal was elevated but there is only one trial per animal. This question can perhaps be better addressed in the dunk assay presented in Figure 3C, F and G and Supplemental Figure 4 H and I. Here swimming started with each dunk and the Ca2+ signal increased.

      Regarding the question about Figure 7I: we scored entire periods (2 mins) in aggregate. We noted in videos of the behavior test that there was an abrupt decrease in immobility tightly corresponding to the end of stimulation. In a few animals this shift occurred approximately 15-20 s before the end of stimulation. This may relate to the depletion of neurotransmitter as suggested by the reviewer.

      Reviewer 3

      Major points

      (1) Results in Figure 1 suggested that SuM-Vglu2::POA projected not only POA but also to the diverse brain regions. We can think of two models which account for this. One is that homogeneous populations of neurons in SuM-Vglu2::POA have collaterals and innervated all the efferent targets shown in Figure 1. Another is to think of distinct subpopulations of neurons projecting subsets of efferent targets shown in Figure 1 as well as POA. It is suggested to address this by combining approaches taken in experiments for Figure 1 and Supplemental Figure 2.

      Thank you for raising this interesting point. We have attempted combining retroAAV injections into multiple areas that receive projections from SUMVGLUT2+::POA neurons. However, we have found the results unsatisfactory for separating the two models proposed. Using eYFP and tdTomato expression, we saw some overlapping expression in SuM. We are not able to conclude if this indicates separate populations or partial labeling of a homogeneous population. A third option seems possible as well. There could be a mix of neurons projecting to different combinations of downstream targets. This seems particularly difficult to address using fluorophores. We are preparing to apply additional methodologies to this question, but it extends beyond the scope of this manuscript.

      (2) Since the authors drew a hypothetical model in which the diverse brain regions mediate the effect of SuM-Vglu2::POA activation in behavioral alterations at least in part, examination of the concurrent activation of those brain regions upon photoactivation of SuM-Vglu2::POA. This must help the readers to understand which neural circuits act upon the induction of active coping behavior under stress.

      Thank you for raising this important point. We agree that activating glutamatergic neurons should lead to activation of postsynaptic neurons in the target regions. Delineating this in vivo is less straightforward. Doing so requires much greater knowledge of postsynaptic partners of SUMVGLUT2+::POA neurons. There are a number of issues that would need to be accounted for. Undertaking two-color photostimulation plus fiber photometry is possible but not a technical triviality. Further, it is possible that we would measure Ca2+ signals in neurons that have no relevant input or that local circuits in a region may shape the signal. We would also lack the temporal resolution to distinguish monosynaptic from polysynaptic connections. Thus, we would struggle to know if the change in signal was due to the excitatory input from SuM or from a second region. At present, we remain unclear on how to pursue this question experimentally in a manner that is likely to generate clearly interpretable results.

      (3) In Figure 4, "active coping behaviors" must be called "behaviors relevant to the active behaviors" or "active coping-like behaviors", since those behaviors were in the absence of stressors to cope with.

      Thank you for the suggestion on how to clarify our terminology. We have adopted the active coping-like term.

      (4) For the Dunk test, it is suggested to describe the results and methods more in detail, since the readers would be new to it. In particular, the mice could change their behavior between dunks under this test, although they still showed immobility across trials as in Supplemental Figure 4I. Since neural activity during the test was summarized across trials as in Figure 3, it is critical to examine whether the behavior changes according to time.

      Thank you for identifying this opportunity to improve our manuscript. We have expanded and added a detailed description of the dunk test in the methods section.

      As for Supplemental Figure 4I, we apologize for the confusion because the purpose of this figure is to show that mice remained mobile for the entire 30-second dunk trial. This did not appreciably change over the 10 trials. We have revised this figure to plot both immobile and mobile time to achieve greater clarity on this point.

      Minor points

      Typos

      In Figure 1, please add a serotype of AAVs to make it compatible with other figures and their legends.

      In the main text and Figure 2K, the authors used MHb/LHb and mHb/lHb in a mixed fashion. Please make them unified.

      In the figure legend of Figure 6, change "SuMVGLUT2+::POA neurons drive" to "SuMVGLUT2+::POA neurons " in the title.

      In line 86, please change "Retro-AAV2-Nuc-flox(mCherry)-eGFP" to "AAV5-Nuc-flox(mCherry)eGFP".

      In line 80, please change "Positive controls" to "As positive controls, ".

      Thank you for taking the time and making the effort to identify and call these out. We have corrected them.

    1. Metadata is information about some data. So we often think about a dataset as consisting of the main pieces of data (whatever those are in a specific situation), and whatever other information we have about that data (metadata).

      When you consider the quantity of information that can be obtained from a single post, it is mind-boggling. Metadata is a strong tool that may be used by a large number of individuals. I believe that it has the potential to do a great deal of damage if it is misused. The information obtained from the post may be used by criminals to commit crimes. One example of this is the singer and artist Pop Smoke, who passed away recently. During his stay in Los Angeles, Pop Smoke shared a photo on his Instagram account that revealed the time and his location. In less than forty-eight hours, a group of criminals succeeded in locating him and tragically ended his life.

    1. Author Response

      The following is the authors’ response to the original reviews.

      General remarks for the Editor and the Reviewers

      We would like to thank the Editor and the Reviewers for their feedback. Below we address their comments and present our point-by-point responses as well as the related changes in the manuscript.

      In addition to these changes, in a few cases we have found it necessary to move some texts and provide some additional explanations within the manuscript. We emphasize that these amendments have been made for only technical reasons, and do not alter the results and conclusions of the paper, but may help to render the text more coherent and understandable to readers with little knowledge of the subject.

      These minor corrections are:

      • We extended the Introduction section by a sentence (lines 40-42) that is intended to fit the proposed template-directed, non-enzymatic replication mechanism into a more general prebiotic evolutionary context, thus emphasizing its biological relevance. This sentence includes an additional reference (Rosenberger et al., 2021).

      • Two very methodologically oriented and repeated descriptions of random sequence generation have been moved to the Methods section (lines 178-185) from the Results section (lines 336-339 and lines 351-354).

      • We complemented the Data availability statement with licensing information (lines 684-685).

      • Further minor changes (also indicated by red texts) have been implemented to remedy logical and grammatical glitches.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Szathmary and colleagues explore the parabolic growth regime of replicator evolution. Parabolic growth occurs when nucleic acid strand separation is the rate-limiting step of the replication process, which would have been the case for non-enzymatic replication of short oligonucleotides that could precede the emergence of ribozyme polymerases and helicases. The key result is that parabolic replication is conducive to the maintenance of genetic diversity, that is, the coexistence of numerous master sequences (the Gause principle does not apply). Another important finding is that there is no error threshold for parabolic replication except for the extreme case of zero fidelity.

      Strengths:

      I find both the analytic and the numerical results to be quite convincing and well-described. The results of this work are potentially important because they reveal aspects of a realistic evolutionary scenario for the origin of replicators.

      Weaknesses:

      There are no obvious technical weaknesses. It can be argued that the results represent an incremental advance because many aspects of parabolic replication have been explored previously (the relevant publications are properly cited). Obviously, the work is purely theoretical, experimental study of parabolic replication is due. In the opinion of this reviewer, though, these are understandable limitations that do not actually detract from the value of this work.

      We are grateful that this Reviewer appreciates our work. We completely agree that the ultimate validation must come from experiments. It is important to stress that in this field theory often preceded experimental work by decades, and the former often guided the latter. We hope that for the topic of the present paper experiments will follow considerably faster.

      Reviewer #2 (Public Review):

      Summary:

      A dominant hypothesis concerning the origin of life is that, before the appearance of the first enzymes, RNA replicated non-enzymatically by templating. However, this replication was probably not very efficient, due to the propensity of single strands to bind to each other, thus inhibiting template replication. This phenomenon, known as product inhibition, has been shown to lead to parabolic growth instead of exponential growth. Previous works have shown that this situation limits competition between alternative replicators and therefore promotes RNA population diversity. The present work examines this scenario in a model of RNA replication, taking into account finite population size, mutations, and differences in GC content. The main results are (1) confirmation that parabolic growth promotes diversity, but that when the population size is small enough, sequences least efficient at replicating may nevertheless go extinct; (2) the observation that fitness is not only controlled by the replicability of sequences, but also by their GC content; (3) the observation that parabolic growth attenuates the impact of mutations and, in particular, that the error threshold to which exponentially growing sequences are subject can be exceeded, enabling sequence identity to be maintained at higher mutation rates.

      Strengths:

      The analyses are sound and the observations are intriguing. Indeed, it has been noted previously that parabolic growth promotes coexistence, its role in mitigating the error threshold catastrophe - which is often presented as a major obstacle to our understanding of the origin of life - had not been examined before.

      Weaknesses:

      Although all the conclusions are interesting, most are not very surprising for people familiar with the literature. As the authors point out, parabolic growth is well known to promote diversity (Szathmary-Gladkih 89) and it has also been noted previously that a form of Darwinian selection can be found at small population sizes (Davis 2000).

      Given that under parabolic growth, no sequence is ever excluded for infinite populations, it is also not surprising to find that mutations have a less dramatic exclusionary impact.

      In the two articles cited (Szathmary-Gladkih 1989 and Davis 2000) the subexponentiality of the system was implemented in a mechanistic way, by introducing the exponent 0 < 𝑝 < 1. Although the behaviour of these models is more or less consistent with experimental findings (von Kiedrowski, 1986; Zielinski and Orgel, 1987), the divergence of per capita growth rates (𝑥̇/𝑥) at very low concentrations–which guarantees the ability to maintain unlimited diversity in the case of infinite population sizes–makes this formal approach partly unrealistic.
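
      The divergence of the per capita growth rate mentioned above can be written out explicitly. The following is a sketch of the standard phenomenological form, where the rate constant 𝑘 and the initial value 𝑥₀ are symbols introduced here for illustration only:

```latex
\dot{x} = k\,x^{p}, \qquad 0 < p < 1
\quad\Longrightarrow\quad
\frac{\dot{x}}{x} = k\,x^{p-1} \;\longrightarrow\; \infty
\quad \text{as } x \to 0^{+}.

% Parabolic case p = 1/2 with x(0) = x_0:
x(t) = \left(\sqrt{x_0} + \tfrac{k}{2}\,t\right)^{2}.
```

      Since 𝑝 − 1 < 0, a vanishingly rare replicator has an unbounded per capita growth rate in this formulation, which is why no type is ever excluded in the infinite-population limit.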

      To avoid the possible artefacts of this mechanistic approach, and as there are no previous studies analysing the diversity maintaining ability of finite populations of parabolic replicators in an individual-based model context, we implemented a simplified template replication mechanism leading to parabolic growth and analysed the dynamics in an individual-based stochastic model context. The key point of our investigation is that considerable diversity can be maintained in the system even when the population size is quite small.

      Regarding the Reviewer’s comment on selection: Darwinian selection can only occur in a simple subexponential dynamics if the ratio of replicabilities diverges, cf. Eq. (8) and the preceding paragraph in Davis, 2000.

      Our results also show (Figs. 4B and 4C) that high mutation rates and the error threshold problem can still be considered as a major limiting factor for parabolically replicating systems in terms of their diversity-maintaining ability. In the light of the above, potential mechanisms to relax the error threshold in such systems, one of which is demonstrated in the present study, seem to be important steps to account for the sequence diversification and increase in molecular complexity during the early evolution of RNA replicators.

      A general weakness is the presentation of models and parameters, whose choices often appear arbitrary. Modeling choices that would deserve to be further discussed include the association of the monomers with the strands and the ensuing polymerization, which are combined into a single association/polymerization reaction (see also below), or the choice to restrict to oligomers of length L = 10. Other models, similar to the one employed here, have been proposed that do not make these assumptions, e.g. Rosenberger et al. Self-Assembly of Informational Polymers by Templated Ligation, PRX 2021. To understand how such assumptions affect the results, it would be helpful to present the model from the perspective of existing models.

      The assumption of one-step polymerization reactions that we used here is a common technique for modelling template replication of sequence-represented replicators [see, e.g., Fontana and Schuster, 1998 (10.1126/science.280.5368.1451), Könnyű et al., 2008 (10.1186/1471-2148-8-267), Vig-Milkovics et al, 2019 (10.1016/j.jtbi.2018.11.020) or Szilágyi et al., 2020 (10.1371/journal.pgen.1009155)]. This is because assuming base-to-base polymerisation of the copy would lead to a very large number of different types of intermediates, which a Gillespie-type stochastic simulation algorithm could not handle in reasonable computation times, even if the sequences were relatively short. For comparison, in our model, where polymerization is one-step, the characteristic time of a simulation for 𝐿 = 10, 𝑁 = 10⁵ and 𝛿 = 0.01 was 552 hours.

      Note that in Rosenberger et al. (PRX 2021), in contrast to a pioneering work [Fernando et al, 2007 (10.1007/s00239-006-0218-4)], sequences of replicators are not represented, which makes this approach completely inapplicable to our case, in which sequence defines the fitness. In sum, we suggest that this valid criticism points to possible future work.
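
      To make the one-step scheme concrete, a Gillespie-type simulation of a single, sequence-free replicator can be sketched as follows. This is a toy illustration and not the model of the manuscript: it ignores sequences, mutations, GC content and any population bound, and the reaction set, the rate constants `a`, `b`, `d`, and the function name are all assumptions made for the example.

```python
import random

def gillespie_template(s0, d0, a, b, d, t_max, seed=0):
    """Toy Gillespie simulation with single strands (s) and duplexes (dup).

    Reactions (assumed for illustration):
      S  -> D    one-step templated copying, propensity a * s
      D  -> 2 S  duplex dissociation,        propensity d * dup
      2S -> D    strand association,         propensity b * s * (s - 1) / 2
    """
    rng = random.Random(seed)
    t, s, dup = 0.0, s0, d0
    while True:
        props = [a * s, d * dup, b * s * (s - 1) / 2.0]
        total = sum(props)
        if total <= 0.0:
            break                          # nothing can react any more
        t_next = t + rng.expovariate(total)
        if t_next > t_max:
            break                          # next event falls beyond the time horizon
        t = t_next
        r = rng.random() * total           # choose a reaction proportionally to its propensity
        if r < props[0]:
            s, dup = s - 1, dup + 1        # template plus monomers becomes a duplex
        elif r < props[0] + props[1] or s < 2:
            s, dup = s + 2, dup - 1        # duplex separates into two strands
        else:
            s, dup = s - 2, dup + 1        # two strands re-anneal
    return s, dup

s_final, d_final = gillespie_template(s0=10, d0=0, a=1.0, b=0.01, d=0.1, t_max=5.0)
total_strands = s_final + 2 * d_final
```

      Even this three-reaction version needs one random draw per molecular event, which gives a sense of why tracking every base-by-base intermediate would be prohibitive.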

      The values of the (many) parameters, often very specific, also very often lack justifications. For example, why is the "predefined error factor" ε = 0.2 and not lower or higher? How would that affect the results?

      A general remark. For the more important parameters, several values were used to test the behaviour of the model (see Table 1), but due to the considerable number of parameters, it is impossible to examine all possible combinations. 𝑐+ = 1 fixes the timescale, and 𝐿 is set to 10 to obtain reasonable running times (see above).

      𝜀 characterizes how replicability decreases as the number of mutations increases. In the manuscript we used the following default vector: 𝜀 = (0.05, 0.2, 1), in which the third element corresponds to the mutation-free sequence, so it must be 1. The first element determines the baseline replicability (see Methods), which we preferred not to change because it would fundamentally alter the ratio of replication propensities to association and dissociation propensities (as the substantial amount of complementary sequences of the master sequences are of baseline replicability) and thus would alter the reaction kinetics to an extent that it is not comparable with the original results. Therefore, only the second element can be adjusted. Accordingly, we have analysed the behaviour of the model in the cases of a steeper and a more gradual loss of replicability using the following two vectors, respectively: 𝜀′ = (0.05, 𝟎.𝟎𝟓, 1) and 𝜀″ = (0.05, 𝟎.𝟓, 1). The choice of 𝜀′ is chemically more plausible, since for very short oligomers the loss of chemical activity and replicability as a function of the number of mutations can be very sharp. We performed a series of simulations with all possible combinations of 𝛿 = 0.001, 0.005, 0.1 and 𝑁 = 10³, 10⁴, 10⁵ for 𝜀′ and 𝜀″ in the constant population and chemostat model context (36 different runs). For other parameters, we took the default values, see Table 1. These values also correspond to the parameters we used in Figures 2 and 6. The results show that the steeper loss of replicability (𝜀′) slightly increases the diversity-maintaining ability of the system, whereas the more gradual loss of replicability (𝜀″) moderately decreases the diversity-maintaining ability of the system, and that these shifts are more pronounced in the constant population size model (Author response image 1) than in the chemostat model (Author response image 2). Altogether, these results confirm that the qualitative outcome of the model is robust in a wide range of loss of replicability (𝜀 vector) values.
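
      The count of 36 runs follows from the full factorial design described above (3 values of 𝛿 × 3 population sizes × 2 𝜀 vectors × 2 model variants). A short sketch of how such a grid could be enumerated is given below; the variable and label names are hypothetical, and the parameter values are copied from the text as written.

```python
from itertools import product

# Parameter values as listed in the response; the dictionary keys are illustrative labels.
deltas = [0.001, 0.005, 0.1]
pop_sizes = [10**3, 10**4, 10**5]
eps_vectors = {
    "steeper": (0.05, 0.05, 1.0),   # sharper loss of replicability
    "gradual": (0.05, 0.5, 1.0),    # more gradual loss of replicability
}
models = ["constant_population", "chemostat"]

# One dictionary per simulation run in the full factorial design.
runs = [
    {"delta": delta, "N": n, "eps": eps_vectors[label], "model": model}
    for delta, n, label, model in product(deltas, pop_sizes, eps_vectors, models)
]
n_runs = len(runs)
```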

      Author response image 1.

      Replicator coexistence in the constant population model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 2A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Author response image 2.

      Replicator coexistence in the chemostat model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 6A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Similarly, in equation (11), where does the factor 0.8 come from?

      This factor scales the decay rate of duplex sequences as a function of the binding energy (𝐸b). The value of 0.8 is an arbitrary choice; it should lie in the interval (0, 1) and is only relevant in the chemostat model. It is expected to have a similar effect on the dynamics as the duplex decay factor parameter 𝑓, which we have investigated over a wide range of values (cf. Table 1, Fig. 6), although 𝑓 is independent of the binding energy (𝐸b): increasing/decreasing the 0.8 factor is expected to decrease/increase the average total population size. We have investigated the diversity-maintaining ability of the system at smaller (0.6) and larger (0.9) parameter values at different population sizes (𝑁 ≈ 10³, 10⁴ and 10⁵) and at different replicability distances (δ = 0.001, 0.005 and 0.01) as shown in Fig. 6. We have found that the number of coexisting master types changes very little in response to changes in this factor. Only two shifts could be detected (underlined): factor 0.9 combined with 𝑁 ≈ 10⁴ and 𝛿 = 0.001 caused the number of surviving master types to decrease by one, while factor 0.9 combined with 𝑁 ≈ 10³ and 𝛿 = 0.01 caused the number of surviving master types to increase by one (Author response table 1). Factor 0.6 produced the same number of surviving types as the default (Author response table 1). In summary, the model shows marked robustness to changes in the values of this parameter.

      Author response table 1.

      Number of coexisting master types in the chemostat model with different binding energy dependent duplex decay rates. Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different factor values: 0.6, 0.8 (the original) and 0.9 for comparability.

      Why is the kinetic constant for duplex decay reaction 1.15e10−8?

      Note that this value is the minimum of the duplex decay rate; Table 1 correctly shows the interval of this kinetic constant as [1.15 ⋅ 10⁻⁸, 6.4 ⋅ 10⁻⁵]. Both values are derived from the basic parameters of the system and can be computed according to Eq. (11). The minimum: as the parameter set corresponding to this value is: . The maximum: with .

      Are those values related to experiments, or are they chosen because specific behaviors can happen only then?

      See above.

      The choice of the model and parameters potentially impact the two main results, the attenuation of the error threshold and the role of GC content:

      Regarding the error threshold, it is also noted (lines 379-385) that it disappears when back mutations are taken into account. This suggests that overcoming the error threshold might not be as difficult as suggested, and can be achieved in several ways, which calls into question the importance of the particular role of parabolic growth. Besides, when the concentration of replicators is low, product inhibition may be negligible, such that a "parabolic replicator" is effectively growing exponentially and an error catastrophe may occur. Do the authors think that this consideration could affect their conclusion? Can simulations be performed?

      The assumption of back mutation only provides a theoretical solution to the error threshold problem: back mutation guarantees a positive (non-zero) concentration of a master type, but, since the probability of back mutation is generally very low, this equilibrium concentration may be extremely low, or negligible for typical system sizes. Consequently, back mutation alone does not solve the problem of the error catastrophe: in our system back mutation is present (the probability that a sequence with 𝑘 errors mutates back to a master sequence is 𝜇ᵏ(1−𝜇)ᴸ⁻ᵏ), and the diversity-maintaining ability is limited. The effect of back mutation decreases exponentially with increasing sequence length.
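
      To make the size of this effect concrete, the following sketch evaluates the back-mutation probability quoted above, μ^k (1 − μ)^(L − k), for a few sequence lengths. The function name and the numerical values (μ = 0.01, k = 1, and the lengths) are hypothetical choices for illustration and are not taken from the manuscript.

```python
def back_mutation_probability(mu, length, k):
    """Probability mu**k * (1 - mu)**(length - k), as quoted in the response.

    mu     : per-base mutation probability
    length : sequence length L
    k      : number of erroneous positions that must revert
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be a probability")
    if not 0 <= k <= length:
        raise ValueError("k must lie between 0 and the sequence length")
    return mu**k * (1.0 - mu) ** (length - k)


# Hypothetical values, chosen only to show the trend with sequence length.
mu, k = 0.01, 1
for length in (10, 20, 40, 80):
    print(length, back_mutation_probability(mu, length, k))
```

      For fixed μ and k, each additional base multiplies the probability by (1 − μ), which is the exponential decay with sequence length mentioned in the response.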

      Regarding the role of the GC content, GC-rich oligomers are found to perform the worst but no rationale is provided.

      For GC-rich oligonucleotides the dissociation probability of a template-copy complex is relatively low (cf. Eqs. (9, 10)), thus they have a relatively low number of offspring, cf. lines 557-561: “a relatively high dissociation probability and the consequential higher propensity of being in a simple stranded form provides an advantage for sequences with relatively low GC content in terms of their replication affinity, that is, the expected number of offspring in case of such variants will be relatively high.”. Note that the simulation results shown in Fig. 3A, demonstrate the realization of this effect with prepared sequences (along a GC content gradient).

      One may assume that it happens because GC-rich sequences take comparatively longer to release the product. However, it is also conceivable that higher GC content may help in the polymerization of the monomers, as the monomers remain attached to the template for longer (as described in Eq. (9)). This is an instance where the choice to pull the association and polymerization reactions into a single step, independent of GC content, may be critical.

      It would be important to show that the result arises from the actual physics and not from this modeling choice.

      Some more specific points that would deserve to be addressed:

      • Line 53: it is said that p "reflects how easily the template-reaction product complex dissociates". This statement is not correct. A reaction order p<1 reflects product inhibition, the propensity of templates to bind to each other, not slow product release. Product release can be limiting, yet a reaction order of 1 can be achieved if substrate concentrations are sufficiently high relative to oligomer concentrations (von Kiedrowski et al., 1991).

      We think the key reference is Von Kiedrowski (1993) in this case. Other things being equal, his Table 1 on p. 134 shows that a sufficient increase in 𝐾4, i.e., the stability of the duplex (template and copy) (association rate divided by dissociation rate) throws the system into the parabolic regime. This is what we had in mind. In order to clarify this, we modified the quoted sentence thus: “In this kinetics, the growth order is equal or close to 0.5 (i.e., the dynamics is sub-exponential) because increased stability of the template-copy complex (rate of association divided by dissociation) promotes parabolic growth (von Kiedrowski et al., 1991; von Kiedrowski & Szathmáry, 2001).”

      • Population size is a key parameter, and a comparison is made between small (10^3) and large (10^5) populations, but without explaining what determines the scale (small/large relative to what?).

      The “small” value (10³) corresponds to the smallest meaningful population size; significantly smaller population sizes (e.g. 10²) cannot maintain the 10 master types (or any subset of them) and are chemically unrealistic. The “large” value (10⁵) is the largest population size for which simulation times are still acceptable; in the case of 10⁶ the runtimes are on the order of months.

      • In the same vein, we might expect size not to be the only important parameter, but also concentration.

      With constant volume, population size and concentration are strictly coupled.

      • Lines 543-546: if understanding correctly, the quantitative result is that the error threshold rises from 0.1 in the exponential case to 0.196 in the parabolic. Are the authors suggesting that a factor of 2 is a significant difference?

      In this paragraph we compared the empirical error threshold of our system (which is close to 𝑝mut = 0.15) with the error threshold of the well-known single peak fitness landscape (which can be approximated by ) as a reference case. To make the message even clearer we have extended the last sentence (lines 596-597) as follows: “but note that applying this approach to our system is a serious oversimplification”. The 0.196 is simply the probability of error-free replication of a sequence when , but we have removed this sentence (“corresponding to the replication accuracy of a master sequence”) from the manuscript as it seems to be confusing.

      • Figure 3C: this figure shows no statistically significant effect?

      Thank you for pointing this out. We statistically tested the hypothesis that the GC content differs between the surviving and the extinct master subsets. This analysis revealed that the differences between these two groups are statistically significant, which we have now included in the manuscript at lines 380-390: “A direct investigation of whether the sequence composition of the master types is associated with their survival outcome was conducted using the data from the constant population model simulation results (Figure 2). In these data, the average GC content was measured to be lower in the surviving master subpopulations than in the extinct subpopulations (Figure 3C). To determine whether this difference was statistically significant, nonparametric, two-sample Wilcoxon rank-sum tests (Hollander & Wolfe, 1999) were performed on the GC content of the extinct-surviving master subsets. The GC content was significantly different between these two groups in all nine investigated parameter combinations of population size (N) and replicability distance (δ) at p<0.05 level, indicating a selective advantage for a lower GC content in the constant population model context. The exact p values obtained from this analysis are shown in Figure 3C.”
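
      For readers unfamiliar with the test named above, here is a minimal pure-Python sketch of a two-sided Wilcoxon rank-sum test using the normal approximation, without tie or continuity correction. It is not the authors' analysis code; the function names and the GC-content values are hypothetical, and for small samples an exact test from a statistics package (for instance `scipy.stats.ranksums` or `scipy.stats.mannwhitneyu`) would normally be preferred.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (z, two-sided p) for the rank-sum statistic of sample x."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expectation under the null
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail
    return z, p


# Hypothetical GC contents of surviving and extinct master types.
surviving = [0.3, 0.4, 0.4, 0.5, 0.3]
extinct = [0.6, 0.7, 0.5, 0.8, 0.6]
z, p = rank_sum_test(surviving, extinct)
print(z, p)
```

      A negative z here indicates that the first sample (surviving types) tends to have lower values than the second, which is the direction of the effect reported in the response.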

      • line 542: "phase transition-like species extension (Figure 4B)": such a clear threshold is not apparent.

      Thank you for pointing out the incorrect phrasing. As there is no clear threshold in the number of coexisting types as a function of the mutation rate, we removed the “phase transition-like” expression: “However, when finite population sizes and stochastic effects are taken into account, at the largest investigated per-base mutation rate (𝑝mut = 0.15), the summed relative steady-state master frequencies approach zero (Figure 4C) with accelerating species extinction (Figure 4B), indicating that this value is close to the system's empirical error threshold.” (lines 589-594).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On the whole, the work is well done and presented, there are no major recommendations. It seems a good idea to cite and briefly discuss this recent paper: https://pubmed.ncbi.nlm.nih.gov/36996101/ which develops a symbiotic scenario of the coevolution of primordial replicators and reproducers that appears to be fully compatible with the results of the current work.

      Thank you for bringing this article to our attention. We have inserted the following sentence at lines 621-624: “The demonstrated diversity-maintaining mechanism of finite parabolic populations can be used as a plug-in model to investigate the coevolution of naked and encapsulated molecular replicators (e.g., Babajanyan et al., 2023).”

      The manuscript is well written, but there are some minor glitches that merit attention. For example:

      l. 5 "carriers presents a problem, because product formation and mutual hybridization" - "mutual" is superfluous here, delete

      l. 13 "amplification. In addition, sequence effects (GC content) and the strength of resource" - hardly "effects" - should be 'features' or 'properties'

      l. 41 "If enzyme-free replication of oligomer modules with a high degree of sequence" - "modules" here is only confusing - simply, "oligomers"

      l. 44 "under ecological competition conditions with which distinct replicator types with different" - delete "with" etc, there are many such minor glitches that are best corrected.

      Thank you for pointing these out; we have corrected them. Other drafting errors, glitches, and superfluous sentences have also been corrected.

      Reviewer #2 (Recommendations For The Authors):

      None

      Editor (Recommendations For The Authors):

      In the manuscript, it appears that coexistence is assessed at a given point in time, while figures seem to show that it remains time-dependent. It would be great if the authors could clarify this and/or discuss this.

      We appreciate you bringing this to our attention, as we had indeed failed to elaborate on this important point. The steady-state character of the coexistence is assessed in our model in the following way: the relative frequency of each master sequence is tested for the condition of ≥ 100- (cut-off relative frequency for survival) at every 2,000th replication step in the interval between 10,000 replication steps before termination and actual termination (10= replication steps). If the above condition is true more than once, we consider the master type in question to have survived (we have included this explanation in the Methods section: lines 258-268). Although this relatively narrow time interval can still be regarded as a snapshot of the state of the system, in our numerical experience the resulting measure is a reliable quantitative indicator of the apparent stability of species coexistence in the parabolic dynamics.
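
      The survival criterion described above can be sketched as follows. This is an illustrative reading of the rule, not the authors' implementation: the function names, the inclusion of both end points of the window, and the numerical cutoff and termination step used in the example are all assumptions (the actual cutoff and run length are not legible in the text above, so they are left as parameters).

```python
def checkpoint_steps(final_step, window=10_000, stride=2_000):
    """Steps sampled in the closing window; both ends included (an assumption)."""
    return list(range(final_step - window, final_step + 1, stride))


def master_survived(freq_by_step, final_step, cutoff, window=10_000, stride=2_000):
    """Apply the 'true more than once' rule to one master type.

    freq_by_step maps a replication step to the relative frequency of the
    master type at that step.
    """
    hits = sum(
        1
        for step in checkpoint_steps(final_step, window, stride)
        if freq_by_step[step] >= cutoff
    )
    return hits > 1


# Hypothetical trajectory: the type exceeds the cutoff at two checkpoints.
final_step = 50_000   # placeholder, not the run length used in the paper
cutoff = 0.01         # placeholder, not the cutoff used in the paper
freqs = {step: 0.0 for step in checkpoint_steps(final_step)}
freqs[46_000] = 0.02
freqs[50_000] = 0.015
print(master_survived(freqs, final_step, cutoff))
```

      Requiring more than one hit means a single transient excursion above the cutoff inside the window is not counted as survival.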

    1. Author Response

      eLife assessment

      In this study, the authors offer a theoretical explanation for the emergence of nematic bundles in the actin cortex, carrying implications for the assembly of actomyosin stress fibers. As such, the study is a valuable contribution to the field of actomyosin organization in the actin cortex. While the theoretical work is solid, experimental evidence in support of the model assumptions remains incomplete. The presentation could be improved to enhance accessibility for readers without a strong background in hydrodynamic and nematic theories.

      To address the weaknesses identified in this assessment, we plan to expand the description of the theoretical model to make it more accessible to a broader spectrum of readers. We will discuss in more detail the relation between the different mathematical terms and physical processes at the molecular scale, as well as the experimental evidence supporting the model assumptions. We will also discuss more explicitly how our results are relevant to different systems exhibiting actomyosin nematic bundles beyond stress fibers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, Mirza et al developed a continuum active gel model of the actomyosin cytoskeleton that accounts for nematic order and density variations in actomyosin. Using this model, they identify the requirements for the formation of dense nematic structures. In particular, they show that self-organization into nematic bundles requires both flow-induced alignment and active tension anisotropy in the system. By varying model parameters that control active tension and nematic alignment, the authors show that their model reproduces a rich variety of actomyosin structures, including tactoids, fibres, asters as well as crystalline networks. Additionally, discrete simulations are employed to calculate the activity parameters in the continuum model, providing a microscopic perspective on the conditions driving the formation of fibrillar patterns.

      Strengths:

      The strength of the work lies in its delineation of the parameter ranges that generate distinct types of nematic organization within actomyosin networks. The authors pinpoint the physical mechanisms behind the formation of fibrillar patterns, which may offer valuable insights into stress fiber assembly. Another strength of the work is connecting activity parameters in the continuum theory with microscopic simulations.

      We thank the referee for these comments.

      Weaknesses:

      This paper is a very difficult read for nonspecialists, especially if you are not well-versed in continuum hydrodynamic theories. Efforts should be made to connect various elements of theory with biological mechanisms, which is mostly lacking in this paper. The comparison with experiments is predominantly qualitative.

      We agree with the referee that the manuscript will benefit from a better description of the theoretical model and the results in relation with specific molecular and cellular mechanisms. We will further emphasize how a number of experimental observations in the literature support our model assumptions and can be explained by our results. A quantitative comparison is difficult for several reasons. First, many of the parameters in our theory have not been measured, and in fact estimates in the literature often rely on comparison with hydrodynamic models such as ours. Second, the effective physical properties of actomyosin gels can vary wildly between cells, which may explain the diversity of forms, dynamics and functions. For these reasons, we chose to delineate regimes leading to qualitatively different emerging architectures and dynamics. In the revised manuscript, we will make this point clearer and will further study the literature to seek quantitative comparison.

      It is unclear if the theory is suited for in vitro or in vivo actomyosin systems. The justification for various model assumptions, especially concerning their applicability to actomyosin networks, requires a more thorough examination.

      We thank the referee for this comment. Our theory is applicable to actomyosin in living cells. To our knowledge, reconstituted actomyosin gels currently lack the ability to sustain the dynamical steady-states involved in the proposed self-organization mechanism, which balance actin flows with turnover. In addition to actomyosin gels in living cells, in vitro systems based on encapsulated cell extracts can also sustain such dynamical steady states [e.g. https://doi.org/10.1038/s41567-018-0413-4], and therefore our theory may be applicable to these systems as well. Of course, with advancements in the field of reconstituted systems, this may change in the near future. We will explicitly discuss this point in the revised manuscript.

      The classification of different structures demands further justification. For example, the rationale behind categorizing structures as sarcomeric remains unclear when nematic order is perpendicular to the axis of the bands. Sarcomeres traditionally exhibit a specific ordering of actin filaments with alternating polarity patterns.

      We agree and will avoid the term “sarcomeric”.

      Similarly, the criteria for distinguishing between contractile and extensile structures need clarification, as one would expect extensile structures to be under tension contrary to the authors' claim.

      We plan to clarify this point by representing in a main figure the stress profiles across dense nematic structures (currently in Supp Fig 2), along with a more detailed description. In short, depending on the parameter regime, the competition between active and viscous stresses in the actin gel determines whether the emergent structures are extensile or contractile. In our system tension is positive in all directions at all times. However, in “contractile” structures, tension is larger along the bundle, whereas in “extensile” structures, tension is larger perpendicular to the bundle. This is consistent with the common expression for active stress of incompressible nematic systems [see e.g. https://doi.org/10.1038/s41467-018-05666-8], which takes the form −zQ, where z is positive for an extensile system, showing that in this case active tension is negative along the nematic direction. This point, also raised by another referee, will be clarified and connected to existing literature.
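
      The sign convention in the last statement can be checked with a small numerical sketch. It assumes the standard two-dimensional traceless nematic tensor Q = S (n ⊗ n − I/2), which is not spelled out in the response, and it covers only the incompressible expression −zQ quoted there, not the authors' compressible model (in which total tension stays positive). All function names and numerical values are hypothetical.

```python
import math


def q_tensor(order, angle):
    """2D traceless nematic tensor S (n n^T - I/2) for director angle `angle`."""
    nx, ny = math.cos(angle), math.sin(angle)
    return [
        [order * (nx * nx - 0.5), order * nx * ny],
        [order * nx * ny, order * (ny * ny - 0.5)],
    ]


def active_stress(zeta, q):
    """Active stress -zeta * Q; zeta > 0 is the extensile case in this convention."""
    return [[-zeta * q[i][j] for j in range(2)] for i in range(2)]


def projected_tension(stress, angle):
    """Normal stress u . sigma . u along the unit vector at `angle`."""
    ux, uy = math.cos(angle), math.sin(angle)
    return (
        ux * (stress[0][0] * ux + stress[0][1] * uy)
        + uy * (stress[1][0] * ux + stress[1][1] * uy)
    )


# Hypothetical values: order parameter 0.8, director at 0.3 rad, zeta = 1.
order, director, zeta = 0.8, 0.3, 1.0
sigma = active_stress(zeta, q_tensor(order, director))
print(projected_tension(sigma, director))                # along the director
print(projected_tension(sigma, director + math.pi / 2))  # perpendicular to it
```

      With these definitions the active tension equals −zS/2 along the director and +zS/2 perpendicular to it, so a positive z gives negative active tension along the nematic direction, as stated in the response.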

      Additionally, it's unclear if the model's predictions for fiber dynamics align with observations in cells, as stress fibers exhibit a high degree of dynamism and tend to coalesce with neighboring fibers during their assembly phase.

      In the present work, we focus on the self-organization of a periodic patch of actomyosin gel. However, in adherent cells boundary conditions play an essential role, e.g. with inflow at the cell edge as a result of polymerization and exclusion at the nucleus. In ongoing work, we are using the present model to study the dynamics of assembly and reconfiguration of dense nematic structures in domains with boundary conditions mimicking those in adherent cells, as suggested by the referee. We would like to note, however, that the prominent stress fibers in cells adhered to stiff substrates, so abundantly reported in the literature, are not the only instance of dense nematic actin bundles, and may not be representative of physiologically relevant situations. In the present manuscript, we emphasize the relation of the predicted organizations with those found in different in vivo contexts not related to stress fibers, such as the aligned patterns of bundles in insects (trachea, scales in butterfly wings), in hydra, or in reproductive organs of C. elegans; the highly dynamical network of bundles observed in C. elegans early embryos; or the labyrinth patterns of micro-ridges in the apical surface of epidermal cells in fish. We will further emphasize these points in the revised manuscript.

      Finally, it seems that the microscopic model is unable to recapitulate the density patterns predicted by the continuum theory, raising questions about the suitability of the simulation model.

      We thank the referee for raising this question, which needs further clarification. The goal of the microscopic model is not to reproduce the self-organized patterns predicted by the active gel theory. The microscopic model lacks essential ingredients, notably a realistic description of hydrodynamics and turnover. Our goal with the agent-based simulations is to extract the relation between nematic order and active stresses for a small homogeneous sample of the network. This small domain is meant to represent the homogeneous active gel prior to pattern formation, and it allows us to substantiate key assumptions of the continuum model leading to pattern formation, notably the dependence of isotropic and deviatoric components of the active stress on density and nematic order (Eq. 7) and the active generalized stress promoting ordering.

      We should mention that reproducing the range of out-of-equilibrium mesoscale architectures predicted by our active gel model with agent-based simulations seems at present not possible, or at least significantly beyond the state-of-the-art. We note for instance that parameter regimes in which agent-based simulations of actin gels display extended contractile steady-states are non-generic, as these simulations often lead to irreversible clumping (as do many reconstituted contractile systems), see e.g. https://doi.org/10.1038/ncomms10323 or https://doi.org/10.1371/journal.pcbi.1005277. Very few references report sustained actin flows or the organization of a few bundles (https://doi.org/10.1371/journal.pcbi.1009506). While agent-based cytoskeletal simulations are very attractive because they directly connect with molecular mechanisms, active gel continuum models are better suited to describe out-of-equilibrium emergent hydrodynamics at a mesoscale. We believe that these two complementary modeling frameworks are rather disconnected in the literature, and for this reason, we have attempted to substantiate our continuum modeling with discrete simulations. In the revised manuscript, we will better frame the relationship between them.

      Reviewer #2 (Public Review):

      Summary:

      The article by Waleed et al discusses the self organization of actin cytoskeleton using the theory of active nematics. Linear stability analysis of the governing equations and computer simulations show that the system is unstable to density fluctuations and self organized structures can emerge. While the context is interesting, I am not sure whether the physics is new. Hence I have reservations about recommending this article.

      We thank the referee for these comments. In the revised manuscript, we will highlight the novelty of the paper in terms of the theoretical model, the mechanism of patterning of dense nematic structures, the nature and dynamics of the resulting architectures, their relation with the experimental record, and the connection with microscopic models.

      We will emphasize the fact that nematic architectures in the actin cytoskeleton are characterized by a co-localization of order and density (and strong variations in each of these fields), that recent work shows that isotropic and nematic organizations coexist and are part of a single heterogeneous network, that the emergence and maintenance of nematic order requires active contraction, and that the assembly and maintenance of dense nematic bundles involves convergent flows. None of these key features can be described by the common incompressible models of active nematics. To address this, we develop here a compressible and density dependent model for an active nematic gel. We will carefully justify that the proposed model is meaningful for actomyosin gels, and we will highlight the commonalities and differences with previous models of active nematics.

      Strengths:

      (i) Analytical calculations complemented with simulations (ii) Theory for cytoskeletal network

      Weaknesses:

      Not placed in the context or literature on active nematics.

      We agree with the referee that the manuscript requires a better contextualization of the work in relation with the very active field of active nematics. In the revised manuscript, we will clearly describe the relation of our model with existing ones.

      Reviewer #3 (Public Review):

      The manuscript "Theory of active self-organization of dense nematic structures in the actin cytoskeleton" analyses self-organized pattern formation within a two-dimensional nematic liquid crystal theory and uses microscopic simulations to test the plausibility of some of the conclusions drawn from that analysis. After performing an analytic linear stability analysis that indicates the possibility of patterning instabilities, the authors perform fully non-linear numerical simulations and identify the emergence of stripe-like patterning when anisotropic active stresses are present. Following a range of qualitative numerical observations on how parameter changes affect these patterns, the authors identify, besides isotropic and nematic stress, also active self-alignment as an important ingredient to form the observed patterns. Finally, microscopic simulations are used to test the plausibility of some of the conclusions drawn from continuum simulations.

      The paper is well written, figures are mostly clear and the theoretical analysis presented in both main text and supplement is rigorous. Mechano-chemical coupling has emerged in recent years as a crucial element of cell cortex and tissue organization and it is plausible to think that both isotropic and anisotropic active stresses are present within such effectively compressible structures. Even though not yet stated this way by the authors, I would argue that combining these two is one of the key ingredients that distinguishes this theoretical paper from similar ones. The diversity of patterning processes experimentally observed is nicely elaborated on in the introduction of the paper, though other closely related previous work could also have been included in these references (see below for examples).

      We thank the referee for these comments and for the suggestion to emphasize the interplay of isotropic and anisotropic active tension, which is possible only in a compressible gel. We also thank the referee for the suggestions to better connect with existing literature.

      To introduce the continuum model, the authors exclusively cite their own, unpublished pre-print, even though the final equations take the same form as previously derived and used by other groups working in the field of active hydrodynamics (a certainly incomplete list: Marenduzzo et al (PRL, 2007), Salbreux et al (PRL, 2009, cited elsewhere in the paper), Jülicher et al (Rep Prog Phys, 2018), Giomi (PRX, 2015),...). To make better contact with the broad active liquid crystal community and to delineate the present work more compellingly from existing results, it would be helpful to include a more comprehensive discussion of the background of the existing theoretical understanding on active nematics. In fact, I found it often agrees nicely with the observations made in the present work, an opportunity to consolidate the results that is sometimes currently missed out on. For example, it is known that self-organised active isotropic fluids form in 2D hexagonal and pulsatory patterns (Kumar et al, PRL, 2014), as well as contractile patches (Mietke et al, PRL 2019), just as shown and discussed in Fig. 2. It is also known that extensile nematics, \kappa<0 here, draw in material laterally of the nematic axis and expel it along the nematic axis (the other way around for \kappa>0, see e.g. Doostmohammadi et al, Nat Comm, 2018 "Active Nematics" for a review that makes this point), consistent with all relative nematic director/flow orientations shown in Figs. 2 and 3 of the present work.

      We thank the referee for these suggestions. Indeed, in the original submission we had outsourced much of the justification of the model and the relevant literature to a related pre-print, but this is not reasonable. In the revised manuscript, we will discuss our model in the context of the state-of-the-art, emphasizing connections with existing results.

      The results of numerical simulations are well-presented. Large parts of the discussion of numerical observations - specifically around Fig. 3 - are qualitative and it is not clear why the analysis is restricted to \kappa<0. Some of the observations resonate with recent discussions in the field, for example the observation of effectively extensile dynamics in a contractile system is interesting and reminiscent of ambiguities about extensile/contractile properties discussed in recent preprints (https://arxiv.org/abs/2309.04224). It is convincingly concluded that, besides nematic stress on top of isotropic one, active self-alignment is a key ingredient to produce the observed patterns.

      We thank the referee for these comments. We will expand the description of the results around Figure 3. We are reluctant to extend the detailed analysis of emergent architectures and dynamics to the case \kappa > 0 as it leads to architectures not observed, to our knowledge, in actin networks. We will expand the characterization of emergent contractile/extensile networks by describing the distribution of the different components of the stress tensor across the bundles and will place our results in the context of related recent work.

      I compliment the authors for trying to gain further mechanistic insights into this conclusion with microscopic filament simulations that are diligently performed. It is rightfully stated that these simulations only provide plausibility tests and, within this scope, I would say the authors are successful. At the same time, it leaves open questions that could have been discussed more carefully. For example, I wonder what can be said microscopically about the regime \kappa>0 (which is dropped ad hoc from Fig. 3 onward), in which the continuum theory does also predict the formation of stripe patterns, besides the short comment at the very end? How does the spatially inhomogeneous organization the continuum theory predicts fit in the presented microscopic picture, and vice versa?

      We thank the referee for this compliment. We think that the point raised by the referee is very interesting. It is reasonable to expect that the sign of \kappa will not be a constant but will rather depend on S and \rho. Indeed, for a sparse network with low order, the progressive bundling by crosslinkers acting on nearby filaments is likely to produce a large active stress perpendicular to the nematic direction, whereas in a dense and highly ordered region, myosin motors are more likely to effectively contract along the nematic direction, while there is little room for additional lateral contraction by further bundling. In the revised manuscript, we plan to examine this issue further in two ways. First, we plan to perform additional agent-based simulations in a regime leading to \kappa > 0. Second, we will modify the active gel model such that \kappa < 0 for low density/order, so that a fibrillar pattern is assembled, and \kappa > 0 for high density/order, so that the emergent fibers are highly contractile.

      Overall, the paper represents a valuable contribution to the field of active matter and, if strengthened further, might provide a fruitful basis to develop new hypotheses about the dynamic self-organisation of dense filamentous bundles in biological systems.

    1. Author Response

      We would like to thank the editorial board and the reviewers for their assessment of our manuscript and their constructive feedback that we believe will make our manuscript stronger and clearer. Please find below our provisional response to the public reviews; these responses outline our plan to address the concerns of the reviewers for a planned resubmission. Our responses are written in red.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Misic et al showed that white matter properties can be used to classify subacute back pain patients that will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      We thank the reviewer for emphasizing the strength of our paper and the importance of validation on multiple unseen cohorts.

      Weaknesses:

      The authors imply that their biomarker could outperform traditional questionnaires to predict pain: "While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain and provide easy-to-use brief questionnaires-based tools, (21, 25) parameters often explain no more than 30% of the variance (28-30) and their prognostic accuracy is limited.(31)". I don't think this is correct; questionnaire-based tools can achieve far greater prediction than their model in about half a million individuals from the UK Biobank (Tanguay-Sabourin et al., A prognostic risk score for the development and spread of chronic pain, Nature Medicine 2023).

      We agree with the reviewer that we might have under-estimated the prognostic accuracy of questionnaire-based tools, especially the strong predictive accuracy shown by Tanguay-Sabourin 2023. In the revised version, we will change both the introduction and the discussion to reflect the questionnaire-based prognostic accuracy reported in the seminal work by Tanguay-Sabourin. We do note here, however, that the latter paper, while very novel, is unique in showing the power of questionnaires. In addition, the questionnaires we have tested in our cohort did not show any baseline differences suggestive of prognostic accuracy.

      Moreover, the main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in such a small sample size brain imaging study, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews in Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training making it difficult to assess the real performance, especially for small sample size studies.

      The reviewer raises a very important point regarding the limited sample size and the methodology intrinsic to model development and testing. We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we will acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we will also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. Finally, as discussed by Spisak et al. (1), the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. As such, the effect size in the New Haven and Mannheim data is Cohen’s d > 1.
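To make the separation between training and held-out evaluation concrete, the following is a minimal sketch of a single-feature cut-off rule. The threshold is chosen on a discovery cohort only and then applied unchanged to a separate cohort, and Cohen's d is computed with a pooled standard deviation. The FA values, labels, and function names (`cohens_d`, `accuracy`, `fit_cutoff`) are hypothetical and chosen for illustration only; this is not the authors' code or data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def accuracy(values, labels, cutoff):
    """Fraction correct when value >= cutoff is predicted as recovered (label 1)."""
    hits = sum(int(v >= cutoff) == y for v, y in zip(values, labels))
    return hits / len(values)


def fit_cutoff(values, labels):
    """Pick the cut-off with the best accuracy on the training cohort only."""
    return max(sorted(set(values)), key=lambda cut: accuracy(values, labels, cut))


# Hypothetical discovery cohort (1 = recovered, 0 = persistent pain).
train_fa = [0.40, 0.42, 0.44, 0.47, 0.49, 0.51]
train_y = [0, 0, 0, 1, 1, 1]

# Hypothetical independent cohort, never used to choose the cut-off.
test_fa = [0.41, 0.48, 0.46, 0.52]
test_y = [0, 1, 1, 1]

cutoff = fit_cutoff(train_fa, train_y)
held_out_accuracy = accuracy(test_fa, test_y, cutoff)
effect_size = cohens_d(
    [v for v, y in zip(train_fa, train_y) if y == 1],
    [v for v, y in zip(train_fa, train_y) if y == 0],
)
print(cutoff, held_out_accuracy, effect_size)
```

The point of the design is that `fit_cutoff` only ever receives the discovery cohort, so the held-out accuracy cannot be inflated by threshold tuning on the test data.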

      Even if the performance was properly assessed, their models show AUCs between 0.65-0.70, which is usually considered as poor, and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      The reviewer is correct: the model performance is poor to fair, which limits its usefulness for clinical translation. We wanted to emphasize that diffusion images can be obtained in a short period of time and, hence, as the predictive accuracy of such models improves, clinical translation becomes closer to reality. In addition, our findings are based on old diffusion data and a limited sample size coming from different sites and different acquisition sequences. This by itself would limit the accuracy, especially since evidence shows that sample size also affects model performance (i.e. testing AUC) (1). In the revision, we will re-word the sentence mentioned by the reviewer to reflect the points discussed here. This also motivates us to collect a more homogeneous and larger sample.
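For readers less familiar with the AUC values discussed here, the sketch below computes the area under the ROC curve as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A value of 0.5 corresponds to chance and 1.0 to perfect ranking. The scores and labels are hypothetical, and `roc_auc` is an illustrative helper rather than the authors' implementation.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical biomarker scores (e.g. FA values) and outcomes (1 = recovered).
example_scores = [0.1, 0.4, 0.35, 0.8]
example_labels = [0, 0, 1, 1]
example_auc = roc_auc(example_scores, example_labels)
print(example_auc)
```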

      Overall, these criticisms are more about the wording sometimes used and the inference they made. I think the strength of the evidence is incomplete to support the main claims of the paper.

      Despite these limitations, I still think this is a very relevant contribution to the field. Showing predictive performance through cross-validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

      We thank the reviewer for acknowledging that our effort and approach were the right ones.

      Minor points:

      Methods:

      I get the voxel-wise analysis, but I don't understand the methods for the structural connectivity analysis between the 88 ROIs. Have the authors run tractography or have they used a predetermined streamlined form of 'population-based connectome'? They report that models of AUC above 0.75 were considered and tested in the Chicago dataset, but we have no information about what the model actually learned (although this can be tricky for decision tree algorithms).

      We apologize for the lack of clarity; we did run tractography and we did not use a predetermined streamlined form of the connectome. We will clarify this point in the methods section.

      Finding which connections are important for the classification of SBPr and SBPp is difficult because of our choices during data preprocessing and SVC model development: (1) preprocessing steps which included TNPCA for dimensionality reduction, and regressing out the confounders (i.e., age, sex, and head motion); (2) the harmonization for effects of sites; and (3) the Support Vector Classifier, which is a hard classification model (2). Such models cannot tell us the features that are important in classifying the groups. Our model is considered a black-box predictive model like neural networks.

      Minor:

      What results are shown in Figure 7? It looks more descriptive than the actual results.

      The reviewer is correct; Figure 7 and supplementary Figure 4 are both qualitatively illustrating the shape of the SLF.

      Reviewer #2 (Public Review):

      The present study aims to investigate brain white matter predictors of back pain chronicity. To this end, a discovery cohort of 28 patients with subacute back pain (SBP) was studied using white matter diffusion imaging. The cohort was investigated at baseline and one-year follow-up when 16 patients had recovered (SBPr) and 12 had persistent back pain (SBPp). A comparison of baseline scans revealed that SBPr patients had higher fractional anisotropy values in the right superior longitudinal fasciculus (SLF) than SBPp patients and that FA values predicted changes in pain severity. Moreover, the FA values of SBPr patients were larger than those of healthy participants, suggesting a role of FA of the SLF in resilience to chronic pain. These findings were replicated in two other independent datasets. The authors conclude that the right SLF might be a robust predictive biomarker of CBP development with the potential for clinical translation.

      Developing predictive biomarkers for pain chronicity is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are convincing, and the interpretation is adequate. A particular strength of the study is the discovery-replication approach with replications of the findings in two independent datasets.

      We thank reviewer 2 for pointing to the strength of our study.

      The following revisions might help to improve the manuscript further.

      Definition of recovery. In the New Haven and Chicago datasets, SBPr and SBPp patients are distinguished by reductions of >30% in pain intensity. In contrast, in the Mannheim dataset, both groups are distinguished by reductions of >20%. This should be harmonized. Moreover, as there is no established definition of recovery (reference 79 does not provide a clear criterion), it would be interesting to know whether the results hold for different definitions of recovery. Control analyses for different thresholds could strengthen the robustness of the findings.

      The reviewer raises an important point regarding the definition of recovery. To address the reviewer’s concern, we will add a supplementary figure showing the results in the Mannheim data set if a 30% reduction is used as a recovery criterion. We would like to emphasize here several points that support the use of different recovery thresholds between New Haven and Mannheim. The New Haven primary pain ratings relied on a visual analogue scale (VAS) while the Mannheim data relied on the German version of the West-Haven-Yale Multidimensional Pain Inventory. In addition, the Mannheim data was pre-registered with a definition of recovery at 20% and is part of a larger sub-acute to chronic pain study with prior publications from this cohort using the 20% cut-off (3). Finally, a more recent consensus publication (4) from IMMPACT indicates that a change of at least 30% is needed for a moderate improvement in pain on the 0-10 Numerical Rating Scale but that this percentage depends on baseline pain levels.
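The effect of the recovery threshold can be illustrated with a short sketch that labels a patient as recovered (SBPr) or persistent (SBPp) from the percentage reduction in pain intensity between baseline and follow-up. The pain scores and the helper names (`percent_reduction`, `classify_recovery`) are hypothetical; the sketch only assumes the ">20%" and ">30%" criteria quoted in the review.

```python
def percent_reduction(baseline, follow_up):
    """Percentage reduction in pain intensity relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline pain must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def classify_recovery(baseline, follow_up, threshold_pct):
    """Return 'SBPr' if the reduction exceeds the threshold, else 'SBPp'."""
    if percent_reduction(baseline, follow_up) > threshold_pct:
        return "SBPr"
    return "SBPp"


# Hypothetical patient whose pain drops from 60 to 45 (a 25% reduction).
label_at_20 = classify_recovery(60, 45, 20)
label_at_30 = classify_recovery(60, 45, 30)
print(label_at_20, label_at_30)
```

A patient with a reduction between the two thresholds changes group depending on the criterion, which is why a control analysis across thresholds is informative.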

      Analysis of the Chicago dataset. The manuscript includes results on FA values and their association with pain severity for the New Haven and Mannheim datasets but not for the Chicago dataset. It would be straightforward to show figures like Figures 1 - 4 for the Chicago dataset, as well.

      We welcome the reviewer’s suggestion; we will therefore add these analyses to the results section of our manuscript upon resubmission.

      Data sharing. The discovery-replication approach of the present study distinguishes the present from previous approaches. This approach enhances the belief in the robustness of the findings. This belief would be further enhanced by making the data openly available. It would be extremely valuable for the community if other researchers could reproduce and replicate the findings without restrictions. It is not clear why the fact that the studies are ongoing prevents the unrestricted sharing of the data used in the present study.

      Reviewer #3 (Public Review):

      Summary:

      Authors suggest a new biomarker of chronic back pain with the option to predict the result of treatment. The authors found a significant difference in a fractional anisotropy measure in superior longitudinal fasciculus for recovered patients with chronic back pain.

      Strengths:

      The results were reproduced in three different groups at different studies/sites.

      Weaknesses:

      The number of participants is still low.

      We have discussed this point in our replies to reviewer number 1.

      An explanation of microstructure changes was not given.

      The reviewer points to an important gap in our discussion. While we cannot do a direct study of actual tissue micro-structure, we will explore further the changes observed in the SLF by calculating diffusivity measures and discuss possible explanations of these changes.

      Some technical drawbacks are presented.

      We are uncertain if the reviewer is suggesting that we have acknowledged certain technical drawbacks and expects further elaboration on our part. We kindly request that the reviewer specify what particular issues they would like us to address so that we can respond appropriately.

      (1) Spisak T, Bingel U, Wager TD. Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E4-E7.

      (2) Liu Y, Zhang HH, Wu Y. Hard or Soft Classification? Large-margin Unified Machines. J Am Stat Assoc 2011;106:166-177.

      (3) Loffler M, Levine SM, Usai K, et al. Corticostriatal circuits in the transition to chronic back pain: The predictive role of reward learning. Cell Rep Med 2022;3:100677.

      (4) Smith SM, Dworkin RH, Turk DC, et al. Interpretation of chronic pain clinical trial outcomes: IMMPACT recommended considerations. Pain 2020;161:2446-2461.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      One important question is needed to further clarify the mechanisms of aberrant Ca2+ microwaves as described below.

      Synapsin promoter labels both excitatory pyramidal neurons and inhibitory neurons. To avoid aberrant Ca2+ microwave, a combination of Flex virus and CaMKII-Cre or Thy-1-GCaMP6s and 6f mice were tested. However, all these approaches limit the number of infected pyramidal neurons. While the comprehensive display of these results is appreciated, a crucial question remains unanswered. To distinguish whether the microwave of Ca2+ is caused selectively via the abnormality of interneurons, or just a matter of pyramidal neuron density, testing Flex-GCaMP6 in interneuron specific mouse lines such as PV-Cre and SOM-Cre will be critical.

      We agree that unravelling the role of interneurons is important to the understanding of the cellular mechanisms. However, the primary goal of this preprint was to alert the field and those embarking on in vivo Ca2+ imaging to AAV transduction induced artefacts mediated by one of the most widely used viral constructs for Ca2+ imaging in the field. It was important to us to distribute this finding among the community in a timely manner to avoid the unnecessary waste of resources.

      We consider a thorough understanding of cell-type specific mechanisms interesting. However, the biological relevance of the Ca2+ waves is as yet unclear, and disentangling exactly which cellular and subcellular factors drive the aberrant phenomenon will require a large systematic effort that goes beyond our resources. In particular, it will not be technically trivial to separate biologically relevant contributions from technical differences. For instance, the absence of Ca2+ waves under the principal neuron promoter CaMKII may suggest the involvement of interneurons. However, alternative possibilities are a reduced density of expression across principal neurons or different expression levels between the 2 promoters.

      The important, take-home message of the preprint, in our opinion, is that users check carefully their viral protocols, adjust the protocols for their specific scientific question and report any issues. We now emphasise the fact that although Ca2+ waves were not observed following conditional expression of syn.GCaMP with CaMKII.cre, this may not be due to a requirement for interneuronal expression but simply reflect differences in final GCaMP expression density and levels between the two transduction procedures (P12, L298-303).

      Reviewer #2 (Public Review):

      Weaknesses:

      Whether micro-waves are associated with the age of mice was not quantified. This would be good to know and the authors do have this data.

      We plotted the animal age at the time of injection for all injections of Syn.GCaMP6 into CA1/CA3 and found no correlation with either the occurrence or the frequency of Ca2+ waves during the age period between 5 – 79 wks (see reviewer Fig1; linear regression fit to the Ca2+ wave frequency against age was not significant: intercept = 1.37, slope = -0.007, p=0.62, n = 14; and generalized linear model relating Ca2+ wave ~ age was not significant: z score = 0.19, deviance above null = 0.04, p = 0.85, n=24). We have now added a statement on this to the revised manuscript (P14 L354-359) and for the reviewers we have added the plots below.

      Author response image 1.

      Plot of Ca2+ micro-wave frequency (left: number of Ca2+ waves/min) or occurrence (right: yes/no) against the animal age at the time of viral injection. Blue line is linear (left) or logistic (right) fit to the data with 95% confidence level.
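The linear fit of wave frequency against age described above can be sketched with ordinary least squares. The example below returns the intercept, the slope, and the standard error of the slope; the ratio of slope to standard error is the t statistic that would be compared against a t distribution with n - 2 degrees of freedom. The data points and the function name `ols_fit` are hypothetical and do not reproduce the values reported in the response.

```python
from math import sqrt


def ols_fit(x, y):
    """Simple linear regression: returns (intercept, slope, slope_std_error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_se = sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope_se


# Hypothetical ages (weeks) and wave frequencies (waves/min).
age = [1, 2, 3, 4]
freq = [1, 3, 2, 4]
intercept, slope, slope_se = ols_fit(age, freq)
t_statistic = slope / slope_se
print(intercept, slope, slope_se, t_statistic)
```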

      The effect of micro-waves on single cell function was not analyzed. It would be useful, for example, if we knew the influence of micro-waves on place fields. Can a place cell still express a place field in a hippocampus that produces micro-waves? What effect might a microwave passing over a cell have on its place field? Mice were not trained in these experiments, so the authors do not have the data.

      We agree that these are interesting questions; however, the preprint is focused on describing the GECI expression conditions prone to generating these artefacts. Studying the effects of Ca2+ micro-waves on the circuitry are scientific questions, and would require an experimental framework of testing the aberrant activity on a specific physiological function e.g. place activity or specific oscillations (e.g. sharp-wave activity). Ca2+ microwaves, as the ones described here, have not been reported under physiological conditions or pathophysiological conditions and studying the effects of such artefactual waves on the circuit was not our intention.

      With respect to place cell activity, specifically, it is intuitive that during the Ca2+ micro-wave the participating cell’s place field activity would be obscured by the artefactual activity. Cell activity appears to return immediately following the wave suggesting that the cells could exhibit place activity outside their participation in the Ca2+ micro-waves. However, we do not know if the Ca2+ micro-wave activity disrupts the generation or maintenance of place fields. We have now added a brief reference to possible effects on place coding to the paper (P12, L315-317).

      The CaMKII-Cre approach for flexed-syn-GCaMP expression shows no micro-waves and is convincing, but it is only from 2 animals, even though both had no micro-waves.

      In light of the reviewer’s comment, we have added a further 3 animals with conditional expression of GCaMP6m from the DZNE to complement the current dataset with conditional expression of GCaMP6s from UoB (P10, L236 & 239 and revised table 1). Although Ca2+ waves were not observed in any of the 5 animals in total, we still do not know with certainty whether this approach is completely safe. Time will tell whether researchers still encounter the phenotype under certain conditions when using this conditional approach.

      The authors state in their Discussion that even without observable microwaves, a syn-Ca2+-indicator transduction strategy could still be problematic. This may be true, but they do not check this in their analysis, so it remains unknown.

      We agree with the reviewer and have now made this point clearer in the revised discussion (P11, L257-258).

      Reviewer #3 (Public Review):

      Weaknesses:

      I believe that the weaknesses of the manuscript are appropriately highlighted by the authors themselves in the discussion. I would, however, like to emphasize several additional points.

      As the authors state, the exact conditions that lead to Ca2+ micro-waves are unclear from this manuscript. It is also unclear if Ca2+ micro-waves are specific to GECI expression or if high-titer viral transduction of other proteins such as genetically encoded voltage indicators, static fluorescent proteins, recombinases, etc could also cause Ca2+ micro-waves.

      The high expression of other proteins has been shown to result in artefactual phenomena such as toxicity or fluorescent puncta (for GFP see Hechler et al. 2006; Katayama et al. 2008; for GEVI see Rühl et al. 2021), but we are not aware of reports of micro-waves. Although it is certainly possible that high expression levels of other proteins could lead to waves, we suspect the Ca2+ micro-waves observed in this preprint result from a dysregulation of Ca2+ homeostasis. This is not to suggest that voltage indicators could not result in micro-waves (e.g. Ca2+ homeostasis may be indirectly affected).

      The authors almost exclusively tested high titer (>5x10^12 vg/mL) large volume (500-1000 nL) injections using the synapsin promoter and AAV1 serotypes. It is possible that Ca2+ micro-waves are dramatically less frequent when titers are lowered further but still kept high enough to be useful for in vivo imaging (e.g. 1x10^12 vg/mL) or smaller injection volumes are used. It is also possible that Ca2+ micro-waves occur with high titer injections using other viral promoter sequences such as EF1α or CaMKIIα. There may additionally be effects of viral serotype on micro-wave occurrence.

      We agree with all points raised by the reviewer. Notably, we used viral transduction protocols with titers and volumes within the range of those previously used for viral transduction of GCaMP under the synapsin promoter (see P11 L269-275) and we observed Ca2+ micro-waves. As the reviewer suggested, we did find that lowering the titer is an important factor in reducing these Ca2+ micro-waves and there is likely a wide range of approaches that avoid the phenomenon. With regards to viral serotype, we show that micro-waves occurred across AAV1 and 9, but it is possible that other serotypes may avoid the phenomenon.

      We reiterate in the abstract of the revised manuscript that expression level is a crucial factor (P2, L40 and P2, L44-45) and now mention that other promoters and induction protocols that result in high Ca2+ indicator expression may result in Ca2+ micro-waves (P12, L291-294).

      The number of animals in any particular condition are fairly low (Table 1) with the exception of V1 imaging and thy1-GCaMP6 imaging. This prohibits rigorous comparison of the frequency of pathological calcium activity across conditions.

      We have now added 3 more animals with conditional GCaMP6 expression. In total, the study contains 34 animals with viral injection into the hippocampus from different laboratories and under different conditions resulting in multiple groups. As such we are cognizant of the resulting limitations for statistical evaluation.

      However, in light of the reviewer’s comment, we have now employed a generalized linear model tested on all the data to examine the relationship between the Ca2+ micro-wave incidence and the different factors. The multivariate GLM did find a significant relationship between Ca2+ micro-wave incidence and both viral dilution and weeks post injection (see below and revised manuscript P8, L189-193).

      For injections into CA1 in the hippocampus (n=28), a GLM found no relationship between Ca2+ micro-waves and any of the individual variables x (Ca-wave ~ x): viral dilution: z score = 1.14, deviance above null = 1.31, p = 0.254; post injection weeks: z score = 1.18, deviance above null = 1.44, p = 0.239; injection volume: z score = -0.76, deviance above null = 0.59, p = 0.45; construct: z score = 1.18, difference in deviance above null = 1.44, p = 0.239.

      However, a multivariable logistic GLM relating dilution and post injection weeks (Ca-wave ~ dilution + p.i_wks) showed that together both variables were significantly related to Ca2+ micro-waves (deviance above null = 7.5; dilution: z score = 2.18, p < 0.05; p.i_wks: z score = 2.22, p < 0.05).
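As an illustration of the quantities reported for the logistic GLM, the sketch below fits a two-predictor logistic regression by Newton-Raphson and computes Wald z scores together with the deviance above null. "Deviance above null" is taken here to mean the null deviance minus the residual deviance, which is an assumption about the authors' usage. The records and all function names are hypothetical; in practice a statistics package would be used, and the numbers produced here are unrelated to those in the response.

```python
from math import exp, log, sqrt


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def prob(row, beta):
    """Logistic (inverse-logit) probability for one design-matrix row."""
    return 1.0 / (1.0 + exp(-sum(b * v for b, v in zip(beta, row))))


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information matrix."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for row, yi in zip(X, y):
        p = prob(row, beta)
        for i in range(k):
            grad[i] += (yi - p) * row[i]
            for j in range(k):
                info[i][j] += p * (1.0 - p) * row[i] * row[j]
    return grad, info


def fit_logistic(X, y, iterations=25):
    """Newton-Raphson fit; returns coefficients and Wald standard errors."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad, info = score_and_information(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = score_and_information(X, y, beta)
    # Diagonal of the inverse information matrix gives the coefficient variances.
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(sqrt(solve(info, unit)[j]))
    return beta, se


def deviance(X, y, beta):
    """Binomial deviance: -2 times the log-likelihood for 0/1 outcomes."""
    return -2.0 * sum(
        yi * log(prob(row, beta)) + (1 - yi) * log(1.0 - prob(row, beta))
        for row, yi in zip(X, y)
    )


# Hypothetical records: (dilution factor, weeks post injection, wave observed 0/1).
records = [
    (1, 3, 1), (1, 3, 1), (1, 3, 0),
    (4, 3, 1), (4, 3, 0), (4, 3, 0),
    (1, 10, 1), (1, 10, 1), (1, 10, 1), (1, 10, 0),
    (4, 10, 1), (4, 10, 1), (4, 10, 0),
]
y = [r[2] for r in records]
X_full = [[1.0, float(r[0]), float(r[1])] for r in records]  # intercept + 2 predictors
X_null = [[1.0] for _ in records]                            # intercept only

beta_full, se_full = fit_logistic(X_full, y)
beta_null, _ = fit_logistic(X_null, y)
deviance_above_null = deviance(X_null, y, beta_null) - deviance(X_full, y, beta_full)
z_scores = [b / s for b, s in zip(beta_full, se_full)]
print(deviance_above_null, z_scores)
```

The hypothetical records deliberately contain both outcomes at every combination of predictor values, so the maximum-likelihood estimate exists and the plain Newton iteration converges; real data with few animals per condition may show separation and need more careful handling.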

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Results are straightforward and convincing. While a couple of ways to reduce the aberrant microwaves of calcium responses were demonstrated, delving into the functions of interneurons is crucial for a more comprehensive understanding of cellular causality.

      As mentioned in the public response, disentangling cellular mechanism from technical requirements will need a large and systematic study. To determine the contribution from interneurons, the use of specific interneuron promoters would be required, and viral titers systematically varied to result in similar cellular GCaMP expression levels as seen under the synapsin promoter condition.

      Reviewer #2 (Recommendations For The Authors):

      Do the authors think the cells are firing when they participate in a micro-wave, or do they think the calcium influx is due to something else? A discussion point on this would be good.

      This is an excellent point raised by the reviewer. We do not know if the elevated cellular Ca2+ during the artifactual Ca2+ micro-wave reflects action potential firing or an increase of Ca2+ from intracellular stores. As already described in the text of the preprint, their optical spatiotemporal profile neither fits with known microseizure progression patterns, nor with spreading depolarization/depression. We have adopted the reviewer’s suggestion and added the following point to the discussion section in the revised preprint (P12, L308-315):

      In a limited dataset, we attempted to detect the Ca2+ micro-waves by hippocampal LFP recordings (using a conventional insulated Tungsten wire, diameter ~110µm). We could not identify a specific signature, e.g. ictal activity or LFP depression, which may correspond to these Ca2+ micro-waves. The crucial shortcoming of this experiment, of course, is that with these LFP recordings we could not simultaneously perform hippocampal 2-photon microscopy. Thus, it is uncertain if the Ca2+ micro-waves indeed occurred in proximity to our electrode.

      The results seem to suggest that micro-waves may involve interneurons as their CaMKII-Cre strategy avoids waves - possibly due to a lack of expression of GECIs in interneurons. It would be great to hear the authors' thoughts on this and add a brief discussion point.

      As mentioned in the public response to Reviewer 1, it is difficult to disentangle cellular mechanisms from technical requirements, and the exact requirements for the Ca2+ micro-waves to occur are still not fully clear. The absence of Ca2+ micro-waves in our CaMKII-Cre dataset may indeed reflect the requirement of interneurons. However, it could just as well be due to a sparse labelling of principal cells or simply reflect differences in the expression levels of GCaMP under the different promoters.

      All in all, a more complete understanding of the requirements of such Ca2+ micro-waves will require a community effort. Therefore, it is important that each group check the safety profile of their GECI and report problems to the community.

      We have added these points to the revised preprint (P12, L291 and P12, L298)

      Plotting the incidence of micro-waves as a function of the age of mice would be a nice addition (the authors have the data).

      There was no relationship of Ca2+ micro-wave occurrence or frequency with age over the range of 5-79 wks (see public response) and this has been added to the preprint (P14, L354)

      Reviewer #3 (Recommendations For The Authors):

      I appreciate the authors raising the awareness of this issue. I had personally observed micro-waves in my own data as well. In agreement with their findings, I found that the occurrence of micro-waves was dramatically lower when I reduced the viral titer. Anecdotally, I also observed voltage micro-waves when virally transducing genetically encoded voltage indicators at similar titers. For that reason, I am skeptical that this issue is exclusive to GECIs.

      We find it interesting that the reviewer has also seen artefactual micro-waves following viral transduction of genetically encoded voltage indicators. Without seeing the voltage waves the referee is referring to or the conditions, it is of course difficult to compare with the Ca2+ micro-waves we report. However, this comment again raises the question of mechanism. We believe that in the GECI framework, Ca2+ homeostatic aspects are important. Voltage indicators are based on different sensor mechanisms, and expressed in the cell membrane, but it may very well be that there are overlapping factors between Ca2+ and voltage indicators that could trigger a similar, or even the same phenomenon in the end.

      Minor comments:

      (1) Line 131-132: I believe the authors only tested for micro-waves in V1. This should be made clear in the results. It could be that micro-waves could occur in other parts of cortex with the same viral titers.

      Both V1 and somatosensory cortex were tested as described in the methods (P15, L395-397), we have made this clearer in the revised preprint (P6, L138).

      (2) There are no statistics associated with the data from Fig 1e.

      We have now added statistics (P5, L126).

      (3) The authors may be able to make a stronger claim about the pathological nature of the micro-waves if there are differences in the histology between the injected and non-injected hemispheres. For example, is there evidence of widespread cell death in the injected hemisphere (e.g. lower cell count, smaller hippocampal volume, caspase staining, etc).

      We found no evidence of gross morphological changes to the hippocampus following viral transduction with no changes in CA1 pyramidal cell layer thickness or CA1 thickness (pyramidal cell layer thickness: 49 ± 12.5 µm ipsilateral and 50.3 ± 11.1 µm contralateral, n=4, Student’s t-test p=0.89; CA1 thickness: 553.3 ± 14 µm ipsilateral and 555.8 ± 62 µm contralateral, n = 4, Student’s t-test p=0.94; 48 ± 13 weeks post injection at time of perfusion).

      We have added this to the preprint (P5, L117-122)
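For orientation, a two-sample Student's t statistic can be computed directly from summary statistics such as those quoted for the pyramidal cell layer thickness. The sketch below assumes that the "±" values are standard deviations and that the ipsilateral and contralateral measurements are treated as independent groups; neither is stated in the response. If the hemispheres were paired by animal, a paired test on the raw values would be needed instead. The function name `pooled_t` is illustrative.

```python
from math import sqrt


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t statistic and degrees of freedom from summaries."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    t = (mean_a - mean_b) / sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return t, df


# Pyramidal cell layer thickness (µm): ipsilateral vs contralateral, n = 4 each.
# Treating 12.5 and 11.1 as standard deviations is an assumption.
t_value, dof = pooled_t(49.0, 12.5, 4, 50.3, 11.1, 4)
print(t_value, dof)
```

A t statistic this close to zero with 6 degrees of freedom corresponds to a large p-value, which is in line with the absence of a detectable difference described in the response.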

      (4) The broader micro-waves in the stratum oriens versus the stratum pyramidale are likely due to the spread of the basal dendrites of pyramidal cells. If the typical size of the basal dendritic arbor of CA1 pyramidal neurons is taken into account, does this explain the wider calcium waves in this layer?

      Absolutely, great point, yes, we completely agree on this. It is likely that the active neuropil (including the dendritic arbour) is contributing to the apparent broader diameter. In addition, as evident in the video 5 cell somata in the stratum oriens (possibly interneurons) are active and their processes also contribute.

      We have now mentioned these points in the revised preprint (P5, L132)

      (5) Lines 179-181: Is the difference in the prevalence of micro-waves between viral titers statistically significant?

      Although we have a large number of animals in total (n=34) with viral injection into the hippocampus, the number of animals in each condition, given the many factors, is low. We therefore used a generalized linear model to test the relationship between the Ca2+ micro-waves and the variables.

      We have now added this analysis to the revised preprint (P8, L189-193).

      (6) Lines 200-203: The CA3 micro-waves were only observed at one institution. The current wording is slightly misleading.

      We agree and have changed this to be clearer (P9, L216).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Methods, please state the sex of the mice.

      This has now been added to the methods section:

      “Three to nine month old Thy1-GCaMP6S mice (Strain GP4.3, Jax Labs), N=16 stroke (average age: 5.4 months; 13 male, 3 female), and 5 sham (average age: 6 months; 3 male, 2 female), were used in this study.”

      (2) The analysis in Fig 3B-D, 4B-C, and 6A, B highlights the loss of limb function, firing rate, or connections at 1 week but this phenomenon is clearly persisting longer in some datasets (Fig. 3 and 6). Was there not a statistical difference at weeks 2,3,4,8 relative to "Pre-stroke" or were comparisons only made to equivalent time points in the sham group? Personally, I think it is useful to compare to "pre-stroke" which should be more reflective of that sample of animals than comparing to a different set of animals in the Sham group. A 1 sample t-test could be used in Fig 4 and 6 normalized data.

      On further analysis of our datasets, normalization throughout the manuscript was unnecessary for proper depiction of results, and all normalized datasets have been replaced with non-normalized datasets. All within group statistics are now indicated within the manuscript.

      (3) Fig 4A shows a very striking change in activity that doesn't seem to be borne out with group comparisons. Since many neurons are quiet or show very little activity, did the authors ever consider subgrouping their analysis based on cells that show high activity levels (top 20 or 30% of cells) vs those that are inactive most of the time? Recent research has shown that the effects of stroke can have a disproportionate impact on these highly active cells versus the minimally active ones.

      A qualitative analysis supports a loss of cells with high activity at the 1-week post-stroke timepoint, and examination of average firing rates at 1-week shows reductions in the animals with the highest average rates. However, we have not tracked responses within individual neurons or quantitatively analyzed the data by subdividing cells into groups based on their pre-stroke activity levels. We have amended the discussion of the manuscript with the following to highlight the previous data as it relates to our study:

      “Recent research also indicates that stroke causes distinct patterns of disruption to the network topology of excitatory and inhibitory cells [73], and that stroke can disproportionately disrupt the function of high activity compared to low activity neurons in specific neuron sub-types [61]. Mouse models with genetically labelled neuronal sub-types (including different classes of inhibitory interneurons) could be used to track the function of those populations over time in awake animals.”

      (4) Fig 4 shows normalized firing rates when moving and at rest but it would be interesting to know what the true difference in activity was in these 2 states. My assumption is that stroke reduces movement therefore one normalizes the data. The authors could consider putting non-normalized data in a Supp figure, or at least provide a rationale for not showing this, such as stating that movement output was significantly suppressed, hence the need for normalization.

      On further analysis of our datasets, normalization throughout the manuscript was unnecessary for proper depiction of results, and all normalized datasets have been replaced with non-normalized datasets.

      (5) One thought for the discussion. The fact that the authors did not find any changes in "distant" cortex may be specific to the region they chose to sample (caudal FL cortex). It is possible that examining different "distant" regions could yield a different outcome. For example, one could argue that there may have been no reason for this area to "change" since it was responsive to FL stimuli before stroke. Further, since it was posterior to the stroke, thalamocortical projections should have been minimally disturbed.

      We would like to thank the reviewer for this comment. We have amended the discussion with the following:

      “Our results suggest a limited spatial distance over which the peri-infarct somatosensory cortex displays significant network functional deficits during movement and rest. Our results are consistent with a spatial gradient of plasticity mediating factors that are generally enhanced with closer proximity to the infarct core [84,88,90,91]. However, our analysis outside peri-infarct cortex is limited to a single distal area caudal to the pre-stroke cFL representation. Although somatosensory maps in the present study were defined by a statistical criterion for delineating highly responsive cortical regions from those with weak responses, the distal area in this study may have been a site of activity that did not meet the statistical criterion for inclusion in the baseline map. The lack of detectable changes in population correlations, functional connectivity, assembly architecture and assembly activations in the distal region may reflect minimal pressure for plastic change as networks in regions below the threshold for regional map inclusion prior to stroke may still be functional in the distal cortex. Thus, threshold-based assessment of remapping may further overestimate the neuroplasticity underlying functional reorganization suggested by anaesthetized preparations with strong stimulation. Future studies could examine distal areas medial and anterior to the cFL somatosensory area, such as the motor and pre-motor cortex, to further define the effect of FL targeted stroke on neuroplasticity within other functionally relevant regions. Moreover, the restriction of these network changes to peri-infarct cortex could also reflect the small penumbra associated with photothrombotic stroke, and future studies could make use of stroke models with larger penumbral regions, such as the middle cerebral artery occlusion model. 
Larger injuries induce more sustained sensorimotor impairment, and the relationship between neuronal firing, connectivity, and neuronal assemblies could be further probed relative to recovery or sustained impairment in these models.”

      Minor comments:

      Line 129, I don't necessarily think the infarct shows "hyper-fluorescence", it just absorbs less white light (or reflects more light) than blood-rich neighbouring regions.

      Sentence in the manuscript has been changed to:

      “Resulting infarcts lesioned this region, and borders could be defined by a region of decreased light absorption 1 week post-stroke (Fig 1D, Top).”

      Line 130-132: the authors refer to Fig 1D to show cellular changes but these cannot be seen from the images presented. Perhaps a supplementary zoomed-in image would be helpful.

      As changes to the morphology of neurons are not one of the primary objectives of this study, and sampled resolution was not sufficiently high to clearly delineate the processes of neurons necessary for morphological assessment, we have amended the text as follows:

      “Within the peri-infarct imaging region, cellular dysmorphia and swelling was visually apparent in some cells during two photon imaging 1-week after stroke, but recovered over the 2 month post-stroke imaging timeframe (data not shown). These gross morphological changes were not visually apparent in the more distal imaging region lateral to the cHL.”

      Lines 541-543, was there a rationale for defining movement as >30mm/s? Based on a statistical estimate of noise?

      Text has been altered as follows:

      “Animal movement within the homecage during each Ca2+ imaging session was tracked to determine animal speed and position. Movement periods were manually annotated on a subset of timeseries by co-recording animal movement using both the Mobile Homecage tracker, as well as a webcam (Logitech C270) with infrared filter removed. Movement tracking data was low pass filtered to remove spurious movement artifacts lasting below 6 recording frames (240ms). Based on annotated times of animal movement from the webcam recordings and Homecage tracking, a threshold of 30mm/s from the tracking data was determined as frames of animal movement, whereas speeds below 30mm/s was taken as periods of rest.”
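
The movement criterion quoted above can be sketched in a few lines. This is a minimal illustration only, not the authors' analysis code. It assumes per-frame speeds in mm/s, treats speeds at or above 30 mm/s as movement, and replaces the low-pass filter with a simpler run-length filter that discards movement bouts shorter than 6 frames (240 ms, which implies about 40 ms per frame). The function name `classify_movement` and the constant names are hypothetical.

```python
SPEED_THRESHOLD_MM_S = 30.0   # movement threshold stated in the text
MIN_BOUT_FRAMES = 6           # 6 frames = 240 ms, as stated in the text


def classify_movement(speeds_mm_s, threshold=SPEED_THRESHOLD_MM_S,
                      min_bout_frames=MIN_BOUT_FRAMES):
    """Return one boolean per frame: True = movement, False = rest.

    Assumption: bouts of supra-threshold speed shorter than
    `min_bout_frames` are treated as tracking artifacts and relabelled
    as rest (a run-length filter standing in for the low-pass filter).
    """
    moving = [s >= threshold for s in speeds_mm_s]
    cleaned = list(moving)
    start = None
    # The trailing False is a sentinel that closes a bout ending on the last frame.
    for i, flag in enumerate(moving + [False]):
        if flag and start is None:
            start = i                      # a new bout begins
        elif not flag and start is not None:
            if i - start < min_bout_frames:
                for j in range(start, i):  # bout too short: relabel as rest
                    cleaned[j] = False
            start = None
    return cleaned
```

For example, a 2-frame burst above threshold is relabelled as rest, while a 6-frame bout is kept as movement.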

      Lines 191-195: Note that although the finding of reduced neural activity is in disagreement with a multi-unit recording study, it is consistent with other very recent single-cell Ca++ imaging data after stroke (PMID: 34172735, 34671051).

      Text has been altered as follows:

      “These results indicate decreased neuronal spiking 1-week after stroke in regions immediately adjacent to the infarct, but not in distal regions, that is strongly related to sensorimotor impairment. This finding runs contrary to a previous report of increased spontaneous multi-unit activity as early as 3-7 days after focal photothrombotic stroke in the peri-infarct cortex [1], but is in agreement with recent single-cell calcium imaging data demonstrating reduced sensory-evoked activity in neurons within the peri-infarct cortex after stroke [60,61].”

      Fig 7. I don't understand what the color code represents. Are these neurons belonging to the same assembly (or membership?).

      That is correct, neurons with identical color code belong to the same assembly. The legend of Fig 7 has been modified as follows to make this more explicit:

      “Fig 7. Color coded neural assembly plots depict altered neural assembly architecture after stroke in the peri-infarct region. (A) Representative cellular Ca2+ fluorescence images with neural assemblies color coded and overlaid for each timepoint. Neurons belonging to the same assembly have been pseudocolored with identical color. A loss in the number of neural assemblies after stroke in the peri-infarct region is visually apparent, along with a concurrent increase in the number of neurons for each remaining assembly. (B) Representative sham animal displays no visible change in the number of assemblies or number of neurons per assembly.”

      Reviewer #2 (Recommendations For The Authors):

      Materials and methods

      Identification of forelimb and hindlimb somatosensory cortex representations [...] Cortical response areas are calculated using a threshold of 95% peak activity within the trial. The threshold is presumably used to discriminate between the sensory-evoked response and collateral activation / less "relevant" response (noise). Since the peak intensity is lower after stroke, the "response" area is larger - lower main signal results in less noise exclusion. Predictably, areas that show a higher response before stroke than after are excluded from the response area before stroke and included after. While it is expected that the remapped areas will exhibit a lower response than the original and considering the absence of neuronal activity, assembly architecture, or functional connectivity in the "remapped" regions, a minimal criterion for remapping should be to exhibit higher activation than before stroke. Please use a different criterion to map the cortical response area after stroke.

      We would like to thank the reviewer for this comment. We agree with the reviewer’s assessment of 95% of peak as an arbitrary criterion of mapped areas. To exclude noise from the analysis of mapped regions, a new statistical criterion of 5X the standard deviation of the baseline period was used to determine the threshold to use to define each response map. These maps were used to determine the peak intensity of the forelimb response. We also measured a separate ROI specifically overlapping the distal region, lateral to the hindlimb map, to determine specific changes to widefield Ca2+ responses within this distal region. We have amended the text as follows and have altered Figure 2 with new data generated from our new criterion for cortical mapping.

      “The trials for each limb were averaged in ImageJ software (NIH). 10 imaging frames (1s) after stimulus onset were averaged and divided by the 10 baseline frames 1s before stimulus onset to generate a response map for each limb. Response maps were thresholded at 5 times the standard deviation of the baseline period deltaFoF to determine limb associated response maps. These were merged and overlaid on an image of surface vasculature to delineate the cFL and cHL somatosensory representations and were also used to determine peak Ca2+ response amplitude from the timeseries recordings. For cFL stimulation trials, an additional ROI was placed over the region lateral to the cHL representation (denoted as “distal region” in Fig 2E) to measure the distal region cFL evoked Ca2+ response amplitude pre- and post-stroke. The dimensions and position of the distal ROI was held consistent relative to surface vasculature for each animal from pre- to post-stroke.”
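
The thresholding step quoted above can be illustrated with a short sketch. It is not the authors' ImageJ workflow. It assumes a trial-averaged fluorescence trace per pixel, computes the response as the mean of the 10 post-stimulus frames relative to the mean of the 10 baseline frames, and keeps a pixel when that response exceeds 5 times the standard deviation of the baseline ΔF/F. Computing the standard deviation per pixel across the baseline frames is an assumption, since the text does not specify it. The names `pixel_in_map` and `response_map` are hypothetical.

```python
from statistics import mean, stdev

N_FRAMES = 10        # 10 frames (1 s) before and after stimulus onset, per the text
SD_MULTIPLIER = 5.0  # threshold of 5x baseline standard deviation, per the text


def pixel_in_map(trace, stim_index, n_frames=N_FRAMES, k=SD_MULTIPLIER):
    """Decide whether one pixel belongs to the limb response map.

    trace: trial-averaged fluorescence values for the pixel, one per frame.
    stim_index: index of the first frame after stimulus onset.
    """
    if stim_index < n_frames or stim_index + n_frames > len(trace):
        raise ValueError("trace too short for the requested windows")
    baseline = trace[stim_index - n_frames:stim_index]
    response = trace[stim_index:stim_index + n_frames]
    f0 = mean(baseline)
    # Baseline dF/F per frame; its spread sets the noise level (assumed per pixel).
    baseline_dff = [(f - f0) / f0 for f in baseline]
    # Mean post-stimulus signal divided by mean baseline, expressed as dF/F.
    response_dff = mean(response) / f0 - 1.0
    return response_dff > k * stdev(baseline_dff)


def response_map(traces, stim_index):
    """traces: dict mapping (row, col) -> trace. Returns supra-threshold pixels."""
    return {px for px, tr in traces.items() if pixel_in_map(tr, stim_index)}
```

Because the threshold scales with baseline noise rather than with peak amplitude, a weaker post-stroke peak does not by itself enlarge the mapped area, which is the concern the reviewer raised about the 95%-of-peak criterion.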

      Animals

      Mice used have an age that goes from 3 to 9 months. This is a big difference given that literature on healthy aging reports changes in neurovascular coupling starting from 8-9 months old mice. Consider adding age as a covariate in the analysis.

      We do not have sufficient numbers of animals within this study to examine the effect of age on the results observed herein. We have amended the discussion with the following to address this point:

      “A potential limitation of our data is the undefined effect of age and sex on cortical dynamics in this cohort of mice (with ages ranging from 3-9 months) after stroke. Aging can impair neurovascular coupling [102–107] and reduce ischemic tolerance [108–111], and greater investigation of cortical activity changes after stroke in aged animals would more effectively model stroke in humans. Future research could replicate this study with mice in middle-age and aged mice (e.g. 9 months and 18+ months of age), and with sufficient quantities of both sexes, to better examine age and sex effects on measures of cortical function.”

      Statistics

      Please describe the "normalization" that was applied to the firing rate. Since a mixed-effects model was used, why wasn't baseline simply added as a covariate? With this type of data, normalization is useful for visualization purposes.

      On further analysis of our datasets, normalization throughout the manuscript was unnecessary for the visualization of results, and all normalized datasets have been replaced with non-normalized datasets. All within group comparisons are now indicated throughout the manuscript and in the figures.

      Introduction

      Line 93 awake, freely behaving but head-fixed. That's not freely. Should just say behaving.

      Sentence has been edited as follows:

      “We used awake, behaving but head-fixed mice in a mobile homecage to longitudinally measure cortical activity, then used computational methods to assess functional connectivity and neural assembly architecture at baseline and each week for 2 months following stroke.”

      110 - 112 The last part of this sentence is unjustified because these areas have been incorrectly identified as locations of representational remapping.

      We agree with the reviewer and have amended the manuscript as follows after re-analyzing the dataset on widefield Ca2+ imaging of sensory-evoked responses: “Surprisingly, we also show that significant alterations in neuronal activity (firing rate), functional connectivity, and neural assembly architecture are absent within more distal regions of cortex as little as 750 µm from the stroke border, even in areas identified by regional functional imaging (under anaesthesia) as ‘remapped’ locations of sensory-evoked FL activity 8-weeks post-stroke.”

      Results

      149-152 There is no observed increase in the evoked response area. There is an observed change in the criteria for what is considered a response.

      We agree with the reviewer. Text has been amended as follows:

      “Fig 2A shows representative montages from a stroke animal illustrating the cortical cFL and cHL Ca2+ responses to 1s, 100Hz limb stimulation of the contralateral limbs at the pre-stroke and 8-week post-stroke timepoints. The location and magnitude of the cortical responses changes drastically between timepoints, with substantial loss of supra-threshold activity within the pre-stroke cFL representation located anterior to the cHL map, and an apparent shift of the remapped representation into regions lateral to the cHL representation at 8-weeks post-stroke. A significant decrease in the cFL evoked Ca2+ response amplitude was observed in the stroke group at 8-weeks post-stroke relative to pre-stroke (Fig 2B). This is in agreement with past studies [19–25], and suggests that cFL targeted stroke reduces forelimb evoked activity across the cFL somatosensory cortex in anaesthetized animals even after 2 months of recovery. There was no statistical change in the average size of cFL evoked representation 8-weeks after stroke (Fig 2C), but a significant posterior shift of the supra-threshold cFL map was detected (Fig 2D). Unmasking of previously sub-threshold cFL responsive cortex in areas posterior to the original cFL map at 8-weeks post-stroke could contribute to this apparent remapping. However, the amplitude of the cFL evoked widefield Ca2+ response in this distal region at 8-weeks post-stroke remains reduced relative to pre-stroke activation (Fig 2E). Previous studies suggest strong inhibition of cFL evoked activity during the first weeks after photothrombosis [25]. Without longitudinal measurement in this study to quantify this reduced activation prior to 8-weeks post-stroke, we cannot differentiate potential remapping due to unmasking of the cFL representation that enhances the cFL-evoked widefield Ca2+ response from apparent remapping that simply reflects changes in the signal-to-noise ratio used to define the functional representations. 
There were no group differences between stroke and sham groups in cHL evoked intensity, area, or map position (data not shown).”

      A lot of the nonsignificant results are reported as "statistical trends towards..." While the term "trend" is problematic, it remains common in its use. However, assigning directionality to the trend, as if it is actively approaching a main effect, should be avoided. The results aren't moving towards or away from significance. Consider rewording the way in which these results are reported.

      We have amended the text to remove directionality from our mention of statistical trends.

      R squared and p values for significant results are reported in the "impaired performance on tapered beam..." and "firing rate of neurons in the peri-infarct cortex..." subsections of the results, but not the other sections. Please report the results in a consistent manner.

      R-squared and p-values have been removed from the results section and are now reported in figure captions consistently.

      Discussion

      288 Remapping is defined as "new sensory-evoked spiking". This should be the main criterion for remapping, but it is not operationalized correctly by the threshold method.

      With our new criterion for determining limb maps using a statistical threshold of 5X the standard deviation of baseline fluorescence, we have edited text throughout the manuscript to better emphasize that we may not be measuring new sensory-evoked spiking with the mesoscale mapping that was done. We have edited the discussion as follows:

      “Here, we used longitudinal two photon calcium imaging of awake, head-fixed mice in a mobile homecage to examine how focal photothrombotic stroke to the forelimb sensorimotor cortex alters the activity and connectivity of neurons adjacent and distal to the infarct. Consistent with previous studies using intrinsic optical signal imaging, mesoscale imaging of regional calcium responses (reflecting bulk neuronal spiking in that region) showed that targeted stroke to the cFL somatosensory area disrupts the sensory-evoked forelimb representation in the infarcted region. Consistent with previous studies, this functional representation exhibited a posterior shift 8-weeks after injury, with activation in a region lateral to the cHL representation. Notably, sensory-evoked cFL representations exhibited reduced amplitudes of activity relative to pre-stroke activation measured in the cFL representation and in the region lateral to the cHL representation. Longitudinal two-photon calcium imaging in awake animals was used to probe single neuron and local network changes adjacent to the infarct and in a distal region that corresponded to the shifted region of cFL activation. This imaging revealed a decrease in firing rate at 1-week post-stroke in the peri-infarct region that was significantly negatively correlated with the number of errors made with the stroke-affected limbs on the tapered beam task. Peri-infarct cortical networks also exhibited a reduction in the number of functional connections per neuron and a sustained disruption in neural assembly structure, including a reduction in the number of assemblies and an increased recruitment of neurons into functional assemblies. Elevated correlation between assemblies within the peri-infarct region peaked 1-week after stroke and was sustained throughout recovery. Surprisingly, distal networks, even in the region associated with the shifted cFL functional map in anaesthetized preparations, were largely undisturbed.”

      “Cortical plasticity after stroke Plasticity within and between cortical regions contributes to partial recovery of function and is proportional to both the extent of damage, as well as the form and quantity of rehabilitative therapy post-stroke [80,81]. A critical period of highest plasticity begins shortly after the onset of stroke, is greatest during the first few weeks, and progressively diminishes over the weeks to months after stroke [19,82–86]. Functional recovery after stroke is thought to depend largely on the adaptive plasticity of surviving neurons that reinforce existing connections and/or replace the function of lost networks [25,52,87–89]. This neuronal plasticity is believed to lead to topographical shifts in somatosensory functional maps to adjacent areas of the cortex. The driver for this process has largely been ascribed to a complex cascade of intra- and extracellular signaling that ultimately leads to plastic re-organization of the microarchitecture and function of surviving peri-infarct tissue [52,80,84,88,90–92]. Likewise, structural and functional remodeling has previously been found to be dependent on the distance from the stroke core, with closer tissue undergoing greater re-organization than more distant tissue (for review, see [52]).”

      “Previous research examining the region at the border between the cFL and cHL somatosensory maps has shown this region to be a primary site for functional remapping after cFL directed photothrombotic stroke, resulting in a region of cFL and cHL map functional overlap [25]. Within this overlapping area, neurons have been shown to lose limb selectivity 1-month post-stroke [25]. This is followed by the acquisition of more selective responses 2-months post-stroke and is associated with reduced regional overlap between cFL and cHL functional maps [25]. Notably, this functional plasticity at the cellular level was assessed using strong vibrotactile stimulation of the limbs in anaesthetized animals. Our findings using longitudinal imaging in awake animals show an initial reduction in firing rate at 1-week post-stroke within the peri-infarct region that was predictive of functional impairment in the tapered beam task. This transient reduction may be associated with reduced or dysfunctional thalamic connectivity [93–95] and reduced transmission of signals from hypo-excitable thalamo-cortical projections [96]. Importantly, the strong negative correlation we observed between firing rate of the neural population within the peri-infarct cortex and the number of errors on the affected side, as well as the rapid recovery of firing rate and tapered beam performance, suggests that neuronal activity within the peri-infarct region contributes to the impairment and recovery. The common timescale of neuronal and functional recovery also coincides with angiogenesis and re-establishment of vascular support for peri-infarct tissue [83,97–100].”

      “Consistent with previous research using mechanical limb stimulation under anaesthesia [25], we show that at the 8-week timepoint after cFL photothrombotic stroke the cFL representation is shifted posterior from its pre-stroke location into the area lateral to the cHL map. Notably, our distal region for awake imaging was directly within this 8-week post-stroke cFL representation. Despite our prediction that this distal area would be a hotspot for plastic changes, there was no detectable alteration to the level of population correlation, functional connectivity, assembly architecture or assembly activations after stroke. Moreover, we found little change in the firing rate in either moving or resting states in this region. Contrary to our results, somatosensory-evoked activity assessed by two photon calcium imaging in anesthetized animals has demonstrated an increase in cFL responsive neurons within a region lateral to the cHL representation 1-2 months after focal cFL stroke [25]. Notably, this previous study measured sensory-evoked single cell activity using strong vibrotactile (1s 100Hz) limb stimulation under anaesthesia [25]. This frequency of limb stimulation has been shown to elicit near maximal neuronal responses within the limb-associated somatosensory cortex under anesthesia [101]. Thus, strong stimulation and anaesthesia may have unmasked non-physiological activity in neurons in the distal region that is not apparent during more naturalistic activation during awake locomotion or rest. Regional mapping defined using strong stimulation in anesthetized animals may therefore overestimate plasticity at the cellular level.”

      “Our results suggest a limited spatial distance over which the peri-infarct somatosensory cortex displays significant network functional deficits during movement and rest. Our results are consistent with a spatial gradient of plasticity mediating factors that are generally enhanced with closer proximity to the infarct core [84,88,90,91]. However, our analysis outside peri-infarct cortex is limited to a single distal area caudal to the pre-stroke cFL representation. Although somatosensory maps in the present study were defined by a statistical criterion for delineating highly responsive cortical regions from those with weak responses, the distal area in this study may have been a site of activity that did not meet the statistical criterion for inclusion in the baseline map. The lack of detectable changes in population correlations, functional connectivity, assembly architecture and assembly activations in the distal region may reflect minimal pressure for plastic change as networks in regions below the threshold for regional map inclusion prior to stroke may still be functional in the distal cortex. Thus, threshold-based assessment of remapping may further overestimate the neuroplasticity underlying functional reorganization suggested by anaesthetized preparations with strong stimulation. Future studies could examine distal areas medial and anterior to the cFL somatosensory area, such as the motor and pre-motor cortex, to further define the effect of FL targeted stroke on neuroplasticity within other functionally relevant regions. Moreover, the restriction of these network changes to peri-infarct cortex could also reflect the small penumbra associated with photothrombotic stroke, and future studies could make use of stroke models with larger penumbral regions, such as the middle cerebral artery occlusion model. 
Larger injuries induce more sustained sensorimotor impairment, and the relationship between neuronal firing, connectivity, and neuronal assemblies could be further probed relative to recovery or sustained impairment in these models. Recent research also indicates that stroke causes distinct patterns of disruption to the network topology of excitatory and inhibitory cells [73], and that stroke can disproportionately disrupt the function of high activity compared to low activity neurons in specific neuron sub-types [61]. Mouse models with genetically labelled neuronal sub-types (including different classes of inhibitory interneurons) could be used to track the function of those populations over time in awake animals. A potential limitation of our data is the undefined effect of age and sex on cortical dynamics in this cohort of mice (with ages ranging from 3-9 months) after stroke. Aging can impair neurovascular coupling [102–107] and reduce ischemic tolerance [108–111], and greater investigation of cortical activity changes after stroke in aged animals would more effectively model stroke in humans. Future research could replicate this study with mice in middle-age and aged mice (e.g. 9 months and 18+ months of age), and with sufficient quantities of both sexes, to better examine age and sex effects on measures of cortical function.”

      315 - 317 Remodelling is dependent on the distance from the stroke core, with closer tissue undergoing greater reorganization than more distant tissue. There is no evidence that the more distant tissue undergoes any reorganization at all.

      We agree with the reviewer that no remodelling is apparent in our distal area. We have removed reference to our study showing remodeling in the distal area, and have amended the text as follows:

      “Likewise, structural and functional remodeling has previously been found to be dependent on the distance from the stroke core, with closer tissue undergoing greater re-organization than more distant tissue (for review, see [52]).”

      412-414 The authors speculate that a strong stimulation under anaesthesia may unmask connectivity in distal regions. However, the motivation for this paper is that anaesthesia is a confounding factor. It appears to me that, given the results of this study, the authors should argue that the functional connectivity observed under anaesthesia may be spurious.

      The incorrect word was used here. We have corrected the paragraph of the discussion and amended it as follows:

      “Consistent with previous research using mechanical limb stimulation under anaesthesia [25], we show that at the 8-week timepoint after cFL photothrombotic stroke the cFL representation is shifted posterior from its pre-stroke location into the area lateral to the cHL map. Notably, our distal region for awake imaging was directly within this 8-week post-stroke cFL representation. Despite our prediction that this distal area would be a hotspot for plastic changes, there was no detectable alteration to the level of population correlation, functional connectivity, assembly architecture or assembly activations after stroke. Moreover, we found little change in the firing rate in either moving or resting states in this region. Contrary to our results, somatosensory-evoked activity assessed by two photon calcium imaging in anesthetized animals has demonstrated an increase in cFL responsive neurons within a region lateral to the cHL representation 1-2 months after focal cFL stroke [25]. Notably, this previous study measured sensory-evoked single cell activity using strong vibrotactile (1s 100Hz) limb stimulation under anaesthesia [25]. This frequency of limb stimulation has been shown to elicit near maximal neuronal responses within the limb-associated somatosensory cortex under anesthesia [101]. Thus, strong stimulation and anaesthesia may have unmasked non-physiological activity in neurons in the distal region that is not apparent during more naturalistic activation during awake locomotion or rest. Regional mapping defined using strong stimulation in anesthetized animals may therefore overestimate plasticity at the cellular level.”

      Figures

      Figure 1 and 2: Scale bar missing.

      Scale bars added to both figures.

      Figure 2: The representative image shows a drastic reduction of the forelimb response area, contrary to the general description of the findings. It would also be beneficial to see a graph with lines connecting the pre-stroke and 8-week datapoints.

      The data for Figure 2 have been re-analyzed using a new criterion of 5X the standard deviation of the baseline period for determining the threshold for limb mapping. Figure 2 and the relevant manuscript and figure legend text have been amended. In agreement with the reviewer's observation, there is no increase in forelimb response area, but instead a non-significant decrease in the average forelimb area.
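
      As an illustration of the thresholding criterion described above, the following is a minimal sketch, not the authors' analysis code. It assumes the threshold is the baseline mean plus 5 standard deviations of the baseline period (the response states only "5X the standard deviation of the baseline period", so adding the mean is an assumption), and the function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def response_threshold(baseline, k=5.0):
    """Return baseline mean + k * baseline sample SD.

    Assumption: the criterion is applied relative to the baseline mean;
    the source only says "5X the standard deviation of the baseline period".
    """
    return mean(baseline) + k * stdev(baseline)


def responsive_pixels(baselines, responses, k=5.0):
    """Flag each pixel whose evoked response exceeds its own threshold.

    baselines: one list of baseline samples per pixel (hypothetical layout)
    responses: one evoked-response value per pixel
    """
    return [
        resp > response_threshold(base, k)
        for base, resp in zip(baselines, responses)
    ]


if __name__ == "__main__":
    # Hypothetical baseline traces for two pixels (arbitrary units).
    baselines = [[0.0, 1.0, 2.0, 1.0, 1.0], [0.0, 1.0, 2.0, 1.0, 1.0]]
    responses = [6.0, 3.0]
    print(responsive_pixels(baselines, responses))
```

      Under this sketch, the mapped limb area would be the count of flagged pixels multiplied by the pixel area, so a stricter multiplier k can only shrink or preserve the mapped area.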

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Overall comments

      We are pleased by the reviewers' comments and appreciate their suggestions for improvements. In addition to correcting small typos throughout the manuscript, we have made the following additions or changes in response to reviewer comments and suggestions:

      1. New complementation experiments to verify the impacts of mgtA and PA4824 on bacterial fitness in fungal co-culture.
      2. New experiments to measure intracellular Mg2+ levels in corA or mgtE mutants to strengthen our conclusion that neither of these constitutive Mg2+ transporters is required for maintaining intracellular Mg2+ levels in co-culture.
      3. New experiments to confirm that the S. cerevisiae mnr2∆ mutant does not have a fitness defect compared to WT in co-culture. This finding rules out the possibility that metabolic defects in the mnr2∆ mutant restore the fitness of the bacterial mgtA mutant in co-culture and strengthens our hypothesis that Mg2+ sequestration by the fungal vacuole triggers Mg2+ nutritional competition with bacteria.
      4. Clarification of bacterial species we tested in our study as suggested by Reviewer #3.
      5. Revised discussion to highlight how our findings relate to fungal-bacterial interactions in both ecological and infection contexts and to any known role of mgtA in antibiotic susceptibility, as suggested by Reviewer #2.

      All changes in response to the reviewers' comments are detailed in our point-by-point response (below).

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      This manuscript investigates polymicrobial interactions between two clinically relevant species, Pseudomonas aeruginosa and Candida albicans. The findings that C. albicans mediates P. aeruginosa tolerance to antibiotics through sequestration of magnesium provides insight into a specific interaction at play between these two organisms, and the underlying mechanism. The manuscript is well composed and generally the claims throughout are supported by the provided evidence. As a result, most comments are either for clarification, or minor in nature.

      We thank the reviewer for their positive comments and their suggestions for improvement.

      Major comments:

      1) For their experiments, the authors frequently switch between 30C and 37C, but there is no rationale for why a specific temperature was used, or both were. E.g. some of the antibiotic survival assays, and fungal-bacterial co-culture assays were performed at both temperatures, while the colistin resistance, fitness competition and RNA sequencing were performed at 30C. Given the fact that the two organisms are both human pathogens and co-exist in human infections, it is not clear why 30C was used. The authors should provide clarity for why these two temperatures were used.

      We thank the reviewer for raising this point. Fungal-bacterial interactions occur across a range of temperatures in ecological contexts (e.g., in soil or on plants) or during infection in animal hosts. Both 30°C and 37°C are used in C. albicans studies, whereas 37°C is preferred for P. aeruginosa studies. By providing data from both temperatures for critical experiments, we demonstrate that our findings are not dependent on temperature. Our studies also allow for an easy comparison to previously published studies performed at both temperatures. We chose to screen initial co-culture conditions showing fungal antagonism at 30°C, as C. albicans cells can reach higher CFUs than at 37°C due to growth in the single-celled yeast form.

      We agree with the reviewer that 37°C is more physiologically relevant for conditions under which these two species coexist in animal hosts. Thus, we tested our findings of Mg2+ competition and antibiotic survival at 37°C.

      We now clarify our reasoning in the revised Materials and Methods section as follows: "We chose 30°C for the initial co-culture assays for two reasons. First, C. albicans cells reached higher CFU at 30°C than 37°C, which would impose a stronger competition with bacteria. Second, C. albicans cells form hyphae at 37°C, which can have multiple cells in one filament and thus confound CFU measurements. We further confirmed that our findings of Mg2+ competition are independent of temperature by setting up co-culture assays at both 30°C and 37°C."

      2) Lines 184-191: It would be useful to measure intracellular Mg2+ (using the Mg sensor) in the corA and mgtB tn mutants in media as well as the fungal spent media, to provide stronger support for the claim that "MgtA is a key bacterial Mg2+ transporter that is highly induced under low Mg2+ conditions".

      We thank the reviewer for this suggestion. Based on our experiments, neither CorA nor MgtE is induced (in RNA-seq analyses) or required (in Tn-seq analyses) in co-culture, suggesting that neither is involved in Mg2+ competition with C. albicans. In contrast, MgtA is highly induced in co-culture. Loss of mgtA significantly reduces bacterial fitness in co-culture, and it reduces intracellular Mg2+ levels only in C. albicans-spent BHI, not in fresh BHI. These results suggest that MgtA is the key Mg2+ transporter required for bacterial Mg2+ uptake and fitness in co-culture.

      Nevertheless, we agree with the reviewer that despite being constitutively expressed, CorA or MgtE might play an important role in importing Mg2+ in BHI and C. albicans-spent BHI. To test this possibility, we performed a new experiment suggested by the reviewer (now included in the revised manuscript) in which we measured intracellular Mg2+ levels in corA or mgtE loss-of-function mutants in BHI versus C. albicans-spent BHI, and compared them to intracellular Mg2+ levels in an mgtA loss-of-function mutant strain. We find that, unlike loss of mgtA, lack of either corA or mgtE does not significantly reduce bacterial Mg2+ levels in C. albicans-spent BHI (Fig. S7C). Thus, our results strengthen our conclusion that MgtA is the key Mg2+ transporter that gram-negative bacteria use to overcome fungal-mediated Mg2+ sequestration.

      3) Line no. 276. Does the mnr2∆ S. cerevisiae mutant have a growth defect compared to the WT? This would test whether the effect of the mnr2 mutant on P. aeruginosa fitness is strictly due to Mg2+ and not due to reduced growth or metabolism of the mutant.

      We agree with the possibility raised by the reviewer. In new experiments included with our revision as Figure S10, we find that the S. cerevisiae mnr2 deletion mutant reaches CFU similar to WT in monoculture as well as in co-culture. Thus, the rescuing effect of mnr2∆ is unlikely to be due to reduced growth or metabolism.

      4) The authors use the term 'antibiotic resistance' throughout the manuscript. However, the assays they perform do not directly test for antibiotic resistance which is defined as the ability to grow at higher concentrations of antibiotics (e.g. as measured by MIC tests). The authors should rephrase their phenotype as antibiotic survival or antibiotic tolerance.

      We agree with the reviewer and thank them for raising this point. We replaced the phrase 'antibiotic resistance' with 'antibiotic survival' throughout the revised manuscript. We also accordingly changed our title to 'Widespread fungal-bacterial competition for magnesium lowers antibiotic susceptibility'

      5) Also, the authors have two different assays, both measuring survival in antibiotic, but one is called a colistin resistance assay (line 508) and the other a colistin survival assay (line 523). It's not obvious what is the difference between what is being assayed in the two experiments, except perhaps the growth phase of the cells when they are exposed to the antibiotic? The authors should explain the difference, and the rationale for using two different assays.

      We thank the reviewer for raising this point. In the revised manuscript, we explain the rationale for our two assays. The first assay measures bacterial survival after colistin treatment in C. albicans-spent BHI, and the second measures bacterial survival after colistin treatment in co-culture with C. albicans. We performed both assays because C. albicans-spent BHI mimics the Mg2+-depleted conditions created by C. albicans but might not represent all aspects of fungal presence in co-culture. To make clear that our findings are consistent across these two experiments, we specify the difference between the two assays in the revised manuscript as follows: "Since fungal spent media cannot fully recapitulate fungal presence in co-culture conditions, we tested whether fungal co-culture also conferred increased colistin survival."

      Minor comments:

      • For almost all the figures, blue and orange dots are used for 'monoculture' and 'coculture' respectively, while orange and black dots are used for WT and the mgtA mutant. However, the black and blue dots are hard to tell apart, and for several figure sub-panels, the legends are not provided (e.g. figures 2D, 2F, S9H), making it a little confusing to figure out what is being shown. It would be best if the WT and mgtA symbols were in colors completely different from the monoculture/co-culture colors, making it easier to tell those apart.

      We have updated these figures as the reviewer suggested.

      Line no 122 and Figure 1A. The term "defense genes" in bacteria typically refers to genes conferring protection against phage infections. Perhaps the authors can use a different term (e.g. 'protective genes').

      We agree with the reviewer. We have changed "defense genes" to "fungal-defense genes" to disambiguate the terms.

      Line no 186. 'However, neither MgtA...' should be 'However, neither MgtE...'

      We thank the reviewer for pointing out this typo. We have fixed this in our revision.

      Line no 268. Does fungal-mediated Mg2+ competition extend to Gram positive bacteria?

      We thank the reviewer for raising this interesting point. MgtA is prevalent in diverse gram-negative bacteria but rare in gram-positive bacteria. Using the fitness effect of mgtA mutants in co-culture vs monoculture allowed us to infer Mg2+ competition easily for diverse gram-negative bacteria. Currently, we do not have the experimental tools to extend this finding to gram-positive bacteria. Co-culture growth kinetics for gram-positive bacteria are also likely to be different from gram-negative bacteria in a way that makes direct comparisons challenging. We have clarified our writing in the revised manuscript: "This mode of competition might be highly specific between fungi and diverse gram-negative γ-proteobacteria we have tested.... Whether fungi can suppress gram-positive bacteria through the same mechanism of Mg2+ competition remains an open question."

      Line no 314. It is unclear whether the 'transient co-culture' is the same or a different assay as the colistin survival assay.

      We apologize for the confusion and have removed the word 'transient' for clarity. The assay is the same as the 'colistin survival assay in fungal co-culture,' where we co-cultured log-phase P. aeruginosa cells with C. albicans for 5 hours and treated them with colistin.

      Line no 316. For the bacterial survival assays shown in figures 3 and 4 (and other supplementary figures), please provide absolute numbers as cfu (as in figures 1 and 2), as opposed to a percentage, for cell counts. This will allow readers to appropriately interpret the data.

      We thank the reviewer for this suggestion. In our revision, we now include the raw CFU counts for the colistin survival assays shown in Figs. 3 and 4 and in other supplementary figures as new supplementary figures (Fig. S11, S13, S14, S15, and S17).
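
      To illustrate why raw CFU counts complement percentages, here is a minimal sketch with made-up numbers (none are from the study). It assumes percent survival is defined as treated CFU divided by untreated CFU, times 100; the function name is hypothetical.

```python
def percent_survival(cfu_treated, cfu_untreated):
    """Percent of cells surviving treatment, from raw CFU counts.

    Assumed definition: 100 * treated CFU / untreated CFU.
    """
    if cfu_untreated <= 0:
        raise ValueError("untreated CFU must be positive")
    return 100.0 * cfu_treated / cfu_untreated


if __name__ == "__main__":
    # Hypothetical CFU/mL values: two conditions with very different
    # population sizes can give the same percent survival.
    monoculture = percent_survival(2e5, 1e7)
    coculture = percent_survival(2e3, 1e5)
    print(monoculture, coculture)
```

      Both hypothetical conditions give the same percentage even though the absolute number of survivors differs 100-fold, which is the information a percentage alone hides.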

      Line no 934-5: Italicize P. aeruginosa.

      This typo has been fixed in our revision.

      Reviewer #1 (Significance (Required)):

      This study identifies a novel interaction between the two co-infecting human pathogens Pseudomonas aeruginosa and Candida albicans, where C. albicans causes Mg2+ limitation for P. aeruginosa. Further, the authors show that this interaction affects levels of antibiotic resistance, as well as the adaptive mutations seen during the evolution of antibiotic resistance. This advances the field by delineating how microbial interactions can affect clinically relevant phenotypes, and potentially clinical outcomes. The study should be of interest to a broad audience of researchers studying microbial ecology, evolutionary biology, microbiology, and infectious diseases.

      We are grateful for the reviewer's positive appraisal.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      This paper looks at the interaction between the fungus Candida albicans (Ca) and the bacterium Pseudomonas aeruginosa (Pa), which are found together in some environments. Co-culture experiments showed that Ca can inhibit the growth of Pa. The goal of this study is to determine the reason for this phenomenon and how widespread it is. This was performed by Tnseq analysis of Pa that identified 3 genes which showed significant decreases in the presence of Ca. Interestingly these were all in an operon that was recognized by the authors as being induced by RNAseq during co-culture. One of these genes, mgtA is a known Mg2+ transporter and therefore the remainder of the paper discusses the importance of competition for Mg2+.

      The experiments seem to be well carried out and appropriately controlled.

      We thank the reviewer for their appreciation of our science and the rigor of our experiments.

      The use of the Mg2+ genetic sensor reporter in Pa is an interesting approach to determine the intracellular Mg2+ concentrations, however how these levels relate to one another between different experiments is not clear. In Fig. S5, the levels are 5 AU for growth in minimal media +low (10uM) Mg2+ and 38 AU for growth in minimal media+high (10mM) Mg2+. But the levels seen in Figure 1E are all much lower. With such low levels, it is difficult to determine if the impact of ∆PA4824 and ∆mgtA (while perhaps additive) are relevant. Would differences be seen with these various strains grown under different conditions?

      We thank the reviewer for this query. The reviewer is right in that we do not use absolute quantification of intracellular Mg2+ levels. While our Mg2+ genetic sensor assay does not facilitate comparison of absolute Mg2+ levels across experiments, it provides a robust comparative measurement of relative intracellular Mg2+ levels in mutants versus WT cells, or between two different media conditions.

      Using this Mg2+ genetic sensor assay, we tested intracellular Mg2+ levels of WT P. aeruginosa under various media conditions. We found that lower intracellular Mg2+ levels in P. aeruginosa cells and the requirement for mgtA are well correlated in media with lower total Mg2+ levels (Fig. S9A-E). In contrast, there are no significant differences in intracellular Mg2+ levels between ∆mgtA (or ∆PA4824) and WT cells in BHI media, which has higher total Mg2+ levels than fungal-spent BHI media. Our experiments reveal that the lack of mgtA or PA4824 affects intracellular Mg2+ levels only when P. aeruginosa is cultured in media below a threshold Mg2+ concentration.

      The experiments suggesting that the protein PA4824 is also a Mg2+ transporter seem to be related only to alpha fold predictions.

      We clarify that our speculation that PA4824 encodes a potential novel Mg2+ transporter was first motivated by finding that it is induced in low Mg2+ conditions, its genetic importance in Tn-seq experiments independent of mgtA, and our finding that cells with loss-of-function mutations in PA4824 experience lower intracellular Mg2+ than WT cells. However, the reviewer is correct that this statement is speculative based on the Alphafold prediction. In the revised manuscript, we have clarified this point as follows: "Based on our co-culture RNA-seq and Tn-seq experiments, results from the Mg2+ genetic sensor assay, and the Alphafold prediction of PA4824 protein structure, we speculate that PA4824 potentially acts as a novel Mg2+ transporter."

      Is the statement in line 186 a typo? It is stated that "neither MgtA nor CorA was implicated in competition". Do the authors actually mean "MgtE"?

      It is a typo. We thank the reviewer for pointing this out and have changed this to "MgtE".

      Reviewer #2 (Significance (Required)):

      Ca and Pa are known to inhabit the same niches and previous studies have shown both can have antagonist effects on one another. Nutritional competition is one mechanism of antagonism that has not been that well studied between these two genera. That makes the finding of some significance and relevant to those with an interest in either of these microbes and co-infections. The authors also found that it was not just Ca that had this effect, but other fungi as well. And this effect was not just reserved to effect Pa, but also other bacteria, suggesting a more global impact.

      We thank the reviewer for an accurate summary of our findings.

      However, diminishing the impact of this finding is the question as to whether this is simply a phenomenon seen under the very specific laboratory conditions tested here. Furthermore, how these findings exactly relate to any infection environment is not clear.

      Fungal-bacterial interactions occur in a variety of broad biological contexts, including during infection in animal hosts or in environment-associated microbial communities. Our study is the first to identify nutritional competition for Mg2+ as one of the most important axes of competition between fungi and bacteria. Our study also identifies MgtA as one of the key bacterial genes that mediates this interaction. MgtA is only induced upon experiencing low Mg2+ conditions; the fact that most gram-negative bacteria encode MgtA implies they must encounter low Mg2+ conditions and face fitness consequences in those conditions. To address the reviewer's concerns, we also highlight the following additional points in our revised Discussion:

      1. Fungal-bacterial competition for Mg2+ is not restricted to BHI medium. We found the same phenomenon in TSB medium. Indeed, we also show (Fig. S9F) that Mg2+ competition occurs whenever the environmental Mg2+ level is lower than 0.45 mM, a critical threshold for fungi and bacteria to compete for this vital ion.
      2. During infection in cystic fibrosis airways, proteomic experiments and Mg2+ measurements in CF sputum both suggest that P. aeruginosa experiences Mg2+ restriction.
      3. Previous studies have shown that many gram-negative bacteria, including Salmonella Typhimurium, encounter reduced magnesium concentrations upon infection of hosts (PMID: 29118452). Our discovery that fungal co-culture may generally exacerbate fitness challenges associated with low magnesium levels is of high importance to all studies of gram-negative bacteria, not just to Pa.
      4. In addition to infections in animal hosts, low Mg2+ is associated with worse outcomes of infections in plants. Our study suggests the importance of studying the role of Mg2+ competition in various infection contexts and the strategies of manipulating Mg2+ levels or fungal-bacterial interactions to constrain polymicrobial infectious diseases in diverse eukaryotic hosts and ecological conditions.

      The authors also seem to vastly overinterpret the significance of their findings; the impact on Pa is only to slow growth, not necessarily affect fitness, per se. The final number of bacteria appears to be the same, it just takes slightly longer to get there.

      We are puzzled by this comment from the reviewer; slow growth IS a fitness effect! Although we agree with the reviewer's point that C. albicans is more likely to inhibit bacterial growth rate than viability (bacteriostatic, not bactericidal), there are many bacteriostatic antibiotic mechanisms.

      In our co-culture assay, bacterial CFUs after 40 hours in co-culture are 10-100 times lower than in monoculture (this is not a subtle effect!). After 40 hours, bacterial cultures have already reached the stationary phase, which is why even slower growing bacterial cells in co-culture can 'catch up' (they are still lower by nearly 10-fold), despite fungal inhibition. Moreover, the co-culture condition provided enough of a fitness challenge to allow us to identify bacterial protective genes even in a pooled assay.

      The authors speculate that since Mg2+ supplementation did not totally restore growth to Pa during co-culture, other Mg2+-independent "axes of antagonism" must exist. This also tends to diminish the significance of these findings.

      Again, we are puzzled by this comment from the reviewer. Fungal-bacterial competition, like all microbial competition, is a multifactorial process, so we should not be surprised that Mg2+ isn't the only axis of competition. Indeed, our study reinforces the importance of investigating all potential axes of competition to get a complete understanding of the mechanisms of fungal-bacterial competition.

      The importance of mgtA on antibiotic susceptibility has been well studied in a number of bacteria including Pa making these findings generally confirmatory.

      We would like to clarify this comment. To the best of our knowledge, mgtA in P. aeruginosa has not been reported in antibiotic susceptibility studies. Instead, P. aeruginosa mgtE is induced upon treatment with aminoglycoside antibiotics, but its expression does not change antibiotic resistance (PMID: 24162608).

      The reviewer may be referring to studies in S. Typhimurium, where the DmgtA mutant shows increased susceptibility to nitrooxidative stress (PMID: 29118452) and to cyclohexane (PMID: 18487336), suggesting Mg2+ homeostasis might be generally important for bacterial survival to antimicrobial treatments. Although this is not the main focus of our study, we now include these references in our revised discussion to provide readers with more background on the relevance of our work: "Mg2+ has been implicated in altering the susceptibility of gram-negative bacteria to antibiotics other than colistin. For instance, in S. Typhimurium, impaired mgtA or Mg2+ homeostasis increases susceptibility to cyclohexane or nitrooxidative stress. In line with these observations, our study also highlights the importance of studying how Mg2+ homeostasis broadly impacts antimicrobial resistance in gram-negative bacteria."

      The importance of different mutations that emerge in Pa during mono vs. co-culture in the presence of colistin is not clearly explained. Why should co-culture inhibit the emergence of hypermutator Pa strains?

      We thank the reviewer for the opportunity to clarify this important point. Previous studies have shown, both in Pa as well as other bacteria, that hypermutator strains often arise when bacteria adapt to strong and continuous antibiotic stress (PMID: 28630206) to maximize exploration of mutation space necessary to acquire beneficial resistance mutations even though hypermutation itself is inherently deleterious to bacterial fitness. We show that fungal co-culture protects P. aeruginosa from high concentrations of colistin by sequestering the Mg2+ co-factor required for colistin action (Fig. 4C). Thus, under co-culture conditions, bacteria experience lower levels of colistin than the levels administered and are subject to less severe fitness challenges, allowing them to eschew the deleterious route of acquiring adaptive mutations with hypermutation.

      Our discovery that bacteria have an entirely different means of enhancing colistin resistance under fungal co-culture (or low Mg2+) conditions is one of the highlights of our study. Understanding the biological basis of this novel model of colistin resistance will be an active area of investigation to pursue in the future.

      No additional experiments are likely needed but the authors should be encouraged to place their findings more clearly in what is already known in the field as well as articulate the limitations of their study.

      We thank the reviewer for their detailed comments and suggestions. We hope our revisions have both clarified the importance and limitations of our study and provided the right context sought by the reviewer.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In this study, Hsieh et al. find that a critical axis of competition between Pseudomonas aeruginosa and Candida albicans is Mg2+ sequestered by Candida. The authors find that use of BHI, which has lower Mg2+ levels compared to other media, allowed this discovery. The authors further demonstrate critical genes for this axis in multiple gammaproteobacteria and fungal species. The authors further show that fungal Mg2+ sequestration promotes polymyxin resistance in multiple gammaproteobacteria and show that it alters the course of Pseudomonas aeruginosa evolution of polymyxin resistance. Finally, they show that for populations that evolved polymyxin resistance in the presence of Candida, removal of Candida by antifungal treatment re-induces sensitivity to polymyxins.

      We thank the reviewer for a concise and accurate summary of our study.

      Major comments: -The claims and conclusions are generally supported; however, a key phenotype of the ∆mgtA and ∆PA4824 mutants should be complemented in trans or in a second site of the chromosome.

      We thank the reviewer for this comment and agree with the suggestion. In our revised manuscript, we now provide results of new complementation experiments recommended by the reviewer, which show that expression of PA4824 or mgtA in trans rescues the fitness defect of the corresponding deletion mutant (Fig. S4C and S4D).

      -The authors note that "This mode of competition appears to be highly specific between fungi and gram-negative bacteria." However, it does not appear that gram-positive bacteria were tested in competition with fungi. Additionally, the only gram-negative bacteria tested were gammaproteobacteria (although they do represent diverse gammaproteobacteria). This could be addressed by clarifying the text or OPTIONAL additional experimentation.

      We agree with the reviewer. We had intended to highlight that we had only tested this mode of competition between fungi and gram-negative bacteria, but inadvertently phrased this to suggest that gram-positive bacteria are not subject to this competition. As we highlight in our response to Reviewer 1, we are unable to test this (so far) for gram-positive bacteria. We clarify this in our revision: "This mode of competition might be highly specific between fungi and diverse gram-negative γ-proteobacteria we have tested.... Whether fungi can suppress gram-positive bacteria through the same mechanism of Mg2+ competition remains an open question."

      -Figure 3A: is this depiction of modifications on the O-antigen correct? PhoQ- and PmrB-activated enzymes seem to modify the lipid A portion of LPS (eg PMID: 31142822)

      We thank the reviewer for noting this error, which we have now fixed in the revision.

      • For many of the figures, multiple t-tests are used and it seems like perhaps an ANOVA with multiple comparisons would be more appropriate

      We thank the reviewer for this feedback. In our revision, we now use one-way ANOVA with Dunnett's multiple-comparisons test for figures with multiple comparisons; our conclusions are unchanged.

      Minor comments: - The text and figures are clear and accurate

      We thank the reviewer for this feedback.

      -the cited nutritional immunity reviews are out of date (e.g. reference 37) and there are more recent reviews on the topic (e.g. PMID: 35641670)

      We have added the suggested reference in our revision.

      -Line 293: Unclear why polymyxin resistance would be "unexpected" following the explanation of why Mg2+ depletion might confer it

      We agree and have removed 'unexpected.'

      -Line 318: "antibitoics" typo

      We thank the reviewer for pointing out this typo, which we have now corrected.

      Reviewer #3 (Significance (Required)):

      The following aspects are important:

      • General assessment: This study is very mechanistic, identifying the role of Mg2+ sequestration by fungi that limit gram-negative bacterial growth in Mg2+-depleted environments. The strengths are that relevant Mg2+ acquisition genes are identified or tested in Pseudomonas aeruginosa, the main test organism, as well as Salmonella enterica and Escherichia coli. Additionally, the authors identify a relevant Mg2+ mechanism in fungal species tested, including showing the importance with a genetic knockout. The limitations are relatively minor, and include lack of complementation, potential issues in model figure depiction of LPS modifications, and potential minor issues in statistical tests used. Future directions discussed include expanding analysis to clinical isolates, which is outside the scope of this manuscript, which already showed the same mechanism in diverse gammaproteobacteria.

      We thank the reviewer for their positive appraisal.

      • Advance: This study has two major advances: The first is uncovering this critical Mg2+ sequestration axis in competition between fungal species and gammaproteobacteria. The second is the finding that the Mg2+ sequestration induces polymyxin resistance and alters the evolutionary path to further polymyxin resistance. While nutrient metals as an axis of competition is not a conceptual advance, the specific role of Mg2+ and its effect on evolution of polymyxin antibiotic resistance is a conceptual advance.
      • Audience: I think this study would be of interest to a relatively broad audience. The study itself touches on multiple fields including intermicrobial competition, nutritional immunity, antimicrobial resistance, and microbial evolution. Additionally, there are clinical implications for the potential to use antifungals to resensitize polymyxin-resistant P. aeruginosa to polymyxins.
      • My field of expertise is bacterial genetics and physiology, nutritional immunity, and bacterial cell envelope. I do not have expertise in fungus.

      We appreciate the reviewer's positive and constructive feedback on our study and for highlighting the relevance of our research to a broader audience in microbiology and evolution. We do hope our mechanistic understanding of fungal-bacterial competition will spark further conversation or collaboration between evolutionary microbiologists and physician-scientists.